Key LLM-Related Papers - 2025-11-26

1. PRInTS: Reward Modeling for Long-Horizon Information Seeking


2. AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning


3. EEG-VLM: A Hierarchical Vision-Language Model with Multi-Level Feature Alignment and Visually Enhanced Language-Guided Reasoning for EEG Image-Based Sleep Stage Prediction


4. LLM-CSEC: Empirical Evaluation of Security in C/C++ Code Generated by Large Language Models


5. Synthesizing Visual Concepts as Vision-Language Programs


6. MoodBench 1.0: An Evaluation Benchmark for Emotional Companionship Dialogue Systems


7. UNeMo: Collaborative Visual-Language Reasoning and Navigation via a Multimodal World Model


8. NEZHA: A Zero-sacrifice and Hyperspeed Decoding Architecture for Generative Recommendations


9. HERMES: Towards Efficient and Verifiable Mathematical Reasoning in LLMs


10. HuggingR$^{4}$: A Progressive Reasoning Framework for Discovering Optimal Model Companions


11. MAGMA-Edu: Multi-Agent Generative Multimodal Framework for Text-Diagram Educational Question Generation


12. ORIGAMISPACE: Benchmarking Multimodal LLMs in Multi-Step Spatial Reasoning with Mathematical Constraints


13. A Multimodal Conversational Agent for Tabular Data Analysis


14. Natural Emergent Misalignment from Reward Hacking in Production RL


15. Progressive Localisation in Localist LLMs


16. KGpipe: Generation and Evaluation of Pipelines for Data Integration into Knowledge Graphs


17. Weakly-supervised Latent Models for Task-specific Visual-Language Control


18. The Catastrophic Paradox of Human Cognitive Frameworks in Large Language Model Evaluation: A Comprehensive Empirical Analysis of the CHC-LLM Incompatibility


19. Steering Latent Traits, Not Learned Facts: An Empirical Study of Activation Control Limits


20. How Far Can LLMs Emulate Human Behavior?: A Strategic Analysis via the Buy-and-Sell Negotiation Game


21. Leveraging Evidence-Guided LLMs to Enhance Trustworthy Depression Diagnosis


22. Alignment Faking - the Train -> Deploy Asymmetry: Through a Game-Theoretic Lens with Bayesian-Stackelberg Equilibria


23. ChemVTS-Bench: Evaluating Visual-Textual-Symbolic Reasoning of Multimodal Large Language Models in Chemistry


24. Training Emergent Joint Associations: A Reinforcement Learning Approach to Creative Thinking in Language Models


25. QuickLAP: Quick Language-Action Preference Learning for Autonomous Driving Agents


26. Learning to Debug: LLM-Organized Knowledge Trees for Solving RTL Assertion Failures


27. AI- and Ontology-Based Enhancements to FMEA for Advanced Systems Engineering: Current Developments and Future Directions


28. M3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark


29. Bridging Symbolic Control and Neural Reasoning in LLM Agents: The Structured Cognitive Loop


30. Cognitive Inception: Agentic Reasoning against Visual Deceptions by Injecting Skepticism


31. Prompt Less, Smile More: MTP with Semantic Engineering in Lieu of Prompt Engineering


32. Beyond Protein Language Models: An Agentic LLM Framework for Mechanistic Enzyme Design


33. SLMFix: Leveraging Small Language Models for Error Fixing with Reinforcement Learning


34. Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens


35. Be My Eyes: Extending Large Language Models to New Modalities Through Multi-Agent Collaboration


36. Leveraging LLMs for reward function design in reinforcement learning control tasks


37. Generative Query Expansion with Multilingual LLMs for Cross-Lingual Information Retrieval


38. What Drives Cross-lingual Ranking? Retrieval Approaches with Multilingual Language Models


39. Open-weight genome language model safeguards: Assessing robustness via adversarial fine-tuning


40. Solar-GECO: Perovskite Solar Cell Property Prediction with Geometric-Aware Co-Attention


41. A Nutrition Multimodal Photoplethysmography Language Model


42. Medusa: Cross-Modal Transferable Adversarial Attacks on Multimodal Medical Retrieval-Augmented Generation


43. MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization


44. In Machina N400: Pinpointing Where a Causal Language Model Detects Semantic Violations


45. Are Large Vision Language Models Truly Grounded in Medical Images? Evidence from Italian Clinical Visual Question Answering


46. Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization


47. LLM-Based Agentic Negotiation for 6G: Addressing Uncertainty Neglect and Tail-Event Risk


48. From Pixels to Posts: Retrieval-Augmented Fashion Captioning and Hashtag Generation


49. GraphMind: Theorem Selection and Conclusion Generation Framework with Dynamic GNN for LLM Reasoning


50. Large Language Model-Assisted Planning of Electric Vehicle Charging Infrastructure with Real-World Case Study


51. MedSAM3: Delving into Segment Anything with Medical Concepts


52. OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs


53. FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning


54. SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression


55. Skeletons Matter: Dynamic Data Augmentation for Text-to-Query


56. Defending Large Language Models Against Jailbreak Exploits with Responsible AI Considerations


57. Look It Up: Analysing Internal Web Search Capabilities of Modern LLMs


58. LLM-Driven Kernel Evolution: Automating Driver Updates in Linux


59. How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining


60. Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models


61. CoreEval: Automatically Building Contamination-Resilient Datasets with Real-World Knowledge toward Reliable LLM Evaluation


62. KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit


63. Generating Reading Comprehension Exercises with Large Language Models for Educational Applications


64. Time Travel: LLM-Assisted Semantic Behavior Localization with Git Bisect


65. Pre-Filtering Code Suggestions using Developer Behavioral Telemetry to Optimize LLM-Assisted Programming


66. Optimizing LLM Code Suggestions: Feedback-Driven Timing with Lightweight State Bounds


67. Solving a Research Problem in Mathematical Statistics with AI Assistance


68. HyperbolicRAG: Enhancing Retrieval-Augmented Generation with Hyperbolic Representations


69. Any4D: Open-Prompt 4D Generation from Natural Language and Images


70. RhinoInsight: Improving Deep Research through Control Mechanisms for Model Behavior and Context


71. Thinking Ahead: Foresight Intelligence in MLLMs and World Models


72. Empathetic Cascading Networks: A Multi-Stage Prompting Technique for Reducing Social Biases in Large Language Models


73. VLM in a flash: I/O-Efficient Sparsification of Vision-Language Model via Neuron Chunking


74. MedVision: Dataset and Benchmark for Quantitative Medical Image Analysis


75. FHE-Agent: Automating CKKS Configuration for Practical Encrypted Inference via an LLM-Guided Agentic Framework


76. Kitty: Accurate and Efficient 2-bit KV Cache Quantization with Dynamic Channel-wise Precision Boost


77. Health system learning achieves generalist neuroimaging models


78. No Free Lunch in Language Model Bias Mitigation? Targeted Bias Reduction Can Exacerbate Unmitigated LLM Biases


79. Majority of the Bests: Improving Best-of-N via Bootstrapping


80. OpenGloss: A Synthetic Encyclopedic Dictionary and Semantic Knowledge Graph


81. Strategic Decision Framework for Enterprise LLM Adoption


82. Re(Visiting) Time Series Foundation Models in Finance


83. Multimodal Continual Learning with MLLMs from Multi-scenario Perspectives


84. MindEval: Benchmarking Language Models on Multi-turn Mental Health Support


85. Evaluating perturbation robustness of generative systems that use COBOL code inputs


86. Shadows in the Code: Exploring the Risks and Defenses of LLM-based Multi-Agent Software Development Systems


87. DocPTBench: Benchmarking End-to-End Photographed Document Parsing and Translation


88. General Agentic Memory Via Deep Research


89. Findings of the BlackboxNLP 2025 Shared Task: Localizing Circuits and Causal Variables in Language Models


90. OmniStruct: Universal Text-to-Structure Generation across Diverse Schemas


91. AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert


92. Clinician-Directed Large Language Model Software Generation for Therapeutic Interventions in Physical Rehabilitation


93. LLM Reasoning for Cold-Start Item Recommendation


94. Hybrid Agentic AI and Multi-Agent Systems in Smart Manufacturing


95. Can LLMs Help Allocate Public Health Resources? A Case Study on Childhood Lead Testing


96. Enhancing Large Language Models for Automated Homework Assessment in Undergraduate Circuit Analysis


97. ARIAL: An Agentic Framework for Document VQA with Precise Answer Localization


98. Towards a General Framework for HTN Modeling with LLMs


99. Nested Unfolding Network for Real-World Concealed Object Segmentation


100. Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models


101. VCU-Bridge: Hierarchical Visual Connotation Understanding via Semantic Bridging


102. The Alignment Paradox of Medical Large Language Models in Infertility Care: Decoupling Algorithmic Improvement from Clinical Decision-making Quality


103. MASTEST: A LLM-Based Multi-Agent System For RESTful API Tests


104. Extracting Interaction-Aware Monosemantic Concepts in Recommender Systems


105. Plan-X: Instruct Video Generation via Semantic Planning


106. Measuring the Impact of Lexical Training Data Coverage on Hallucination Detection in Large Language Models


107. PA-FAS: Towards Interpretable and Generalizable Multimodal Face Anti-Spoofing via Path-Augmented Reinforcement Learning


108. Towards Efficient LLM-aware Heterogeneous Graph Learning


109. Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction


110. A superpersuasive autonomous policy debating system


111. APRIL: Annotations for Policy evaluation with Reliable Inference from LLMs


112. Point of Order: Action-Aware LLM Persona Modeling for Realistic Civic Simulation


113. Episodic Memory in Agentic Frameworks: Suggesting Next Tasks


114. Understanding Counting Mechanisms in Large Language and Vision-Language Models


115. Liberating Logic in the Age of AI: Going Beyond Programming with Computational Thinking


116. ARISE: Agentic Rubric-Guided Iterative Survey Engine for Automated Scholarly Paper Generation


117. Datacenters in the Desert: Feasibility and Sustainability of LLM Inference in the Middle East


118. A Cross-Cultural Assessment of Human Ability to Detect LLM-Generated Fake News about South Africa


119. Research and Prototyping Study of an LLM-Based Chatbot for Electromagnetic Simulations


120. Chatbots to strengthen democracy: An interdisciplinary seminar to train identifying argumentation techniques of science denial


121. LLM and Agent-Driven Data Analysis: A Systematic Approach for Enterprise Applications and System-level Deployment


122. MURMUR: Using cross-user chatter to break collaborative language agents in groups


123. Empa: An AI-Powered Virtual Mentor for Developing Global Collaboration Skills in HPC Education


124. Evaluating Adversarial Vulnerabilities in Modern Large Language Models


125. Model-to-Model Knowledge Transmission (M2KT): A Data-Free Framework for Cross-Model Understanding Transfer


126. Can we use LLMs to bootstrap reinforcement learning? – A case study in digital health behavior change


127. From Competition to Coordination: Market Making as a Scalable Framework for Safe and Aligned Multi-Agent LLM Systems


128. From Projection to Prediction: Beyond Logits for Scalable Language Models


129. GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms


130. LLM-Powered Text-Attributed Graph Anomaly Detection via Retrieval-Augmented Reasoning


131. Multi-Value Alignment for LLMs via Value Decorrelation and Extrapolation


132. Efficient Mathematical Reasoning Models via Dynamic Pruning and Knowledge Distillation


133. Binary BPE: A Family of Cross-Platform Tokenizers for Binary Analysis


134. Generative Caching for Structurally Similar Prompts and Responses


135. LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models


136. $A^3$: Attention-Aware Accurate KV Cache Fusion for Fast Large Language Model Serving


137. A Multidisciplinary Design and Optimization (MDO) Agent Driven by Large Language Models


138. AURA: Adaptive Unified Reasoning and Automation with LLM-Guided MARL for NextG Cellular Networks