Key LLM-Related Papers - 2025-09-30

1. Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective


2. Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time


3. StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models


4. The Emergence of Altruism in Large-Language-Model Agents Society


5. REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model


6. TrueGradeAI: Retrieval-Augmented and Bias-Resistant AI for Transparent and Explainable Digital Assessments


7. Estimating the Empowerment of Language Model Agents


8. InfiAgent: Self-Evolving Pyramid Agent Framework for Infinite Scenarios


9. GeoSketch: A Neural-Symbolic Approach to Geometric Multimodal Reasoning with Auxiliary Line Construction and Affine Transformation


10. Guiding Evolution of Artificial Life Using Vision-Language Models


11. Do LLM Agents Know How to Ground, Recover, and Assess? A Benchmark for Epistemic Competence in Information-Seeking Agents


12. Large Language Models as Nondeterministic Causal Models


13. InfiMed-Foundation: Pioneering Advanced Multimodal Medical Models with Compute-Efficient Pre-Training and Multi-Stage Fine-Tuning


14. Evaluating LLMs for Combinatorial Optimization: One-Phase and Two-Phase Heuristics for 2D Bin-Packing


15. Log2Plan: An Adaptive GUI Automation Framework Integrated with Task Mining Approach


16. The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging


17. GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments


18. Bilinear relational structure fixes reversal curse and enables consistent model editing


19. CoBel-World: Harnessing LLM Reasoning to Build a Collaborative Belief World for Optimizing Embodied Multi-Agent Collaboration


20. Reimagining Agent-based Modeling with Large Language Model Agents via Shachi


21. DS-STAR: Data Science Agent via Iterative Planning and Verification


22. ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration


23. D-Artemis: A Deliberative Cognitive Framework for Mobile GUI Multi-Agents


24. Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety


25. UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios


26. GeoEvolve: Automating Geospatial Model Discovery via Multi-Agent Large Language Models


27. Correct Reasoning Paths Visit Shared Decision Pivots


28. Towards mitigating information leakage when evaluating safety monitors


29. See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation


30. VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing


31. CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning


32. Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs


33. Hierarchical Representation Matching for CLIP-based Class-Incremental Learning


34. WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning


35. Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity


36. Language Models Can Learn from Verbal Feedback Without Scalar Rewards


37. Variational Reasoning for Language Models


38. Towards Efficient Online Exploration for Reinforcement Learning with Human Feedback


39. Quantile Advantage Estimation for Entropy-Safe Reasoning


40. Retrieval-Augmented Guardrails for AI-Drafted Patient-Portal Messages: Error Taxonomy Construction and Large-Scale Evaluation


41. InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models


42. OFMU: Optimization-Driven Framework for Machine Unlearning


43. Exploring Solution Divergence and Its Effect on Large Language Model Problem Solving



45. MDAR: A Multi-scene Dynamic Audio Reasoning Benchmark


46. Chimera: Diagnosing Shortcut Learning in Visual-Language Understanding


47. Partial Parameter Updates for Efficient Distributed Training


48. Explaining multimodal LLMs via intra-modal token interactions


49. RAU: Reference-based Anatomical Understanding with Vision Language Models


50. SpinGPT: A Large-Language-Model Approach to Playing Poker Correctly


51. Zero-Effort Image-to-Music Generation: An Interpretable RAG-based VLM Approach


52. What Is The Political Content in LLMs’ Pre- and Post-Training Data?


53. CHRONOBERG: Capturing Language Evolution and Temporal Awareness in Foundation Models


54. Stochastic activations


55. Transformers Can Learn Connectivity in Some Graphs but Not Others


56. HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space


57. Leveraging Large Language Models for Robot-Assisted Learning of Morphological Structures in Preschool Children with Language Vulnerabilities


58. Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks


59. Secure and Efficient Access Control for Computer-Use Agents via Context Space


60. Beyond Textual Context: Structural Graph Encoding with Adaptive Space Alignment to alleviate the hallucination of LLMs


61. Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance


62. FeatBench: Evaluating Coding Agents on Feature Implementation for Vibe Coding


63. Polysemous Language Gaussian Splatting via Matching-based Mask Lifting


64. Thinking in Many Modes: How Composite Reasoning Elevates Large Language Model Performance with Limited Data


65. Question-Driven Analysis and Synthesis: Building Interpretable Thematic Trees with LLMs for Text Clustering and Controllable Generation


66. The Outputs of Large Language Models are Meaningless


67. Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs


68. Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding


69. R-Capsule: Compressing High-Level Plans for Efficient Large Language Model Reasoning


70. Multi-Agent Path Finding via Offline RL and LLM Collaboration


71. Universal Legal Article Prediction via Tight Collaboration between Supervised Classification Model and LLM


72. SecureAgentBench: Benchmarking Secure Code Generation under Realistic Vulnerability Scenarios


73. The Rogue Scalpel: Activation Steering Compromises LLM Safety


74. Fuzzy Reasoning Chain (FRC): An Innovative Reasoning Framework from Fuzziness to Clarity


75. Lightweight Structured Multimodal Reasoning for Clinical Scene Understanding in Robotics


76. Black-Box Hallucination Detection via Consistency Under the Uncertain Expression


77. ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models


78. Benchmarking and Mitigate Psychological Sycophancy in Medical Vision-Language Models


79. Geo-R1: Improving Few-Shot Geospatial Referring Expression Understanding with Reinforcement Fine-Tuning


80. From Superficial Outputs to Superficial Learning: Risks of Large Language Models in Education


81. Active Attacks: Red-teaming LLMs via Adaptive Environments


82. Debiasing Large Language Models in Thai Political Stance Detection via Counterfactual Calibration


83. Why Chain of Thought Fails in Clinical Text Understanding


84. SAGE: Scene Graph-Aware Guidance and Execution for Long-Horizon Manipulation Tasks


85. AutoSCORE: Enhancing Automated Scoring with Multi-Agent Large Language Models via Structured Component Recognition


86. A Large-Scale Dataset and Citation Intent Classification in Turkish with LLMs


87. You Can’t Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors


88. Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards


89. No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping


90. Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization


91. Enhancing Low-Rank Adaptation with Structured Nonlinear Transformations


92. Graph of Agents: Principled Long Context Modeling by Emergent Multi-Agent Collaboration


93. Can Large Language Models Autoformalize Kinematics?


94. DiTraj: training-free trajectory control for video diffusion transformer


95. Evaluating and Improving Cultural Awareness of Reward Models for LLM Alignment


96. FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning


97. Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models


98. Self-Speculative Biased Decoding for Faster Live Translation


99. POLO: Preference-Guided Multi-Turn Reinforcement Learning for Lead Optimization


100. QueryGym: Step-by-Step Interaction with Relational Databases


101. MobiLLM: An Agentic AI Framework for Closed-Loop Threat Mitigation in 6G Open RANs


102. InvBench: Can LLMs Accelerate Program Verification with Invariant Synthesis?


103. Guiding Audio Editing with Audio Language Model


104. OjaKV: Context-Aware Online Low-Rank KV Cache Compression with Oja’s Rule


105. Multi-Objective Reinforcement Learning for Large Language Model Optimization: Visionary Perspective


106. Preemptive Detection and Steering of LLM Misalignment via Latent Reachability


107. Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training


108. Dual-Head Reasoning Distillation: Improving Classifier Accuracy with Train-Time-Only Reasoning


109. Learning to Reason with Mixture of Tokens


110. Gender Stereotypes in Professional Roles Among Saudis: An Analytical Study of AI-Generated Images Using Language Models


111. One Model, Many Morals: Uncovering Cross-Linguistic Misalignments in Computational Moral Reasoning


112. PhenoMoler: Phenotype-Guided Molecular Optimization via Chemistry Large Language Model


113. How Large Language Models Need Symbolism


114. MIXRAG: Mixture-of-Experts Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering


115. ReGeS: Reciprocal Retrieval-Generation Synergy for Conversational Recommender Systems


116. Context Is What You Need: The Maximum Effective Context Window for Real World Limits of LLMs


117. Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models


118. Influence Guided Context Selection for Effective Retrieval-Augmented Generation


119. MDF-MLLM: Deep Fusion Through Cross-Modal Feature Alignment for Contextually Aware Fundoscopic Image Classification


120. A Novel Differential Feature Learning for Effective Hallucination Detection and Classification


121. Phrase-grounded Fact-checking for Automatically Generated Chest X-ray Reports


122. KV-Efficient VLA: A Method of Speed up Vision Language Model with RNN-Gated Chunked KV Cache


123. Random Direct Preference Optimization for Radiography Report Generation


124. PIR-RAG: A System for Private Information Retrieval in Retrieval-Augmented Generation