Key LLM Papers - 2025-10-10

1. CaRT: Teaching LLM Agents to Know When They Know Enough


2. AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents


3. Looking to Learn: Token-wise Dynamic Gating for Low-Resource Vision-Language Modelling


4. Revisiting Hallucination Detection with Effective Rank-based Uncertainty


5. QAgent: A modular Search Agent with Interactive Query Understanding


6. LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings


7. Beyond Pass@k: Breadth-Depth Metrics for Reasoning Boundaries


8. First Try Matters: Revisiting the Role of Reflection in Reasoning Models


9. Chain-of-Trigger: An Agentic Backdoor that Paradoxically Enhances Agentic Robustness


10. Selection, Reflection and Self-Refinement: Revisit Reasoning Tasks via a Causal Lens


11. AutoQual: An LLM Agent for Automated Discovery of Interpretable Features for Review Quality Assessment


12. Multi-Condition Conformal Selection


13. LinguaSim: Interactive Multi-Vehicle Testing Scenario Generation via Natural Language Instruction Based on Large Language Models


14. AILoRA: Function-Aware Asymmetric Initialization for Low-Rank Adaptation of Large Language Models


15. Language Models Do Not Embed Numbers Continuously


16. VoiceAgentBench: Are Voice Assistants ready for agentic tasks?


17. TaoSR-SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance


18. Enabling Personalized Long-term Interactions in LLM-based Agents through Persistent Memory and User Profiles


19. Profit Mirage: Revisiting Information Leakage in LLM-based Financial Agents


20. Understanding DeepResearch via Reports


21. Augur: Modeling Covariate Causal Associations in Time Series via Large Language Models


22. FinMR: A Knowledge-Intensive Multimodal Benchmark for Advanced Financial Reasoning


23. An LLM-Powered Cooperative Framework for Large-Scale Multi-Vehicle Navigation


24. GCPO: When Contrast Fails, Go Gold


25. An approach for systematic decomposition of complex LLM tasks


26. From Noisy to Native: LLM-driven Graph Restoration for Test-Time Graph Domain Adaptation


27. Haibu Mathematical-Medical Intelligent Agent: Enhancing Large Language Model Reliability in Medical Tasks via Verifiable Reasoning Chains


28. SurveyG: A Multi-Agent LLM Framework with Hierarchical Citation Graph for Automated Survey Generation


29. oMeBench: Towards Robust Benchmarking of LLMs in Organic Mechanism Elucidation and Reasoning


30. Multimodal Safety Evaluation in Generative Agent Social Simulations


31. Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models


32. Traceability and Accountability in Role-Specialized Multi-Agent LLM Pipelines


33. AgentAsk: Multi-Agent Systems Need to Ask


34. An Evaluation Study of Hybrid Methods for Multilingual PII Detection


35. Measuring and Mitigating Identity Bias in Multi-Agent Debate via Anonymization



37. Evaluation of LLMs for Process Model Analysis and Optimization


38. ExpertAgent: Enhancing Personalized Education through Dynamic Planning and Retrieval-Augmented Long-Chain Reasoning


39. TS-Agent: A Time Series Reasoning Agent with Iterative Statistical Insight Gathering


40. ProSEA: Problem Solving via Exploration Agents


41. Base Models Know How to Reason, Thinking Models Learn When


42. L2M-AID: Autonomous Cyber-Physical Defense by Fusing Semantic Reasoning of Large Language Models with Multi-Agent Reinforcement Learning (Preprint)


43. BLAZER: Bootstrapping LLM-based Manipulation Agents with Zero-Shot Data Generation


44. ArenaBencher: Automatic Benchmark Evolution via Multi-Model Competitive Evaluation


45. MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning


46. VideoNorms: Benchmarking Cultural Awareness of Video Language Models


47. On the optimization dynamics of RLVR: Gradient gap and step size thresholds


48. SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models


49. CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards


50. To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models


51. DeepPrune: Parallel Scaling without Inter-trace Redundancy


52. xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning


53. Evaluating Small Vision-Language Models on Distance-Dependent Traffic Perception


54. Iterated Agent for Symbolic Regression


55. Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization


56. Opponent Shaping in LLM Agents


57. Contrastive Decoding for Synthetic Data Generation in Low-Resource Language Modeling


58. The Hidden Bias: A Study on Explicit and Implicit Political Stereotypes in Large Language Models


59. LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions


60. Memory Retrieval and Consolidation in Large Language Models through Function Tokens


61. Sentiment Matters: An Analysis of 200 Human-SAV Interactions


62. NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions


63. DACIP-RC: Domain Adaptive Continual Instruction Pre-Training via Reading Comprehension on Business Conversations


64. AI Knowledge Assist: An Automated Approach for the Creation of Knowledge Bases for Conversational AI Agents


65. Think Just Enough: Sequence-Level Entropy as a Confidence Signal for LLM Reasoning


66. Improving Temporal Understanding Logic Consistency in Video-Language Models via Attention Enhancement


67. Approximate Domain Unlearning for Vision-Language Models


68. Interpreting LLM-as-a-Judge Policies via Verifiable Global Explanations


69. Lossless Vocabulary Reduction for Auto-Regressive Language Models


70. The Price of Thought: A Multilingual Analysis of Reasoning, Performance, and Cost of Negotiation in Large Language Models


71. Everything is Plausible: Investigating the Impact of LLM Rationales on Human Notions of Plausibility


72. A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models


73. TaoSR-AGRL: Adaptive Guided Reinforcement Learning Framework for E-commerce Search Relevance


74. Towards Reliable LLM-based Robot Planning via Combined Uncertainty Estimation


75. Past, Present, and Future of Bug Tracking in the Generative AI Era


76. Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks


77. Fewer Weights, More Problems: A Practical Attack on LLM Pruning



79. Active Confusion Expression in Large Language Models: Leveraging World Models toward Better Social Reasoning


80. LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?


81. A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning


82. STEPER: Step-wise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language Models


83. Towards Human-Like Grading: A Unified LLM-Enhanced Framework for Subjective Question Evaluation


84. Contrastive Weak-to-strong Generalization


85. AdaSwitch: Adaptive Switching Generation for Knowledge Distillation


86. Self-Improving LLM Agents at Test-Time


87. MetaDefense: Defending Finetuning-based Jailbreak Attack Before and During Generation


88. SIMU: Selective Influence Machine Unlearning


89. Effective and Stealthy One-Shot Jailbreaks on Deployed Mobile Vision-Language Agents


90. Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models


91. LLM4Cell: A Survey of Large Language and Agentic Models for Single-Cell Biology


92. IntentionVLA: Generalizable and Efficient Embodied Intention Reasoning for Human-Robot Interaction


93. Drift No More? Context Equilibria in Multi-Turn LLM Interactions


94. ToolLibGen: Scalable Automatic Tool Creation and Aggregation for LLM Reasoning


95. Parallel Test-Time Scaling for Latent Reasoning Models


96. AppForge: From Assistant to Independent Developer - Are GPTs Ready for Software Development?


97. Rethinking Reasoning: A Survey on Reasoning-based Backdoors in LLMs


98. Stress-Testing Model Specs Reveals Character Differences among Language Models


99. OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference


100. Banking Done Right: Redefining Retail Banking with Language-Centric AI


101. Vocabulary embeddings organize linguistic structure early in language model training


102. Investigating Thematic Patterns and User Preferences in LLM Interactions using BERTopic


103. TRAVL: A Recipe for Making Video-Language Models Better Judges of Physics Implausibility


104. OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs


105. MLLM4TS: Leveraging Vision and Multimodal Language Models for General Time-Series Analysis


106. When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs


107. Can Speech LLMs Think while Listening?


108. Can Lessons From Human Teams Be Applied to Multi-Agent Systems? The Role of Structure, Diversity, and Interaction Dynamics


109. LASER: An LLM-based ASR Scoring and Evaluation Rubric


110. Haystack Engineering: Context Engineering for Heterogeneous and Agentic Long-Context Evaluation


111. Attention to Order: Transformers Discover Phase Transitions via Learnability


112. Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts