Key LLM-Related Papers - 2026-02-11

1. Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning


2. Chain of Mindset: Reasoning with Adaptive Cognitive Modes


3. Discovering High Level Patterns from Simulation Traces


4. Closing Reasoning Gaps in Clinical Agents with Differential Reasoning Learning


5. Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?


6. Would a Large Language Model Pay Extra for a View? Inferring Willingness to Pay from Subjective Choices


7. GHS-TDA: A Synergistic Reasoning Framework Integrating Global Hypothesis Space with Topological Data Analysis


8. ClinAlign: Scaling Healthcare Alignment from Clinician Preference


9. Autoregressive Direct Preference Optimization


10. SpotAgent: Grounding Visual Geo-localization in Large Vision-Language Models through Agentic Reasoning


11. P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads


12. Auditing Multi-Agent LLM Reasoning Trees Outperforms Majority Vote and LLM-as-Judge


13. CoMMa: Contribution-Aware Medical Multi-Agents From A Game-Theoretic Perspective


14. PABU: Progress-Aware Belief Update for Efficient LLM Agents


15. A Small-Scale System for Autoregressive Program Synthesis Enabling Controlled Experimentation


16. Biases in the Blind Spot: Detecting What LLMs Fail to Mention


17. Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization


18. Fake-HR1: Rethinking reasoning of vision language model for synthetic image detection


19. Decoupled Reasoning with Implicit Fact Tokens (DRIFT): A Dual-Model Framework for Efficient Long-Context Inference


20. Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design


21. A Unified Assessment of the Poverty of the Stimulus Argument for Neural Language Models


22. Routing, Cascades, and User Choice for LLMs


23. Text summarization via global structure awareness


24. Decomposing Reasoning Efficiency in Large Language Models


25. Flexible Entropy Control in RLVR with Gradient-Preserving Perspective


26. Maastricht University at AMIYA: Adapting LLMs for Dialectal Arabic using Fine-tuning and MBR Decoding


27. GenSeg-R1: RL-Driven Vision-Language Grounding for Fine-Grained Referring Segmentation


28. MATA: Multi-Agent Framework for Reliable and Flexible Table Question Answering


29. Stop Testing Attacks, Start Diagnosing Defenses: The Four-Checkpoint Framework Reveals Where LLM Safety Breaks


30. AGMark: Attention-Guided Dynamic Watermarking for Large Vision-Language Models


31. On the Optimal Reasoning Length for RL-Trained Language Models


32. Context-Aware Counterfactual Data Augmentation for Gender Bias Mitigation in Language Models


33. Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs


34. LEMUR: A Corpus for Robust Fine-Tuning of Multilingual Law Embedding Models for Retrieval


35. Comprehensive Comparison of RAG Methods Across Multi-Domain Conversational QA


36. EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies


37. Seeing the Goal, Missing the Truth: Human Accountability for AI Bias


38. Beware of the Batch Size: Hyperparameter Bias in Evaluating LoRA


39. Listen to the Layers: Mitigating Hallucinations with Inter-Layer Disagreement


40. NOWJ @BioCreative IX ToxHabits: An Ensemble Deep Learning Approach for Detecting Substance Use and Contextual Information in Clinical Texts


41. SWE-AGI: Benchmarking Specification-Driven Software Construction with MoonBit in the Era of Autonomous Agents


42. Conceptual Cultural Index: A Metric for Cultural Specificity via Relative Generality


43. Evaluating Social Bias in RAG Systems: When External Context Helps and Reasoning Hurts


44. A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors


45. Sci-VLA: Agentic VLA Inference Plugin for Long-Horizon Tasks in Scientific Experiments


46. Accelerating Post-Quantum Cryptography via LLM-Driven Hardware-Software Co-Design


47. LLMAC: A Global and Explainable Access Control Framework with Large Language Model


48. Contractual Deepfakes: Can Large Language Models Generate Contracts?


49. BiasScope: Towards Automated Detection of Bias in LLM-as-a-Judge Evaluation


50. Behavioral Economics of AI: LLM Biases and Corrections


51. AgentCgroup: Understanding and Controlling OS Resources of AI Agents


52. Beyond Uniform Credit: Causal Credit Assignment for Policy Optimization


53. Clarifying Shampoo: Adapting Spectral Descent to Stochasticity and the Parameter Trajectory


54. Don’t Shoot The Breeze: Topic Continuity Model Using Nonlinear Naive Bayes With Attention


55. Empowering Contrastive Federated Sequential Recommendation with LLMs


56. Effective Reasoning Chains Reduce Intrinsic Dimensionality


57. VLM-Guided Iterative Refinement for Surgical Image Segmentation with Foundation Models


58. MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks


59. $n$-Musketeers: Reinforcement Learning Shapes Collaboration Among Language Models


60. What do Geometric Hallucination Detection Metrics Actually Measure?


61. Benchmarking the Energy Savings with Speculative Decoding Strategies


62. Distributed Hybrid Parallelism for Large Language Models: Comparative Study and System Design Guide


63. NarraScore: Bridging Visual Narrative and Musical Dynamics via Hierarchical Affective Control


64. AntigenLM: Structure-Aware DNA Language Modeling for Influenza


65. Scaling GraphLLM with Bilevel-Optimized Sparse Querying