Key LLM Papers - 2026-03-26

1. AI-Supervisor: Autonomous AI Research Supervision via a Persistent Research World Model


2. Enhanced Mycelium of Thought (EMoT): A Bio-Inspired Hierarchical Reasoning Architecture with Strategic Dormancy and Mnemonic Encoding


3. ELITE: Experiential Learning and Intent-Aware Transfer for Self-improving Embodied Agents


4. Language-Grounded Multi-Agent Planning for Personalized and Fair Participatory Urban Sensing



6. AnalogAgent: Self-Improving Analog Circuit Design Automation with LLM Agents


7. DUPLEX: Agentic Dual-System Planning via LLM-Driven Information Extraction


8. SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems


9. VehicleMemBench: An Executable Benchmark for Multi-User Long-Term Memory in In-Vehicle Agents


10. Efficient Benchmarking of AI Agents


11. LLMs Do Not Grade Essays Like Humans


12. Grounding Vision and Language to 3D Masks for Long-Horizon Box Rearrangement


13. GTO Wizard Benchmark


14. Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments


15. Environment Maps: Structured Environmental Representations for Long-Horizon Agents


16. PLDR-LLMs Reason At Self-Organized Criticality


17. VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models


18. LensWalk: Agentic Video Understanding by Planning How You See in Videos


19. Evaluating Chunking Strategies For Retrieval-Augmented Generation in Oil and Gas Enterprise Documents


20. UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience


21. Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs


22. When AI Meets Early Childhood Education: Large Language Models as Assessment Teammates in Chinese Preschools


23. MolEvolve: LLM-Guided Evolutionary Search for Interpretable Molecular Optimization


24. Enhancing Efficiency and Performance in Deepfake Audio Detection through Neuron-level dropin & Neuroplasticity Mechanisms


25. Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing


26. Large Language Model Guided Incentive Aware Reward Design for Cooperative Multi-Agent Reinforcement Learning


27. The Specification Gap: Coordination Failure Under Partial Knowledge in Code Agents


28. Environment-Grounded Multi-Agent Workflow for Autonomous Penetration Testing


29. Who Benefits from RAG? The Role of Exposure, Utility and Attribution Bias


30. Where Do Your Citations Come From? Citation-Constellation: A Free, Open-Source, No-Code, and Auditable Tool for Citation Network Decomposition with Complementary BARON and HEROCON Scores



32. A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula


33. MedAidDialog: A Multilingual Multi-Turn Medical Dialogue Dataset for Accessible Healthcare


34. The Alignment Tax: Response Homogenization in Aligned LLMs and Its Implications for Uncertainty Estimation


35. Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization


36. When Understanding Becomes a Risk: Authenticity and Safety Risks in the Emerging Image Generation Paradigm


37. Mitigating Object Hallucinations in LVLMs via Attention Imbalance Rectification


38. Schema on the Inside: A Two-Phase Fine-Tuning Method for High-Efficiency Text-to-SQL at Scale


39. Understanding the Challenges in Iterative Generative Optimization with LLMs


40. From Untamed Black Box to Interpretable Pedagogical Orchestration: The Ensemble of Specialized LLMs Architecture for Adaptive Tutoring


41. The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More


42. Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage


43. Revealing Multi-View Hallucination in Large Vision-Language Models


44. Self-Distillation for Multi-Token Prediction


45. HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation


46. Can VLMs Reason Robustly? A Neuro-Symbolic Investigation


47. PoliticsBench: Benchmarking Political Values in Large Language Models with Multi-Turn Roleplay


48. Perturbation: A simple and efficient adversarial tracer for representation learning in language models


49. Object Search in Partially-Known Environments via LLM-informed Model-based Planning and Prompt Selection


50. The Cognitive Firewall: Securing Browser Based AI Agents Against Indirect Prompt Injection Via Hybrid Edge Cloud Defense


51. AI-driven Intent-Based Networking Approach for Self-configuration of Next Generation Networks


52. The Diminishing Returns of Early-Exit Decoding in Modern LLMs


53. Assessment Design in the AI Era: A Method for Identifying Items Functioning Differentially for Humans and Chatbots


54. PLACID: Privacy-preserving Large language models for Acronym Clinical Inference and Disambiguation


55. Probing Ethical Framework Representations in Large Language Models: Structure, Entanglement, and Methodological Challenges



57. Ukrainian Visual Word Sense Disambiguation Benchmark


58. A Theory of LLM Information Susceptibility


59. LLMLOOP: Improving LLM-Generated Code and Tests through Automated Iterative Feedback Loops


60. LLMORPH: Automated Metamorphic Testing of Large Language Models


61. Wafer-Level Etch Spatial Profiling for Process Monitoring from Time-Series with Time-LLM


62. APreQEL: Adaptive Mixed Precision Quantization For Edge LLMs


63. Synthetic Mixed Training: Scaling Parametric Knowledge Acquisition Beyond RAG


64. CAPTCHA Solving for Native GUI Agents: Automated Reasoning-Action Data Generation and Self-Corrective Training


65. Mixture of Demonstrations for Textual Graph Understanding and Question Answering


66. Large Language Models and Scientific Discourse: Where’s the Intelligence?


67. MDKeyChunker: Single-Call LLM Enrichment with Rolling Keys and Key-Based Restructuring for High-Accuracy RAG


68. Generating Hierarchical JSON Representations of Scientific Sentences Using LLMs


69. Did You Forget What I Asked? Prospective Memory Failures in Large Language Models


70. Konkani LLM: Multi-Script Instruction Tuning and Evaluation for a Low-Resource Indian Language


71. Navigating the Concept Space of Language Models


72. Qworld: Question-Specific Evaluation Criteria for LLMs


73. Chitrakshara: A Large Multilingual Multimodal Dataset for Indian languages


74. From Physician Expertise to Clinical Agents: Preserving, Standardizing, and Scaling Physicians’ Medical Expertise with Lightweight LLM


75. MedMT-Bench: Can LLMs Memorize and Understand Long Multi-Turn Conversations in Medical Scenarios?


76. MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens


77. Training a Large Language Model for Medical Coding Using Privacy-Preserving Synthetic Clinical Data


78. DepthCharge: A Domain-Agnostic Framework for Measuring Depth-Dependent Knowledge in Large Language Models


79. Berta: an open-source, modular tool for AI-enabled clinical documentation


80. S-Path-RAG: Semantic-Aware Shortest-Path Retrieval Augmented Generation for Multi-Hop Knowledge Graph Question Answering


81. DISCO: Document Intelligence Suite for COmparative Evaluation


82. Visuospatial Perspective Taking in Multimodal Language Models


83. Internal Safety Collapse in Frontier Large Language Models


84. Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes


85. Leveraging Computerized Adaptive Testing for Cost-effective Evaluation of Large Language Models in Medical Benchmarking


86. Evidence for Limited Metacognition in LLMs