Major LLM-Related Papers - 2026-03-31

1. Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning


2. The Ultimate Tutorial for AI-driven Scale Development in Generative Psychometrics: Releasing AIGENIE from its Bottle


3. Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning


4. MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models


5. Towards a Medical AI Scientist


6. The Scaffold Effect: How Prompt Framing Drives Apparent Multimodal Gains in Clinical VLM Evaluation


7. COvolve: Adversarial Co-Evolution of Large-Language-Model-Generated Policies and Environments via Two-Player Zero-Sum Game


8. Deep Research of Deep Research: From Transformer to Agent, From AI to AI for Science


9. CoE: Collaborative Entropy for Uncertainty Quantification in Agentic Multi-LLM Systems


10. A Multi-Agent Rhizomatic Pipeline for Non-Linear Literature Analysis


11. Evaluating LLMs for Answering Student Questions in Introductory Programming Courses


12. EpiPersona: Persona Projection and Episode Coupling for Pluralistic Preference Modeling


13. PReD: An LLM-based Foundation Multimodal Model for Electromagnetic Perception, Recognition, and Decision


14. SLOW: Strategic Logical-inference Open Workspace for Cognitive Adaptation in AI Tutoring


15. Meta-Harness: End-to-End Optimization of Model Harnesses


16. Beyond the Answer: Decoding the Behavior of LLMs as Scientific Reasoners


17. CARV: A Diagnostic Benchmark for Compositional Analogical Reasoning in Multimodal LLMs


18. GEAKG: Generative Executable Algorithm Knowledge Graphs


19. GAAMA: Graph Augmented Associative Memory for Agents


20. Let the Agent Steer: Closed-Loop Ranking Optimization via Influence Exchange


21. TianJi: An autonomous AI meteorologist for discovering physical mechanisms in atmospheric science


22. DSevolve: Enabling Real-Time Adaptive Scheduling on Dynamic Shop Floor with LLM-Evolved Heuristic Portfolios


23. Dual-Stage LLM Framework for Scenario-Centric Semantic Interpretation in Driving Assistance


24. PeopleSearchBench: A Multi-Dimensional Benchmark for Evaluating AI-Powered People Search Platforms


25. AstraAI: LLMs, Retrieval, and AST-Guided Assistance for HPC Codebases


26. Greedy Is a Strong Default: Agents as Iterative Optimizers


27. Heterogeneous Debate Engine: Identity-Grounded Cognitive Architecture for Resilient LLM-Based Ethical Tutoring


28. Defend: Automated Rebuttals for Peer Review with Minimal Author Guidance


29. LLM Readiness Harness: Evaluation, Observability, and CI Gates for LLM/RAG Applications


30. Beyond Completion: Probing Cumulative State Tracking to Predict LLM Agent Performance


31. A Comparative Study in Surgical AI: Datasets, Foundation Models, and Barriers to Med-AGI


32. CounterMoral: Editing Morals in Language Models


33. AutoMS: Multi-Agent Evolutionary Search for Cross-Physics Inverse Microstructure Design


34. Aligning LLMs with Graph Neural Solvers for Combinatorial Optimization


35. daVinci-LLM: Towards the Science of Pretraining


36. MediHive: A Decentralized Agent Collective for Medical Reasoning


37. When Verification Hurts: Asymmetric Effects of Multi-Agent Feedback in Logic Proof Tutoring


38. Transparency as Architecture: Structural Compliance Gaps in EU AI Act Article 50 II


39. SAGAI-MID: A Generative AI-Driven Middleware for Dynamic Runtime Interoperability


40. AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding


41. AMIGO: Agentic Multi-Image Grounding Oracle Benchmark


42. Information-Theoretic Limits of Safety Verification for Self-Improving Systems


43. ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning


44. Moving Beyond Review: Applying Language Models to Planning and Translation in Reflection


45. Navigating the Mirage: A Dual-Path Agentic Framework for Robust Misleading Chart Question Answering


46. CirrusBench: Evaluating LLM-based Agents Beyond Correctness in Real-World Cloud Service Environments


47. Fine-Tuning Large Language Models for Cooperative Tactical Deconfliction of Small Unmanned Aerial Systems


48. Domain-Invariant Prompt Learning for Vision-Language Models


49. Hydra: Unifying Document Retrieval and Generation in a Single Vision-Language Model


50. RAD-LAD: Rule and Language Grounded Autonomous Driving in Real-Time


51. Courtroom-Style Multi-Agent Debate with Progressive RAG and Role-Switching for Controversial Claim Verification


52. Evolutionary Discovery of Reinforcement Learning Algorithms via Large Language Models


53. Membership Inference Attacks against Large Audio Language Models


54. Coherent Without Grounding, Grounded Without Success: Observability and Epistemic Failure


55. Crossing the NL/PL Divide: Information Flow Analysis Across the NL/PL Boundary in LLM-Integrated Code


56. Integrating Multimodal Large Language Model Knowledge into Amodal Completion


57. Building evidence-based knowledge graphs from full-text literature for disease-specific biomedical reasoning


58. Merge and Conquer: Instructing Multilingual Models by Adding Target Language Weights


59. Categorical Perception in Large Language Model Hidden States: Structural Warping at Digit-Count Boundaries


60. DiffAttn: Diffusion-Based Drivers’ Visual Attention Prediction with LLM-Enhanced Semantic Reasoning


61. ERPO: Token-Level Entropy-Regulated Policy Optimization for Large Reasoning Models


62. Evaluating Privilege Usage of Agents on Real-World Tools


63. Does Claude’s Constitution Have a Culture?


64. Transcription and Recognition of Italian Parliamentary Speeches Using Vision-Language Models


65. MolmoPoint: Better Pointing for VLMs with Grounding Tokens


66. Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers


67. ViviDoc: Generating Interactive Documents through Human-Agent Collaboration


68. CDH-Bench: A Commonsense-Driven Hallucination Benchmark for Evaluating Visual Fidelity in Vision-Language Models


69. JaWildText: A Benchmark for Vision-Language Models on Japanese Scene Text Understanding


70. Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey


71. ITQ3_S: High-Fidelity 3-bit LLM Inference via Interleaved Ternary Quantization with Rotation-Domain Smoothing


72. KVSculpt: KV Cache Compression as Distillation


73. Towards Context-Aware Image Anonymization with Multi-Agent Reasoning


74. EvA: An Evidence-First Audio Understanding Paradigm for LALMs


75. Umwelt Engineering: Designing the Cognitive Worlds of Linguistic Agents


76. STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding


77. InnerPond: Fostering Inter-Self Dialogue with a Multi-Agent Approach for Introspection


78. Toward Reliable Evaluation of LLM-Based Financial Multi-Agent Systems: Taxonomy, Coordination Primacy, and Cost Awareness


79. A Systematic Taxonomy of Security Vulnerabilities in the OpenClaw AI Agent Framework


80. Learning to Focus and Precise Cropping: A Reinforcement Learning Framework with Information Gaps and Grounding Loss for MLLMs


81. AgentSwing: Adaptive Parallel Context Management Routing for Long-Horizon Web Agents


82. Difference Feedback: Generating Multimodal Process-Level Supervision for VLM Reinforcement Learning


83. On Token’s Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models


84. Multi-Agent Dialectical Refinement for Enhanced Argument Classification


85. Improving Attributed Long-form Question Answering with Intent Awareness


86. Multiple-Prediction-Powered Inference


87. The Geometry of Harmful Intent: Training-Free Anomaly Detection via Angular Deviation in LLM Residual Streams


88. Grounding Social Perception in Intuitive Physics


89. Conditional Factuality Controlled LLMs with Generalization Certificates via Conformal Sampling


90. Culturally Adaptive Explainable LLM Assessment for Multilingual Information Disorder: A Human-in-the-Loop Approach


91. GUIDE: Guided Updates for In-context Decision Evolution in LLM-Driven Spacecraft Operations


92. Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP


93. Amalgam: Hybrid LLM-PGM Synthesis Algorithm for Accuracy and Realism


94. Zero-shot Vision-Language Reranking for Cross-View Geolocalization


95. Diagnosing and Repairing Unsafe Channels in Vision-Language Models via Causal Discovery and Dual-Modal Safety Subspace Projection


96. EuraGovExam: A Multilingual Multimodal Benchmark from Real-World Civil Service Exams


97. SafetyDrift: Predicting When AI Agents Cross the Line Before They Actually Do


98. Sovereign Context Protocol: An Open Attribution Layer for Human-Generated Content in the Age of Large Language Models


99. ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding


100. Debiasing Large Language Models toward Social Factors in Online Behavior Analytics through Prompt Knowledge Tuning


101. Persona-Based Simulation of Human Opinion at Population Scale


102. AutoSiMP: Autonomous Topology Optimization from Natural Language via LLM-Driven Problem Configuration and Adaptive Solver Control


103. ASTER – Agentic Science Toolkit for Exoplanet Research


104. Are LLMs Good For Quantum Software, Architecture, and System Design?


105. Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation


106. Stable Reasoning, Unstable Responses: Mitigating LLM Deception via Stability Asymmetry


107. GISclaw: An Open-Source LLM-Powered Agent System for Full-Stack Geospatial Analysis


108. VAN-AD: Visual Masked Autoencoder with Normalizing Flow For Time Series Anomaly Detection


109. SpatialAnt: Autonomous Zero-Shot Robot Navigation via Active Scene Reconstruction and Visual Anticipation


110. A Regression Framework for Understanding Prompt Component Impact on LLM Performance


111. Squish and Release: Exposing Hidden Hallucinations by Making Them Surface as Safety Signals


112. Throughput Optimization as a Strategic Lever in Large-Scale AI Systems: Evidence from Dataloader and Memory Profiling Innovations


113. Resolving the Robustness-Precision Trade-off in Financial RAG through Hybrid Document-Routed Retrieval


114. GroupRAG: Cognitively Inspired Group-Aware Retrieval and Reasoning via Knowledge-Driven Problem Structuring


115. Explaining, Verifying, and Aligning Semantic Hierarchies in Vision-Language Model Embeddings


116. Robust Batch-Level Query Routing for Large Language Models under Cost and Capacity Constraints


117. CRISP: Characterizing Relative Impact of Scholarly Publications


118. A Step Toward Federated Pretraining of Multimodal Large Language Models


119. Limits of Imagery Reasoning in Frontier LLM Models


120. Learning to Select Visual In-Context Demonstrations


121. From Content to Audience: A Multimodal Annotation Framework for Broadcast Television Analytics


122. Edge Reliability Gap in Vision-Language Models: Quantifying Failure Modes of Compressed VLMs Under Visual Corruption


123. Aesthetic Assessment of Chinese Handwritings Based on Vision Language Models


124. SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model


125. Distilled Large Language Model-Driven Dynamic Sparse Expert Activation Mechanism


126. Contextual inference from single objects in Vision-Language models


127. SEAR: Schema-Based Evaluation and Routing for LLM Gateways


128. Agentic AI for Human Resources: LLM-Driven Candidate Assessment


129. The Cognitive Divergence: AI Context Windows, Human Attention Decline, and the Delegation Feedback Loop


130. SpatialPoint: Spatial-aware Point Prediction for Embodied Localization


131. LITTA: Late-Interaction and Test-Time Alignment for Visually-Grounded Multimodal Retrieval


132. AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment


133. AI Meets Mathematics Education: A Case Study on Supporting an Instructor in a Large Mathematics Class with Context-Aware AI


134. Can AI be a Teaching Partner? Evaluating ChatGPT, Gemini, and DeepSeek across Three Teaching Strategies


135. ReCQR: Incorporating conversational query rewriting to improve Multimodal Image Retrieval


136. Bridge-RAG: An Abstract Bridge Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter


137. M-RAG: Making RAG Faster, Stronger, and More Efficient


138. SimulCost: A Cost-Aware Benchmark and Toolkit for Automating Physics Simulations with LLMs


139. Exploring Cultural Variations in Moral Judgments with Large Language Models