Notable LLM-related papers - 2026-03-24

1. MARCUS: An agentic, multimodal vision-language model for cardiac diagnosis and management


2. GSEM: Graph-based Self-Evolving Memory for Experience Augmented Clinical Reasoning


3. A Context Engineering Framework for Improving Enterprise AI Agents based on Digital-Twin MDP


4. Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models


5. The Presupposition Problem in Representation Genesis


6. EvoIdeator: Evolving Scientific Ideas through Checklist-Grounded Reinforcement Learning


7. CurvZO: Adaptive Curvature-Guided Sparse Zeroth-Order Optimization for Efficient LLM Fine-Tuning


8. Compensating Visual Insufficiency with Stratified Language Guidance for Long-Tail Class Incremental Learning


9. MIND: Multi-agent inference for negotiation dialogue in travel planning


10. Deterministic Hallucination Detection in Medical VQA via Confidence-Evidence Bayesian Gain


11. AI Token Futures Market: Commoditization of Compute and Derivatives Contract Design


12. Mirage: The Illusion of Visual Understanding


13. Silicon Bureaucracy and AI Test-Oriented Education: Contamination Sensitivity and Score Confidence in LLM Benchmarks


14. EnterpriseLab: A Full-Stack Platform for developing and deploying agents in Enterprises


15. A Multidisciplinary AI Board for Multimodal Dementia Characterization and Risk Assessment


16. Mind over Space: Can Multimodal Large Language Models Mentally Navigate?


17. Adaptive Robust Estimator for Multi-Agent Reinforcement Learning


18. Counterfactual Credit Policy Optimization for Multi-Agent Collaboration


19. DomAgent: Leveraging Knowledge Graphs and Case-Based Reasoning for Domain-Specific Code Generation


20. Silent Commitment Failure in Instruction-Tuned Language Models: Evidence of Governability Divergence Across Architectures


21. Persona Vectors in Games: Measuring and Steering Strategies via Activation Vectors


22. AdaRubric: Task-Adaptive Rubrics for LLM Agent Evaluation


23. AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling


24. RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models


25. Improving Coherence and Persistence in Agentic AI for System Optimization


26. The Library Theorem: How External Organization Governs Agentic Reasoning Capacity


27. Graph of States: Solving Abductive Tasks with Large Language Models


28. ConsRoute: Consistency-Aware Adaptive Query Routing for Cloud-Edge-Device Large Language Models


29. Revisiting Tree Search for LLMs: Gumbel and Sequential Halving for Budget-Scalable Reasoning


30. Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs


31. ORACLE: Optimizing Reasoning Abilities of Large Language Models via Constraint-Led Synthetic Data Elicitation


32. KLDrive: Fine-Grained 3D Scene Reasoning for Autonomous Driving based on Knowledge Graph


33. Knowledge Boundary Discovery for Large Language Models


34. A Framework for Low-Latency, LLM-driven Multimodal Interaction on the Pepper Robot


35. Can we automatize scientific discovery in the cognitive sciences?


36. Profit is the Red Team: Stress-Testing Agents in Strategic Economic Interactions


37. Do LLM-Driven Agents Exhibit Engagement Mechanisms? Controlled Tests of Information Load, Descriptive Norms, and Popularity Cues


38. Modeling Epistemic Uncertainty in Social Perception via Rashomon Set Agents


39. AI-Driven Multi-Agent Simulation of Stratified Polyamory Systems: A Computational Framework for Optimizing Social Reproductive Efficiency


40. Towards Intelligent Geospatial Data Discovery: a knowledge graph-driven multi-agent framework powered by large language models


41. Attention in Space: Functional Roles of VLM Heads for Spatial Reasoning


42. From 50% to Mastery in 3 Days: A Low-Resource SOP for Localizing Graduate-Level AI Tutors via Shadow-RAG


43. Seed1.8 Model Card: Towards Generalized Real-World Agency


44. Context Cartography: Toward Structured Governance of Contextual Space in Large Language Model Systems


45. LLM-Driven Heuristic Synthesis for Industrial Process Control: Lessons from Hot Steel Rolling


46. Grounded Chess Reasoning in Language Models via Master Distillation


47. Deep reflective reasoning in interdependence constrained structured data extraction from clinical notes for digital health


48. LLM-Enhanced Energy Contrastive Learning for Out-of-Distribution Detection in Text-Attributed Graphs


49. Me, Myself, and π: Evaluating and Explaining LLM Introspection


50. FactorSmith: Agentic Simulation Generation via Markov Decision Process Decomposition with Planner-Designer-Critic Refinement


51. Domain-Specialized Tree of Thought through Plug-and-Play Predictors


52. ProMAS: Proactive Error Forecasting for Multi-Agent Systems Using Markov Transition Dynamics


53. AgenticGEO: A Self-Evolving Agentic System for Generative Engine Optimization


54. UniMotion: A Unified Framework for Motion-Text-Vision Understanding and Generation


55. ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model


56. 3D-Layout-R1: Structured Reasoning for Language-Instructed Spatial Editing


57. Confidence-Based Decoding is Provably Efficient for Diffusion Language Models


58. SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation


59. Evaluating the Reliability and Fidelity of Automated Judgment Systems of Large Language Models


60. SPA: A Simple but Tough-to-Beat Baseline for Knowledge Injection


61. CayleyPy-4: AI-Holography. Towards analogs of holographic string dualities for AI tasks


62. Seeing is Improving: Visual Feedback for Iterative Text Layout Refinement


63. Enhancing Document-Level Machine Translation via Filtered Synthetic Corpora and Two-Stage LLM Adaptation


64. Multimodal Survival Analysis with Locally Deployable Large Language Models


65. Mamba-VMR: Multimodal Query Augmentation via Generated Videos for Precise Temporal Grounding


66. On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation


67. On the Failure of Topic-Matched Contrast Baselines in Multi-Directional Refusal Abliteration


68. Uncertainty-guided Compositional Alignment with Part-to-Whole Semantic Representativeness in Hyperbolic Vision-Language Models


69. ROM: Real-time Overthinking Mitigation via Streaming Detection and Intervention


70. SecureBreak – A dataset towards safe and secure models


71. Parameter-Efficient Fine-Tuning for Medical Text Summarization: A Comparative Study of LoRA, Prompt Tuning, and Full Fine-Tuning


72. P^2O: Joint Policy and Prompt Optimization


73. Manifold-Aware Exploration for Reinforcement Learning in Video Generation


74. SemEval-2026 Task 12: Abductive Event Reasoning: Towards Real-World Event Causal Inference for Large Language Models


75. Rethinking Token Reduction for Large Vision-Language Models


76. Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models


77. Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks


78. AgenticRec: End-to-End Tool-Integrated Policy Optimization for Ranking-Oriented Recommender Agents


79. mSFT: Addressing Dataset Mixtures Overfitting Heterogeneously in Multi-task SFT


80. Riemannian Geometry Speaks Louder Than Words: From Graph Foundation Model to Next-Generation Graph Intelligence


81. PRISM: Breaking the O(n) Memory Wall in Long-Context LLM Inference via O(1) Photonic Block Selection



83. CatRAG: Functor-Guided Structural Debiasing with Retrieval Augmentation for Fair LLMs


84. SafePilot: A Framework for Assuring LLM-enabled Cyber-Physical Systems


85. Efficient Failure Management for Multi-Agent Systems with Reasoning Trace Representation


86. When Documents Disagree: Measuring Institutional Variation in Transplant Guidance with Retrieval-Augmented Language Models


87. KG-Hopper: Empowering Compact Open LLMs with Knowledge Graph Reasoning via Reinforcement Learning


88. LLM-Powered Workflow Optimization for Multidisciplinary Software Development: An Automotive Industry Case Study


89. Efficient Fine-Tuning Methods for Portuguese Question Answering: A Comparative Study of PEFT on BERTimbau and Exploratory Evaluation of Generative LLMs


90. Benchmarking Bengali Dialectal Bias: A Multi-Stage Framework Integrating RAG-Based Translation and Human-Augmented RLAIF


91. COINBench: Moving Beyond Individual Perspectives to Collective Intent Understanding


92. Enhancing reasoning accuracy in large language models during inference time


93. When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning


94. WARBENCH: A Comprehensive Benchmark for Evaluating LLMs in Military Decision-Making


95. Conversation Tree Architecture: A Structured Framework for Context-Aware Multi-Branch LLM Conversations


96. Aggregation Alignment for Federated Learning with Mixture-of-Experts under Data Heterogeneity


97. QMoP: Query Guided Mixture-of-Projector for Efficient Visual Token Compression


98. Context Selection for Hypothesis and Statistical Evidence Extraction from Full-Text Scientific Articles


99. LLM-based Automated Architecture View Generation: Where Are We Now?


100. Prompt replay: speeding up GRPO with on-policy reuse of high-signal prompts


101. Reward Sharpness-Aware Fine-Tuning for Diffusion Models


102. TRACE: A Multi-Agent System for Autonomous Physical Reasoning in Seismological Science


103. Emergent Formal Verification: How an Autonomous AI Ecosystem Independently Discovered SMT-Based Safety Across Six Domains


104. Mitigating Selection Bias in Large Language Models via Permutation-Aware GRPO


105. ALL-FEM: Agentic Large Language models Fine-tuned for Finite Element Methods


106. How AI Systems Think About Education: Analyzing Latent Preference Patterns in Large Language Models


107. ECI: Effective Contrastive Information to Evaluate Hard-Negatives


108. Detection of adversarial intent in Human-AI teams using LLMs


109. Learning to Aggregate Zero-Shot LLM Agents for Corporate Disclosure Classification


110. Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models


111. User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction


112. AC4A: Access Control for Agents


113. Mitigating Shortcut Reasoning in Language Models: A Gradient-Aware Training Approach


114. RubricRAG: Towards Interpretable and Reliable LLM Evaluation via Domain Knowledge Retrieval for Rubric Generation


115. SozKZ: Training Efficient Small Language Models for Kazakh from Scratch


116. Reasoning Topology Matters: Network-of-Thought for Complex Reasoning Tasks


117. Satellite-to-Street: Synthesizing Post-Disaster Views from Satellite Imagery via Generative Vision Models


118. PAVE: Premise-Aware Validation and Editing for Retrieval-Augmented LLMs


119. Weber’s Law in Transformer Magnitude Representations: Efficient Coding, Representational Geometry, and Psychophysical Laws in Language Models


120. AEGIS: From Clues to Verdicts – Graph-Guided Deep Vulnerability Reasoning via Dialectics and Meta-Auditing


121. Permutation-Consensus Listwise Judging for Robust Factuality Evaluation


122. An Industrial-Scale Retrieval-Augmented Generation Framework for Requirements Engineering: Empirical Evaluation with Automotive Manufacturing Data


123. Epistemic Observability in Language Models


124. Evaluating Large Language Models on Historical Health Crisis Knowledge in Resource-Limited Settings: A Hybrid Multi-Metric Study


125. ReBOL: Retrieval via Bayesian Optimization with Batched LLM Relevance Observations and Query Reformulation


126. Measuring Reasoning Trace Legibility: Can Those Who Understand Teach?


127. Diffutron: A Masked Diffusion Language Model for Turkish Language


128. Policies Permitting LLM Use for Polishing Peer Reviews Are Currently Not Enforceable


129. Solver-Aided Verification of Policy Compliance in Tool-Augmented LLM Agents


130. ALICE: A Multifaceted Evaluation Framework of Large Audio-Language Models’ In-Context Learning Ability


131. Coding Agents are Effective Long-Context Processors


132. PEARL: Personalized Streaming Video Understanding Model


133. Thinking in Different Spaces: Domain-Specific Latent Geometry Survives Cross-Architecture Translation


134. KV Cache Optimization Strategies for Scalable and Efficient LLM Inference


135. The production of meaning in the processing of natural language


136. Memory poisoning and secure multi-agent systems


137. Leum-VL Technical Report


138. Procedural Refinement by LLM-driven Algorithmic Debugging for ARC-AGI-2


139. When Agents Disagree: The Selection Bottleneck in Multi-Agent LLM Pipelines


140. GIP-RAG: An Evidence-Grounded Retrieval-Augmented Framework for Interpretable Gene Interaction and Pathway Impact Analysis


141. The Causal Impact of Tool Affordance on Safety Alignment in LLM Agents


142. Bypassing Document Ingestion: An MCP Approach to Financial Q&A


143. Semantic Tool Discovery for Large Language Models: A Vector-Based Approach to MCP Tool Selection


144. kRAIG: A Natural Language-Driven Agent for Automated DataOps Pipeline Generation


145. From Human Interfaces to Agent Interfaces: Rethinking Software Design in the Age of AI-Native Systems


146. On the Fragility of AI Agent Collusion


147. Understanding Pruning Regimes in Vision-Language Models Through Domain-Aware Layer Selection


148. Deciphering Scientific Reasoning Steps from Outcome Data for Molecule Optimization


149. SciNav: A General Agent Framework for Scientific Coding Tasks


150. Decoding the decoder: Contextual sequence-to-sequence modeling for intracortical speech decoding


151. Writing literature reviews with AI: principles, hurdles and some lessons learned


152. Email in the Era of LLMs


153. Characterizing the ability of LLMs to recapitulate Americans’ distributional responses to public opinion polling questions across political issues


154. The Arrival of AGI? When Expert Personas Exceed Expert Benchmarks


155. Locally Coherent Parallel Decoding in Diffusion Language Models


156. Exploring Teacher-Chatbot Interaction and Affect in Block-Based Programming


157. Children’s Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs


158. RedacBench: Can AI Erase Your Secrets?


159. Enhancing Safety of Large Language Models via Embedding Space Separation


160. Measuring Research Convergence in Interdisciplinary Teams Using Large Language Models and Graph Analytics