Notable LLM Papers - 2026-05-15

1. OpenDeepThink: Parallel Reasoning via Bradley–Terry Aggregation


2. APWA: A Distributed Architecture for Parallelizable Agentic Workflows


3. Dual-Dimensional Consistency: Balancing Budget and Quality in Adaptive Inference-Time Scaling


4. Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use


5. Small, Private Language Models as Teammates for Educational Assessment Design


6. Explainable Detection of Depression Status Shifts from User Digital Traces


7. Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems


8. A Deterministic Agentic Workflow for HS Tariff Classification: Multi-Dimensional Rule Reasoning with Interpretable Decisions


9. Emotion-Attended Stateful Memory (EASM): The Architecture for Hyper-Personalization at Scale


10. A Heterogeneous Temporal Memory Governance Framework for Long-Term LLM Persona Consistency


11. AI Outperforms Humans in Personalized Image Aesthetics Assessment via LLM-Based Interviews and Semantic Feature Extraction


12. XDomainBench: Diagnosing Reasoning Collapse in High-Dimensional Scientific Knowledge Composition


13. Agentifying Patient Dynamics within LLMs through Interacting with Clinical World Model


14. π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows



16. MindGap: A Conversational AI Framework for Upstream Neuroplastic Intervention in Post-Traumatic Stress Disorder


17. Teaching Large Language Models When Not to Know: Learning Temporal Critique for Ex-Ante Reasoning


18. Sycophancy is an Educational Safety Risk: Why LLM Tutors Need Sycophancy Benchmarks


19. Prompt Segmentation and Annotation Optimisation: Controlling LLM Behaviour via Optimised Segment-Level Annotations


20. Complacent, Not Sycophantic: Reframing Large Language Models and Designing AI Literacy for Complacent Machines


21. VerbalValue: A Socially Intelligent Virtual Host for Sales-Driven Live Commerce


22. Cattle Trade: A Multi-Agent Benchmark for LLM Bluffing, Bidding, and Bargaining


23. Deepchecks: Evaluating Retrieval-Augmented Generation (RAG)


24. LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning


25. From Table to Cell: Attention for Better Reasoning with TABALIGN


26. OmniDrop: Layer-wise Token Pruning for Omni-modal LLMs via Query-Guidance


27. Stateful Reasoning via Insight Replay


28. Prompting Policies for Multi-step Reasoning and Tool-Use in Black-box LLMs with Iterative Distillation of Experience


29. BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE


30. DVMap: Fine-Grained Pluralistic Value Alignment via High-Consensus Demographic-Value Mapping


31. Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis


32. Nexus: An Agentic Framework for Time Series Forecasting


33. Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces


34. CrystalReasoner: Reasoning and RL for Property-Conditioned Crystal Structure Generation


35. Hypergraph Enterprise Agentic Reasoner over Heterogeneous Business Systems


36. Good to Go: The LOOP Skill Engine That Hits 99% Success and Slashes Token Usage by 99% via One-Shot Recording and Deterministic Replay


37. SimPersona: Learning Discrete Buyer Personas from Raw Clickstreams for Grounded E-Commerce Agents


38. Grounded Continuation: A Linear-Time Runtime Verifier for LLM Conversations


39. Agentic Systems as Boosting Weak Reasoning Models


40. Distribution-Aware Algorithm Design with LLM Agents


41. SkillFlow: Flow-Driven Recursive Skill Evolution for Agentic Orchestration


42. Know When To Fold ‘Em: Token-Efficient LLM Synthetic Data Generation via Multi-Stage In-Flight Rejection


43. Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning


44. SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks



46. Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use


47. Enhanced and Efficient Reasoning in Large Learning Models


48. From Descriptive to Prescriptive: Uncover the Social Value Alignment of LLM-based Agents


49. Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems


50. A Two-Dimensional Framework for AI Agent Design Patterns: Cognitive Function and Execution Topology


51. GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration


52. Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment


53. MeMo: Memory as a Model


54. Self-Distilled Agentic Reinforcement Learning


55. Widening the Gap: Exploiting LLM Quantization via Outlier Injection


56. Improving Multi-turn Dialogue Consistency with Self-Recall Thinking


57. Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs


58. On the Cultural Anachronism and Temporal Reasoning in Vision Language Models


59. TFGN: Task-Free, Replay-Free Continual Pre-Training Without Catastrophic Forgetting at LLM Scale


60. SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning


61. AI Knows When It’s Being Watched: Functional Strategic Action and Contextual Register Modulation in Large Language Models


62. SemaTune: Semantic-Aware Online OS Tuning with Large Language Models


63. Generalized Priority-Aware Shapley Value


64. COTCAgent: Preventive Consultation via Probabilistic Chain-of-Thought Completion


65. Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance


66. Quantifying and Mitigating Premature Closure in Frontier LLMs


67. Viverra: Text-to-Code with Guarantees


68. MHSA: A Lightweight Framework for Mitigating Hallucinations via Steered Attention in LVLMs


69. Your CLIP has 164 dimensions of noise: Exploring the embeddings covariance eigenspectrum of contrastively pretrained vision-language transformers


70. Towards In-Depth Root Cause Localization for Microservices with Multi-Agent Recursion-of-Thought


71. IFPV: An Integrated Multi-Agent Framework for Generative Operational Planning and High-Fidelity Plan Verification


72. XFP: Quality-Targeted Adaptive Codebook Quantization with Sparse Outlier Separation for LLM Inference


73. GPart: End-to-End Isometric Fine-Tuning via Global Parameter Partitioning


74. Beyond AI as Assistants: Toward Autonomous Discovery in Cosmology


75. Graphs of Research: Citation Evolution Graphs as Supervision for Research Idea Generation


76. Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces


77. Beyond What to Select: A Plug-and-play Oscillatory Data-Volume Scheduling for Efficient Model Training


78. Streaming Speech-to-Text Translation with a SpeechLLM


79. Cognitive-Uncertainty Guided Knowledge Distillation for Accurate Classification of Student Misconceptions


80. EVA: Editing for Versatile Alignment against Jailbreaks


81. Non-linear Interventions on Large Language Models


82. Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining


83. Mechanical Enforcement for LLM Governance:Evidence of Governance-Task Decoupling in Financial Decision Systems


84. TAPIOCA: Why Task-Aware Pruning Improves OOD Model Capability


85. Towards Label-Free Single-Cell Phenotyping Using Multi-Task Learning


86. Vision-Core Guided Contrastive Learning for Balanced Multi-modal Prognosis Prediction of Stroke


87. SceneFunRI: Reasoning the Invisible for Task-Driven Functional Object Localization


88. AI-assisted cultural heritage dissemination: Comparing NMT and glossary-augmented LLM translation in rock art documents


89. Agentic Design of Compositional Descriptors via Autoresearch for Materials Science Applications


90. MultiEmo-Bench: Multi-label Visual Emotion Analysis for Multi-modal Large Language Models


91. Do We Really Need External Tools to Mitigate Hallucinations? SIRA: Shared-Prefix Internal Reconstruction of Attribution


92. Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy


93. Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits


94. RxEval: A Prescription-Level Benchmark for Evaluating LLM Medication Recommendation


95. Dimension-Level Intent Fidelity Evaluation for Large Language Models: Evidence from Structured Prompt Ablation


96. Contestable Multi-Agent Debate with Arena-based Argumentative Computation for Multimedia Verification


97. When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition


98. MemLineage: Lineage-Guided Enforcement for LLM Agent Memory


99. The Great Pretender: A Stochasticity Problem in LLM Jailbreak


100. SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades


101. Agentic Recommender System with Hierarchical Belief-State Memory


102. Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning


103. Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement


104. Correctness-Aware Repository Filtering Under Maximum Effective Context Window Constraints


105. ICED: Concept-level Machine Unlearning via Interpretable Concept Decomposition


106. To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model


107. Web Agents Should Adopt the Plan-Then-Execute Paradigm


108. Watermarking Game-Playing Agents in Perfect-Information Extensive-Form Games


109. Dynamics of the Transformer Residual Stream: Coupling Spectral Geometry to Network Topology


110. Active Learners as Efficient PRP Rerankers


111. AudioMosaic: Contrastive Masked Audio Representation Learning


112. Diagnosing Training Inference Mismatch in LLM Reinforcement Learning


113. PreFT: Prefill-only finetuning for efficient inference


114. LLM-Based Robustness Testing of Microservice Applications: An Empirical Study


115. Why Retrieval-Augmented Generation Fails: A Graph Perspective


116. Thinking Ahead: Prospection-Guided Retrieval of Memory with Language Models


117. ExploitBench: A Capability Ladder Benchmark for LLM Cybersecurity Agents


118. ROK-FORTRESS: Measuring the Effect of Geopolitical Transcreation for National Security and Public Safety


119. Reinforcement Learning for Tool-Calling Agents in Fast Healthcare Interoperability Resources (FHIR)


120. Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards


121. ProtoMedAgent: Multimodal Clinical Interpretability via Privacy-Aware Agentic Workflows


122. PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts


123. Derivation Prompting: A Logic-Based Method for Improving Retrieval-Augmented Generation


124. Do Language Models Align with Brains? Prediction Scores Are Not Enough


125. Towards Resource-Efficient LLMs: End-to-End Energy Accounting of Distillation Pipelines


126. Collider-Bench: Benchmarking AI Agents with Particle Physics Analysis Reproduction


127. EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents


128. AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills


129. Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning


130. Multi-Scale Dequant: Eliminating Dequantization Bottleneck via Activation Decomposition for Efficient LLM Inference


131. TERMS-Bench: Diagnosing LLM Negotiation Agents Beyond Deal Rate


132. AIS: Adaptive Importance Sampling for Quantized RL


133. A Non-Destructive Methodological Framework for Modernizing Legacy Clinical Reporting Systems for AI-Driven Pharmacoinformatics: A SAS Case Study


134. ARES-LSHADE: Autoresearch-Enhanced LSHADE with Memetic Polish for the GNBG Benchmark


135. Large Language Models for Web Accessibility: A Systematic Literature Review


136. BiSpikCLM: A Spiking Language Model integrating Softmax-Free Spiking Attention and Spike-Aware Alignment Distillation


137. GAMBIT: A Three-Mode Benchmark for Adversarial Robustness in Multi-Agent LLM Collectives


138. Hidden State Poisoning Attacks against Mamba-based Language Models