Major LLM-Related Papers - 2026-05-13

1. Formalize, Don’t Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers


2. Semantic Reward Collapse and the Preservation of Epistemic Integrity in Adaptive AI Systems


3. ProfiliTable: Profiling-Driven Tabular Data Processing via Agentic Workflows


4. Classifier Context Rot: Monitor Performance Degrades with Context Length


5. $\delta$-mem: Efficient Online Memory for Large Language Models


6. Reinforcing VLAs in Task-Agnostic World Models


7. Towards Automated Air Traffic Safety Assessment Around Non-Towered Airports Using Large Language Models


8. LISA: Cognitive Arbitration for Signal-Free Autonomous Intersection Management


9. How Useful Is Cross-Domain Generalization for Training LLM Monitors?


10. No Action Without a NOD: A Heterogeneous Multi-Agent Architecture for Reliable Service Agents


11. Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems


12. MolDeTox: Evaluating Language Model’s Stepwise Fragment Editing for Molecular Detoxification


13. ALGOGEN: Tool-Generated Verifiable Traces for Reliable Algorithm Visualization


14. MM-OptBench: A Solver-Grounded Benchmark for Multimodal Optimization Modeling


15. BoolXLLM: LLM-Assisted Explainability for Boolean Models


16. To Whom Do Language Models Align? Measuring Principal Hierarchies Under High-Stakes Competing Demands


17. Large Language Models as Amortized Pareto-Front Generators for Constrained Bi-Objective Convex Optimization


18. OmniRefine: Alignment-Aware Cooperative Compression for Efficient Omnimodal Large Language Models


19. LLMs and the ZPD



21. BadSKP: Backdoor Attacks on Knowledge Graph-Enhanced LLMs with Soft Prompts


22. On the Limitations of Large Language Models for Conceptual Database Modeling


23. Assessing and Mitigating Miscalibration in LLM-Based Social Science Measurement


24. Counterfactual Trace Auditing of LLM Agent Skills


25. From Noise to Diversity: Random Embedding Injection in LLM Reasoning


26. Domain Restriction via Multi SAE Layer Transitions


27. Rethinking Supervision Granularity: Segment-Level Learning for LLM-Based Theorem Proving


28. On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment


29. Automated Reformulation of Robust Optimization via Memory-Augmented Large Language Models


30. Why Users Go There: World Knowledge-Augmented Generative Next POI Recommendation


31. Beyond Inefficiency: Systemic Costs of Incivility in Multi-Agent Monte Carlo Simulations


32. Towards Visually Grounded Multimodal Summarization via Cross-Modal Transformer and Gated Attention


33. When Reasoning Traces Become Performative: Step-Level Evidence that Chain-of-Thought Is an Imperfect Oversight Channel


34. OptArgus: A Multi-Agent System to Detect Hallucinations in LLM-based Optimization Modeling


35. Allegory of the Cave: Measurement-Grounded Vision-Language Learning


36. SafeSteer: A Decoding-level Defense Mechanism for Multimodal Large Language Models


37. Toward Stable Value Alignment: Introducing Independent Modules for Consistent Value Guidance


38. Measuring What Matters Beyond Text: Evaluating Multimodal Summaries by Quality, Alignment, and Diversity


39. Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion


40. A CAP-like Trilemma for Large Language Models: Correctness, Non-bias, and Utility under Semantic Underdetermination


41. Seirênes: Adversarial Self-Play with Evolving Distractions for LLM Reasoning


42. Can LLM Agents Respond to Disasters? Benchmarking Heterogeneous Geospatial Reasoning in Emergency Operations


43. GAR: Carbon-Aware Routing for LLM Inference via Constrained Optimization


44. Read, Grep, and Synthesize: Diagnosing Cross-Domain Seed Exposure for LLM Research Ideation


45. Controllable User Simulation


46. AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration – Learning from Cheap, Optimizing Expensive


47. Hierarchical LLM-Driven Control for HAPS-Assisted UAV Networks: Joint Optimization of Flight and Connectivity


48. Engagement Process: Rethinking the Temporal Interface of Action and Observation


49. Breaking $\textit{Winner-Takes-All}$: Cooperative Policy Optimization Improves Diverse LLM Reasoning


50. Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning


51. A Mechanistic Investigation of Supervised Fine Tuning


52. Attributing Emergence in Million-Agent Systems


53. AcuityBench: Evaluating Clinical Acuity Identification and Uncertainty Alignment


54. LLM-X: A Scalable Negotiation-Oriented Exchange for Communication Among Personal LLM Agents


55. Causal Bias Detection in Generative Artificial Intelligence


56. CVEvolve: Autonomous Algorithm Discovery for Unstructured Scientific Data Processing


57. Rethinking Evaluation for LLM Hallucination Detection: A Desiderata, A New RAG-based Benchmark, New Insights


58. LatentRouter: Can We Choose the Right Multimodal Model Before Seeing Its Answer?


59. Template-as-Ontology: Configurable Synthetic Data Infrastructure for Cross-Domain Manufacturing AI Validation


60. Unlocking LLM Creativity in Science through Analogical Reasoning


61. The Semantic Training Gap: Ontology-Grounded Tool Architectures for Industrial AI Agent Systems


62. Rethinking LLMOps for Fraud and AML: Building a Compliance-Grade LLM Serving Stack


63. PIVOT: Bridging Planning and Execution in LLM Agents via Trajectory Refinement


64. Don’t Look at the Numbers: Visual Anchoring Bias and Layer-wise Representation in VLMs


65. The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes


66. OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents


67. A Cascaded Generative Approach for e-Commerce Recommendations


68. AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward


69. Learning, Fast and Slow: Towards LLMs That Adapt Continually


70. The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events


71. Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space


72. Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling


73. OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning


74. Scalable Token-Level Hallucination Detection in Large Language Models


75. Discrete Flow Matching for Offline-to-Online Reinforcement Learning


76. Agent-Based Post-Hoc Correction of Agricultural Yield Forecasts


77. Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models


78. MedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question Answering


79. BSO: Safety Alignment Is Density Ratio Matching


80. PriorZero: Bridging Language Priors and World Models for Decision Making


81. TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching


82. Set-Aggregated Genome Embeddings for Microbiome Abundance Prediction


83. Iterative Audit Convergence in LLM-Managed Multi-Agent Systems: A Case Study in Prompt Engineering Quality Assurance


84. Reconnecting Fragmented Citation Networks with Semantic Augmentation


85. Mind the Pause: Disfluency-Aware Objective Tuning for Multilingual Speech Correction with LLMs


86. Harness Engineering as Categorical Architecture


87. Uncertainty Quantification for LLM-based Code Generation


88. Mitigating Context-Memory Conflicts in LLMs through Dynamic Cognitive Reconciliation Decoding


89. CIDR: A Large-Scale Industrial Source Code Dataset for Software Engineering Research


90. It’s Not the Size: Harness Design Determines Operational Stability in Small Language Models


91. Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction


92. Hölder Policy Optimisation


93. Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation


94. Efficient and Adaptive Human Activity Recognition via LLM Backbones


95. SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces


96. CR^2: Cost-Aware Risk-Controlled Routing for Wireless Device-Edge LLM Inference


97. The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures


98. Proteus: A Self-Evolving Red Team for Agent Skill Ecosystems


99. IPI-proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents Against Indirect Prompt Injection


100. Very Efficient Listwise Multimodal Reranking for Long Documents


101. EvoNav: Evolutionary Reward Function Design for Robot Navigation with Large Language Models


102. GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation


103. OTT-Vid: Optimal Transport Temporal Token Compression for Video Large Language Models


104. Behavioral Integrity Verification for AI Agent Skills


105. CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating


106. Cochise: A Reference Harness for Autonomous Penetration Testing


107. Evolutionary Task Discovery: Advancing Reasoning Frontiers via Skill Composition and Complexity Scaling


108. Reviving In-domain Fine-tuning Methods for Source-Free Cross-domain Few-shot Learning


109. Every Bit, Everywhere, All at Once: A Binomial Multibit LLM Watermark


110. Unlocking UML Class Diagram Understanding in Vision Language Models


111. Enhancing Multilingual Counterfactual Generation through Alignment-as-Preference Optimization


112. From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation


113. When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models


114. Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information


115. PRISM: A Geometric Risk Bound that Decomposes Drift into Scale, Shape, and Head


116. Keep What Audio Cannot Say: Context-Preserving Token Pruning for Omni-LLMs


117. DiffScore: Text Evaluation Beyond Autoregressive Likelihood


118. Three Regimes of Context-Parametric Conflict: A Predictive Framework and Empirical Validation


119. When Looking Is Not Enough: Visual Attention Structure Reveals Hallucination in MLLMs


120. Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting


121. A Study on Hidden Layer Distillation for Large Language Model Pre-Training


122. Understanding and Preventing Entropy Collapse in RLVR with On-Policy Entropy Flow Optimization


123. SpatialForge: Bootstrapping 3D-Aware Spatial Reasoning from Open-World 2D Images


124. Predictive Maps of Multi-Agent Reasoning: A Successor-Representation Spectrum for LLM Communication Topologies


125. Can a Single Message Paralyze the AI Infrastructure? The Rise of AbO-DDoS Attacks through Targeted Mobius Injection


126. Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty


127. fg-expo: Frontier-guided exploration-prioritized policy optimization via adaptive KL and Gaussian curriculum


128. Deep Reasoning in General Purpose Agents via Structured Meta-Cognition



130. Large Language Models for Causal Relations Extraction in Social Media: A Validation Framework for Disaster Intelligence


131. Much of Geospatial Web Search Is Beyond Traditional GIS


132. Epistemic Uncertainty for Test-Time Discovery


133. Beyond Similarity Search: Tenure and the Case for Structured Belief State in LLM Memory


134. SOMA: Efficient Multi-turn LLM Serving via Small Language Model


135. Natural Language based Specification and Verification


136. ReAD: Reinforcement-Guided Capability Distillation for Large Language Models


137. Discovery of Interpretable Surrogates via Agentic AI: Application to Gravitational Waves


138. Localization Boosting for Growth Markets: Mitigating Cross-Locale Behavioral Bias in Learning-to-Rank


139. gwBenchmarks: Stress-Testing LLM Agents on High-Precision Gravitational Wave Astronomy


140. Curriculum Learning-Guided Progressive Distillation in Large Language Models


141. RETUYT-INCO at BEA 2026 Shared Task 2: Meta-prompting in Rubric-based Scoring for German


142. Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning


143. Comment and Control: Hijacking Agentic Workflows via Context-Grounded Evolution


144. Leveraging RAG for Training-Free Alignment of LLMs


145. ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload


146. Continuous Discovery of Vulnerabilities in LLM Serving Systems with Fuzzing


147. Adversarial SQL Injection Generation with LLM-Based Architectures


148. CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration


149. The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models


150. Benchmarking LLM-Based Static Analysis for Secure Smart Contract Development: Reliability, Limitations, and Potential Hybrid Solutions


151. Quantifying the Reconstructability of Astrophysical Methods with Large Language Models and Information Theory: A Case Study in Spectral Reconstruction


152. ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV


153. Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training


154. Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs


155. Enabling Performant and Flexible Model-Internal Observability for LLM Inference


156. MCPShield: Content-Aware Attack Detection for LLM Agent Tool-Call Traffic


157. On Problems of Implicit Context Compression for Software Engineering Agents


158. The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck


159. Sequential Behavioral Watermarking for LLM Agents


160. FragBench: Cross-Session Attacks Hidden in Benign-Looking Fragments


161. SCOPE: Siamese Contrastive Operon Pair Embeddings for Functional Sequence Representation and Classification


162. Efficient LLM Reasoning via Variational Posterior Guidance with Efficiency Awareness


163. LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models


164. An Execution-Verified Multi-Language Benchmark for Code Semantic Reasoning


165. MT-JailBench: A Modular Benchmark for Understanding Multi-Turn Jailbreak Attacks


166. SkillGen: Verified Inference-Time Agent Skill Synthesis


167. Test-Time Personalization: A Diagnostic Framework and Probabilistic Fix for Scaling Failures


168. Skill Drift Is Contract Violation: Proactive Maintenance for LLM Agent Skill Libraries


169. Structural Interpretations of Protein Language Model Representations via Differentiable Graph Partitioning


170. LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection


171. PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks


172. Rotation-Preserving Supervised Fine-Tuning


173. Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models


174. Context-Gated Associative Retrieval: From Theory to Transformers


175. MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media