Key LLM-Related Papers - 2026-05-25

1. SkillOpt: Executive Strategy for Self-Evolving Agent Skills


2. SPACENUM: Revisiting Spatial Numerical Understanding in VLMs


3. Beyond Binary Edits: Robust Multimodal Knowledge Editing with Adversarial Subspace Alignment


4. MemAudit: Post-hoc Auditing of Poisoned Agent Memory via Causal Attribution and Structural Anomaly Detection


5. One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents


6. EDGE-OPD: Internalizing Privileged Context with Evidence Guided On-Policy Distillation


7. When Planning Fails Despite Correct Execution: On Epistemic Calibration for LLM-Based Multi-Agent Systems


8. Human-in-the-Loop Multi-Agent Ventilator Decision Support with Contextual Bandit Preference Learning


9. DART: Semantic Recoverability for Structured Tool Agents


10. Parallel Context Compaction for Long-Horizon LLM Agent Serving


11. Design and Report Benchmarks for Knowledge Work


12. GENSTRAT: Toward a Science of Strategic Reasoning in Large Language Models


13. Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems


14. PathCal: State-Aware Reflection-Marker Calibration for Efficient Reasoning


15. The Deterministic Horizon: Impossibility Results as Design Specifications for Trustworthy AI Systems


16. Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems


17. LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws


18. ETCHR: Editing To Clarify and Harness Reasoning


19. PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs


20. Human Decision-Making with Persuasive and Narrative LLM Explanations


21. It’s the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt


22. PhotoFlow: Agentic 3D Virtual Photography Missions


23. OnePred: Next-Query Prediction via Recursive Intent Memory in Multi-Turn Conversations


24. CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception


25. DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling


26. HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval


27. PathNavigate: A Training-Free Pathology Agent with Surprise-Guided Scan and Shared Slide Memory for Whole-Slide Image VQA


28. CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test


29. AI Assurance: A Comprehensive Testing Strategy for Enterprise AI Systems


30. AI Security Research Should Better Incentivize Defense Research


31. Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals


32. XWind: A Cross-site Router for Large Language Model Inference Serving at Renewable Energy Farms


33. CHASD: Language Increment-Calibrated Contrastive Decoding against Hallucination in LVLMs


34. Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning


35. EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation


36. ChainFlow-VLA: Causal Flow Planning with Vision-Language Models


37. PaP-NF: Probabilistic Long-Term Time Series Forecasting via Prefix-as-Prompt Reprogramming and Normalizing Flows


38. FastKernels: Benchmarking GPU Kernel Generation in Production


39. Adaptive Mass-Segmented KV Compression for Long-Context Reasoning


40. Understanding and Improving Noisy Embedding Techniques in Instruction Finetuning


41. Autonomous Frontier-Based Exploration with VLM Guidance


42. Generative AI and the Reorganization of Labor Demand


43. CoReVAD: A Contextual Reasoning Framework for Training-Free Video Anomaly Detection


44. Security of LLM-generated Code: A Comparative Analysis


45. Anytime Training with Schedule-Free Spectral Optimization


46. Decomposing and Measuring Evaluation Awareness


47. Model Collapse as Cultural Evolution


48. DreamerNLplus: Interpretable Modeling of Mental Health Dynamics from Social Media Timelines using Hybrid Rule-Based and RAG Methods


49. Do Language Models Know What Not to Say? Causal Evidence for Statistical Preemption in LLMs


50. Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography


51. Brain-LLM Alignment Tracks Training Data, Not Typology


52. MadEvolve: Evolutionary Optimization of Trading Systems with Large Language Models


53. A Proactive Multi-Agent Dialogue Framework for Assessing Social Language Disorder Traits in Autism


54. Memorization Dynamics of Fill-in-the-Middle Pretraining


55. LLM Code Smells: A Taxonomy and Detection Approach


56. A mathematical theory of balancing relational generalization and memorization


57. Graph Alignment Topology as an Inductive Bias for Grounding Detection


58. Seeing without Looking: Do Vision-Language Benchmarks Really Test Vision?


59. Transcoders Trace Visual Grounding and Hallucinations in Vision-Language Models


60. How Far Will They Go? Red-Teaming Online Influence with Large Language Models


61. When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions


62. MedExpMem: Adapting Experience Memory for Differential Diagnosis


63. The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models


64. PrefBench: Evaluating Zero-Shot LLM Agents in Hidden-Preference Personalized Pricing Negotiations


65. ObjectCache: Layerwise Object-Storage Retrieval for KV Cache Reuse


66. Strategic Coercion Within Alliances: The Greenland Sovereignty Game as an AI Stress Test


67. LFRAG: Layout-oriented Fine-grained Retrieval-Augmented Generation on Multimodal Document Understanding


68. Evaluating Large Language Models in a Complex Hidden Role Game


69. KPI2KVI: A Multi Agent Workflow for Calculating Key Value Indicators from Service Descriptions