Key LLM Papers - 2026-05-22

1. LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems



3. Towards a General Intelligence and Interface for Wearable Health Data


4. HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools


5. Beyond Acoustic Emotion Recognition: Multimodal Pathos Analysis in Political Speech Using LLM-Based and Acoustic Emotion Models


6. Can AI Make Conflicts Worse? An Alignment Failure in LLM Deployment Across Conflict Contexts


7. AMEL: Accumulated Message Effects on LLM Judgments


8. Is Capability a Liability? More Capable Language Models Make Worse Forecasts When It Matters Most


9. WorkstreamBench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance


10. AtelierEval: Agentic Evaluation of Humans & LLMs as Text-to-Image Prompters


11. Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning


12. Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning


13. Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost


14. Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings


15. Meta-Soft: Leveraging Composable Meta-Tokens for Context-Preserving KV Cache Compression


16. SciCore-Mol: Augmenting Large Language Models with Pluggable Molecular Cognition Modules


17. Evaluating Large Language Models as Live Strategic Agents: Provider Performance, Hybrid Decomposition, and Operational Gaps in Timed Risk Play


18. SGR-Bench: Benchmarking Search Agents on State-Gated Retrieval


19. CLORE: Content-Level Optimization for Reasoning Efficiency


20. Skill Weaving: Efficient LLM Improvement via Modular Skillpacks


21. LLM-Metrics: Measuring Research Impact Through Large Language Model Memory


22. Measuring Cross-Modal Synergy: A Benchmark for VLM Explainability


23. Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents


24. ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs


25. IdleSpec: Exploiting Idle Time via Speculative Planning for LLM Agents


26. Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents


27. Efficient Agentic Reasoning Through Self-Regulated Simulative Planning


28. Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?


29. ArborKV: Structure-Aware KV Cache Management for Scaling Tree-based LLM Reasoning


30. Enhancing Visual Token Representations for Video Large Language Models via Training-Free Spatial-Temporal Pooling and Gridding


31. Active Evidence-Seeking and Diagnostic Reasoning in Large Language Models for Clinical Decision Support


32. The Log is the Agent: Event-Sourced Reactive Graphs for Auditable, Forkable Agentic Systems


33. Format-Constraint Coupling in Knowledge Graph Construction from Statistical Tables


34. AI-Enabled Serious Games: Integrating Intelligence and Adaptivity in Training Systems


35. Planning in the LLM Era: Building for Reliability and Efficiency


36. Implicit Safety Alignment from Crowd Preferences


37. Trace2Skill: Verifier-Guided Skill Evolution for Long-Context EDA Agents


38. What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct


39. SMDD-Bench: Can LLMs Solve Real-World Small Molecule Drug Design Tasks?


40. AttuneBench: A Conversation-Based Benchmark for LLM Emotional Intelligence


41. Latent-space Attacks for Refusal Evasion in Language Models


42. The Shape of Testimony: A Scalable Framework for Oral History Archive Comparison


43. Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs



45. DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback


46. Reducing Political Manipulation with Consistency Training


47. Understanding Data Temporality Impact on Large Language Models Pre-training


48. Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation


49. AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild


50. Moral Semantics Survive Machine Translation: Cross-Lingual Evidence from Moral Foundations Corpora


51. Healthcare LLM Benchmarks Are Only as Good as Their Explicit Assumptions


52. Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents


53. Beyond Temperature: Hyperfitting as a Late-Stage Geometric Expansion


54. VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis



56. The Neural Compiler: Program-to-Network Translation for Hybrid Scientific Machine Learning


57. From Correlation to Cause: A Five-Stage Methodology for Feature Analysis in Transformer Language Models


58. Steins;Gate Drive: Semantic Safety Arbitration over Structured Futures for Latency-Decoupled LLM Planning


59. Towards Clinically Interpretable Ophthalmic VQA via Spatially-Grounded Lesion Evidence


60. DeferMem: Query-Time Evidence Distillation via Reinforcement Learning for Long-Term Memory QA


61. VeriScale: Adversarial Test-Suite Scaling for Verifiable Code Generation


62. TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation


63. Bernini: Latent Semantic Planning for Video Diffusion


64. Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions


65. One LR Doesn’t Fit All: Heavy-Tail Guided Layerwise Learning Rates for LLMs


66. EmoTrack: Robust Depression Tracking from Counseling Transcripts across Session Regimes


67. MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering


68. Tailoring Teaching to Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning


69. What are the Right Symmetries for Formal Theorem Proving?



71. SWE-Mutation: Can LLMs Generate Reliable Test Suites in Software Engineering?


72. One-Way Policy Optimization for Self-Evolving LLMs


73. TextTeacher: What Can Language Teach About Images?


74. Not Yet: Humans Outperform LLMs in a Colonel Blotto Tournament


75. JMed48k: A Multi-Profession Japanese Medical Licensing Benchmark for Vision-Language Model Evaluation


76. From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning


77. LABO: LLM-Accelerated Bayesian Optimization through Broad Exploration and Selective Experimentation


78. GA-VLN: Geometry-Aware BEV Representation for Efficient Vision-Language Navigation


79. Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems


80. Ex-GraphRAG: Interpretable Evidence Routing for Graph-Augmented LLMs


81. Learning Spatiotemporal Sensitivity in Video LLMs via Counterfactual Reinforcement Learning


82. Interpreting and Enhancing Emotional Circuits in Large Vision-Language Models via Cross-Modal Information Flow


83. LLM Retrieval for Stable and Predictable Ad Recommendations


84. ChronoMedicalWorld: A Medical World Model for Learning Patient Trajectories from Longitudinal Care Data


85. MLLMs Know When Before Speaking: Revealing and Recovering Temporal Grounding via Attention Cues


86. CausalGuard: Conformal Inference under Graph Uncertainty


87. SDGBiasBench: Benchmarking and Mitigating Vision–Language Models’ Biases in Sustainable Development Goals


88. MAVEN: A Multi-stage Agentic Annotation Pipeline for Video Reasoning Tasks


89. EvoScene-VLA: Evolving Scene Beliefs Inside the Action Decoder for Chunked Robot Control


90. The Illusion of Reasoning: Exposing Evasive Data Contamination in LLMs via Zero-CoT Truncation


91. CrossVLA: Cross-Paradigm Post-Training and Inference Optimization for Vision-Language-Action Models


92. OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning


93. Comparing LLM and Fine-Tuned Model Performance on NVDRS Circumstance Extraction with Varying Prompt Complexity


94. Does Slightly Mean Somewhat? Measuring Vague Intensity Words in LLM Numeric Actions


95. Probabilistic Attribution For Large Language Models


96. TBP-mHC: full expressivity for manifold-constrained hyper connections through transportation polytopes


97. PocketAgents: A Manifest-Driven Library of Autonomous Defense Agents


98. Value-Gradient Hypothesis of RL for LLMs


99. Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming


100. Flat-Pack Bench: Evaluating Spatio-Temporal Understanding in Large Vision-Language Models through Furniture Assembly


101. CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety


102. RefusalBench: Why Refusal Rate Misranks Frontier LLMs on Biological Research Prompts


103. Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs


104. Detecting Synthetic Political Narratives in Cross-Platform Social Media Discourse


105. Protein Thoughts: Interpretable Reasoning with Tree of Thoughts and Embedding-Space Flow Matching for Protein-Protein Interaction Discovery


106. Harnesses for Inference-Time Alignment over Execution Trajectories


107. Predicting Performance of Symbolic and Prompt Programs with Examples


108. Autonomous LLM Agents & CTFs: A Second Look


109. HealthCraft: A Reinforcement Learning Safety Environment for Emergency Medicine


110. Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation


111. High-speed Networking for Giga-Scale AI Factories