Key LLM-Related Papers - 2026-05-06

1. OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories


2. SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment


3. From Intent to Execution: Composing Agentic Workflows with Agent Recommendation


4. QKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMs


5. EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics


6. Quantifying the human visual exposome with vision language models


7. Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards


8. Agentic-imodels: Evolving agentic interpretability tools via autoresearch


9. ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting


10. Say the Mission, Execute the Swarm: Agent-Enhanced LLM Reasoning in the Web-of-Drones


11. OracleProto: A Reproducible Framework for Benchmarking LLM Native Forecasting via Knowledge Cutoff and Temporal Masking


12. AdapShot: Adaptive Many-Shot In-Context Learning with Semantic-Aware KV Cache Reuse


13. Where Paths Split: Localized, Calibrated Control of Moral Reasoning in Large Language Models


14. FinSTaR: Towards Financial Reasoning with Time Series Reasoning Models


15. Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models


16. Robust Agent Compensation (RAC): Teaching AI Agents to Compensate


17. GeoDecider: A Coarse-to-Fine Agentic Workflow for Explainable Lithology Classification


18. ReasonAudio: A Benchmark for Evaluating Reasoning Beyond Matching in Text-Audio Retrieval


19. What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis


20. Automated Large-scale CVRP Solver Design via LLM-assisted Flexible MCTS


21. Revisiting the Travel Planning Capabilities of Large Language Models


22. Enhancing Agent Safety Judgment: Controlled Benchmark Rewriting and Analogical Reasoning for Deceptive Out-of-Distribution Scenarios


23. Evaluating Prompting and Execution-Based Methods for Deterministic Computation in LLMs


24. ADAPTS: Agentic Decomposition for Automated Protocol-agnostic Tracking of Symptoms


25. Stop Automating Peer Review Without Rigorous Evaluation


26. Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?


27. Learning Correct Behavior from Examples: Validating Sequential Execution in Autonomous Agents


28. Programmatic Context Augmentation for LLM-based Symbolic Regression


29. Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense


30. CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing


31. Safety and accuracy follow different scaling laws in clinical large language models


32. Physics-Grounded Multi-Agent Architecture for Traceable, Risk-Aware Human-AI Decision Support in Manufacturing


33. The Counterexample Game: Iterated Conceptual Analysis and Repair in Language Models


34. Atomic Fact-Checking Increases Clinician Trust in Large Language Model Recommendations for Oncology Decision Support: A Randomized Controlled Trial


35. Steer Like the LLM: Activation Steering that Mimics Prompting


36. Deco: Extending Personal Physical Objects into Pervasive AI Companion through a Dual-Embodiment Framework


37. MCJudgeBench: A Benchmark for Constraint-Level Judge Evaluation in Multi-Constraint Instruction Following


38. TRACE: A Metrologically-Grounded Engineering Framework for Trustworthy Agentic AI Systems in Operationally Critical Domains


39. Before Forgetting, Learn to Remember: Revisiting Foundational Learning Failures in LVLM Unlearning Benchmarks


40. Segmenting Human-LLM Co-authored Text via Change Point Detection


41. SAM-NER: Semantic Archetype Mediation for Zero-Shot Named Entity Recognition


42. SERE: Structural Example Retrieval for Enhancing LLMs in Event Causality Identification


43. Tailored Prompts, Targeted Protection: Vulnerability-Specific LLM Analysis for Smart Contracts


44. ELAS: Efficient Pre-Training of Low-Rank Large Language Models via 2:4 Activation Sparsity


45. Multi-Agent Strategic Games with LLMs



47. ProgramBench: Can Language Models Rebuild Programs From Scratch?


48. Revisiting Graph-Tokenizing Large Language Models: A Systematic Evaluation of Graph Token Understanding


49. MHPR: Multidimensional Human Perception and Reasoning Benchmark for Large Vision-Language Models


50. MEMSAD: Gradient-Coupled Anomaly Detection for Memory Poisoning in Retrieval-Augmented Agents


51. CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification


52. Detecting Stealth Sycophancy in Mental-Health Dialogue with Dynamic Emotional Signature Graphs


53. FINER-SQL: Boosting Small Language Models for Text-to-SQL


54. Exposing LLM Safety Gaps Through Mathematical Encoding: New Attacks and Systematic Analysis


55. Discovering Reinforcement Learning Interfaces with Large Language Models


56. SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents


57. Can Multimodal Large Language Models Understand Pathologic Movements? A Pilot Study on Seizure Semiology


58. VLMaxxing through FrameMogging Training-Free Anti-Recomputation for Video Vision-Language Models


59. LLM-ADAM: A Generalizable LLM Agent Framework for Pre-Print Anomaly Detection in Additive Manufacturing


60. DGPO: Distribution Guided Policy Optimization for Fine Grained Credit Assignment


61. SHIELD: A Diverse Clinical Note Dataset and Distilled Small Language Models for Enterprise-Scale De-identification


62. RLDX-1 Technical Report


63. MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory


64. Self-Mined Hardness for Safety Fine-Tuning


65. When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI


66. From Knowledge to Action: Outcomes of the 2025 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry


67. Pact: A Choreographic Language for Agentic Ecosystems


68. PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization


69. ARISE: A Repository-level Graph Representation and Toolset for Agentic Fault Localization and Program Repair


70. Gated Subspace Inference for Transformer Acceleration


71. Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation


72. Multilingual Safety Alignment via Self-Distillation


73. Finite-Size Gradient Transport in Large Language Model Pretraining: From Cascade Size to Intensive Transport Efficiency


74. Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use


75. RouteHijack: Routing-Aware Attack on Mixture-of-Experts LLMs


76. Exploring Pass-Rate Reward in Reinforcement Learning for Code Generation


77. Proteo-R1: Reasoning Foundation Models for De Novo Protein Design


78. EvoJail: Evolutionary Diverse Jailbreak Prompt Generation for Large Language Models


79. Reasoning-Guided Grounding: Elevating Video Anomaly Detection through Multimodal Large Language Models


80. Delay, Plateau, or Collapse: Evaluating the Impact of Systematic Verification Error on RLVR


81. On the Invariants of Softmax Attention


82. Same Voice, Different Lab: On the Homogenization of Frontier LLM Personalities