Key LLM-Related Papers - 2026-05-18


2. FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast


3. Fully Open Meditron: An Auditable Pipeline for Clinical LLMs


4. Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most


5. Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP


6. Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems


7. Look Before You Leap: Autonomous Exploration for LLM Agents


8. Property-Guided LLM Program Synthesis for Planning


9. Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law


10. PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control


11. Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design


12. SaaS-Bench: Can Computer-Use Agents Leverage Real-World SaaS to Solve Professional Workflows?


13. ALSO: Adversarial Online Strategy Optimization for Social Agents


14. Can We Trust AI-Inferred User States? A Psychometric Framework for Validating the Reliability of Users States Classification by LLMs in Operational Environments


15. Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR


16. PRISM: Prompt Reliability via Iterative Simulation and Monitoring for Enterprise Conversational AI


17. ColPackAgent: Agent-Skill-Guided Hard-Particle Monte Carlo Workflows for Colloidal Packing


18. TopoEvo: A Topology-Aware Self-Evolving Multi-Agent Framework for Root Cause Analysis in Microservices


19. See Before You Code: Learning Visual Priors for Spatially Aware Educational Animation Generation


20. STAR: A Stage-attributed Triage and Repair framework for RCA Agents in Microservices


21. DRS-GUI: Dynamic Region Search for Training-Free GUI Grounding


22. RTL-BenchMT: Dynamic Maintenance of RTL Generation Benchmark Through Agent-Assisted Analysis and Revision


23. CAPS: Cascaded Adaptive Pairwise Selection for Efficient Parallel Reasoning


24. From LLM-Generated Conjectures to Lean Formalizations: Automated Polynomial Inequality Proving via Sum-of-Squares Certificates


25. Belief Engine: Configurable and Inspectable Stance Dynamics in Multi-Agent LLM Deliberation


26. Zero-Shot Goal Recognition with Large Language Models


27. Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning


28. SMCEvolve: Principled Scientific Discovery via Sequential Monte Carlo Evolution


29. Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution


30. ICRL: Learning to Internalize Self-Critique with Reinforcement Learning


31. CAX-Agent: A Lightweight Agent Harness for Reliable APDL Automation


32. Fair outputs, Biased Internals: Causal Potency and Asymmetry of Latent Bias in LLMs for High-Stakes Decisions


33. SkillSmith: Compiling Agent Skills into Boundary-Guided Runtime Interfaces


34. Does Theory of Mind Improvement Really Benefit Human-AI Interactions? Empirical Findings from Interactive Evaluations


35. AI-Mediated Communication Can Steer Collective Opinion


36. Offline Semantic Guidance for Efficient Vision-Language-Action Policy Distillation


37. paper.json: A Coordination Convention for LLM-Agent-Actionable Papers


38. DebiasRAG: A Tuning-Free Path to Fair Generation in Large Language Models through Retrieval-Augmented Generation


39. Towards Foundation Models for Relational Databases with Language Models and Graph Neural Networks


40. VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation


41. RecMem: Recurrence-based Memory Consolidation for Efficient and Effective Long-Running LLM Agents


42. From Flat Language Labels to Typological Priors: Structured Language Conditioning for Multilingual Speech-to-Speech Translation


43. Can Vision Language Models Be Adaptive in Mathematics Education? A Learner Model-based Rubric Study


44. CitePrism: Human-in-the-Loop AI for Citation Auditing and Editorial Integrity


45. LoCO: Low-rank Compositional Rotation Fine-tuning


46. Modeling Music as a Time-Frequency Image: A 2D Tokenizer for Music Generation


47. GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions


48. BiomedAP: A Vision-Informed Dual-Anchor Framework with Gated Cross-Modal Fusion for Robust Medical Vision-Language Adaptation


49. UAM: A Dual-Stream Perspective on Forgetting in VLA Training


50. H-Mem: A Novel Memory Mechanism for Evolving and Retrieving Agent Memory via a Hybrid Structure


51. ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models


52. VLMs Trace Without Tracking: Diagnosing Failures in Visual Path Following


53. A Few GPUs, A Whole Lotta Scale: Faithful LLM Training Emulation with PrismLLM


54. Detecting Privilege Escalation in Polyglot Microservices via Agentic Program Analysis


55. AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs


56. DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation


57. RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably


58. Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs


59. Hybrid LLM-based Intelligent Framework for Robot Task Scheduling


60. Retrieval-Augmented Large Language Models for Schema-Constrained Clinical Information Extraction


61. GRLO: Towards Generalizable Reinforcement Learning in Open-Ended Environments from Zero


62. DrugSAGE: Self-evolving Agent Experience for Efficient State-of-the-Art Drug Discovery


63. Runtime-Structured Task Decomposition for Agentic Coding Systems


64. $f$-Trajectory Balance: A Loss Family for Tuning GFlowNets, Generative Models, and LLMs with Off- and On-Policy Data


65. Margin-Adaptive Confidence Ranking for Reliable LLM Judgement


66. From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery


67. Representation Without Reward: A JEPA Audit for LLM Fine-Tuning


68. Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory


69. LEAP: Trajectory-Level Evaluation of LLMs in Iterative Scientific Design


70. Hidden in Memory: Sleeper Memory Poisoning in LLM Agents


71. From I/O to Code with Discovery Agent


72. GQA-μP: The maximal parameterization update for grouped query attention


73. GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding


74. Hydra: Efficient, Correct Code Generation via Checkpoint-and-Rollback Support


75. Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench


76. GenAI-Driven Approach to RISC-V Supply Chain Exploration


77. Effective Harness Engineering for Algorithm Discovery with Coding Agents


78. Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time


79. An LLM-RAG Approach for Healthy Eating Index-Informed Personalized Food Recommendations


80. Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels


81. AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices


82. Agent4POI: Agentic Context-Conditioned Affordance Reasoning for Multimodal Point-of-Interest Recommendation