Key LLM-Related Papers - 2026-05-05

1. Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection


2. SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering


3. When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition


4. U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning


5. Mitigating Misalignment Contagion by Steering with Implicit Traits


6. ORPilot: A Production-Oriented Agentic LLM-for-OR Tool for Optimization Modeling


7. Hybrid Inspection and Task-Based Access Control in Zero-Trust Agentic AI


8. Foundation-Model-Based Agents in Industrial Automation: Purposes, Capabilities, and Open Challenges


9. On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length


10. Strategy-Aware Optimization Modeling with Reasoning LLMs


11. GRAIL: A Deep-Granularity Hybrid Resonance Framework for Real-Time Agent Discovery via SLM-Enhanced Indexing


12. Shadow-Loom: Causal Reasoning over Graphical World Model of Narratives


13. Position: How can Graphs Help Large Language Models?


14. Measuring AI Reasoning: A Guide for Researchers


15. A Compound AI Agent for Conversational Grant Discovery


16. Anon: Extrapolating Optimizer Adaptivity Across the Real Spectrum


17. EngiAgent: Fully Connected Coordination of LLM Agents for Solving Open-ended Engineering Problems with Feasible Solutions


18. Complexity Horizons of Compressed Models in Analog Circuit Analysis


19. Towards Understanding Specification Gaming in Reasoning Models


20. Zero-Shot Confidence Estimation for Small LLMs: When Supervised Baselines Aren’t Worth Training


21. PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments


22. Perturbation Dose Responses in Recursive LLM Loops: Raw Switching, Stochastic Floors, and Persistent Escape under Append, Replace, and Dialog Updates


23. CoVSpec: Efficient Device-Edge Co-Inference for Vision-Language Models via Speculative Decoding


24. Submodular Benchmark Selection


25. CBV: Clean-label Backdoor Attacks on Vision Language Models via Diffusion Models


26. MEMAUDIT: An Exact Package-Oracle Evaluation Protocol for Budgeted Long-Term LLM Memory Writing


27. Retrieval and Multi-Hop Reasoning in 1M-Token Context Windows: Evaluating LLMs on Classical Chinese Text


28. Planner Matters! An Efficient and Unbalanced Multi-agent Collaboration Framework for Long-horizon Planning


29. The Dynamic Gist-Based Memory Model (DGMM): A Memory-Centric Architecture for Artificial Intelligence


30. NORA: A Harness-Engineered Autonomous Research Agent for End-to-End Spatial Data Science


31. Model Spec Midtraining: Improving How Alignment Training Generalizes


32. 12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation


33. Moira: Language-driven Hierarchical Reinforcement Learning for Pair Trading


34. A Language for Describing Agentic LLM Contexts


35. Disentangling Intent from Role: Adversarial Self-Play for Persona-Invariant Safety Alignment


36. CyberAId: AI-Driven Cybersecurity for Financial Service Providers


37. NeuroState-Bench: A Human-Calibrated Benchmark for Commitment Integrity in LLM Agent Profiles


38. Are LLMs More Skeptical of Entertainment News?


39. CP-SynC: Multi-Agent Zero-Shot Constraint Modeling in MiniZinc with Synthesized Checkers


40. Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework


41. Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling



43. Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization


44. Structural Ranking of the Cognitive Plausibility of Computational Models of Analogy and Metaphors with the Minimal Cognitive Grid


45. DiagramNet: An End-to-End Recognition Framework and Dataset for Non-Standard System-Level Diagrams


46. Truth or Tribe: How In-group Favoritism Prioritize Facts in Persona Agents


47. Segment-Aligned Policy Optimization for Multi-Modal Reasoning


48. Valley3: Scaling Omni Foundation Models for E-commerce


49. Faithful Mobile GUI Agents with Guided Advantage Estimator


50. GR-Ben: A General Reasoning Benchmark for Evaluating Process Reward Models


51. NEURON: A Neuro-symbolic System for Grounded Clinical Explainability


52. LLMs Should Not Yet Be Credited with Decision Explanation


53. Position: Safety and Fairness in Agentic AI Depend on Interaction Topology, Not on Model Scale or Alignment


54. A Low-Latency Fraud Detection Layer for Detecting Adversarial Interaction Patterns in LLM-Powered Agents


55. PERSA: Reinforcement Learning for Professor-Style Personalized Feedback with LLMs



57. Towards Multi-Agent Autonomous Reasoning in Hydrodynamics


58. Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent for Personalized and Supervised Therapy


59. A Knowledge-Driven LLM-Based Decision-Support System for Explainable Defect Analysis and Mitigation Guidance in Laser Powder Bed Fusion


60. Effect-Transparent Governance for AI Workflow Architectures: Semantic Preservation, Expressive Minimality, and Decidability Boundaries


61. ClinicBot: A Guideline-Grounded Clinical Chatbot with Prioritized Evidence RAG and Verifiable Citations


62. Understanding Emergent Misalignment via Feature Superposition Geometry


63. AI Agents for Sustainable SMEs: A Green ESG Assessment Framework


64. SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection


65. Bolek: A Multimodal Language Model for Molecular Reasoning


66. AI-Generated Smells: An Analysis of Code and Architecture in LLM and Agent-Driven Development


67. Perceptual Flow Network for Visually Grounded Reasoning


68. Fuzzy Fingerprinting Encoder Pre-trained Language Models for Emotion Recognition in Conversations: Human Assessment and Validity Study


69. CoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulation


70. Beyond State Machines: Executing Network Procedures with Agentic Tool-Calling Sequences


71. A Semantic Autonomy Framework for VLM-Integrated Indoor Mobile Robots: Hybrid Deterministic Reasoning and Cross-Robot Adaptive Memory


72. Benchmarking Retrieval Strategies for Biomedical Retrieval-Augmented Generation: A Controlled Empirical Study


73. From Experimental Limits to Physical Insight: A Retrieval-Augmented Multi-Agent Framework for Interpreting Searches Beyond the Standard Model


74. When Stress Becomes Signal: Detecting Antifragility-Compatible Regimes in Multi-Agent LLM Systems


75. LLM-Assisted Repository-Level Generation with Structured Spec-Driven Engineering


76. Causal Software Engineering: A Vision and Roadmap


77. Is It Novel and Why? Fine-Grained Patent Novelty Prediction Based on Passage Retrieval


78. Enhancing Multimodal In-Context Learning via Inductive-Deductive Reasoning


79. When Correct Isn’t Usable: Improving Structured Output Reliability in Small Language Models


80. APIOT: Autonomous Vulnerability Management Across Bare-Metal Industrial OT Networks


81. LLM-enabled Social Agents


82. Reliability-Oriented Multilingual Orthopedic Diagnosis: A Domain-Adaptive Modeling and a Conceptual Validation Framework


83. On the Privacy of LLMs: An Ablation Study


84. When Alignment Isn’t Enough: Response-Path Attacks on LLM Agents


85. DocSync: Agentic Documentation Maintenance via Critic-Guided Reflexion


86. Context-Aware Wireless Token Communication via Joint Token Masking and Detection


87. EditPropBench: Measuring Factual Edit Propagation in Scientific Manuscripts


88. Pair2Score: Pairwise-to-Absolute Transfer for LLM-Based Essay Scoring


89. What Single-Prompt Accuracy Misses: A Multi-Variant Reliability Audit of Language Models


90. A Multimodal Dataset for Visually Grounded Ambiguity in Machine Translation


91. Conventional Commit Classification using Large Language Models and Prompt Engineering



93. Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration


94. RefusalGuard: Geometry-Preserving Fine-Tuning for Safety in LLMs


95. Chart-FR1: Visual Focus-Driven Fine-Grained Reasoning on Dense Charts


96. Spatiotemporal Hidden-State Dynamics as a Signature of Internal Reasoning in Large Language Models


97. RMGAP: Benchmarking the Generalization of Reward Models across Diverse Preferences


98. Selector-Guided Autonomous Curriculum for One-Shot Reinforcement Learning from Verifiable Rewards


99. Discover Fast Power Allocation Solution for Multi-Target Tracking via AlphaEvolve Evolution


100. Khala: Scaling Acoustic Token Language Models Toward High-Fidelity Music Generation


101. The Compliance Gap: Why AI Systems Promise to Follow Process Instructions but Don’t


102. Talk is Cheap, Communication is Hard: Dynamic Grounding Failures and Repair in Multi-Agent Negotiation


103. Architectural Obsolescence of Unhardened Agentic-AI Runtimes


104. GEASS: Training-Free Caption Steering for Hallucination Mitigation in Vision-Language Models


105. SplitZip: Ultra Fast Lossless KV Compression for Disaggregated LLM Serving


106. Probe-Geometry Alignment: Erasing the Cross-Sequence Memorization Signature Below Chance


107. BIM Information Extraction Through LLM-based Adaptive Exploration


108. GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory


109. AI Alignment via Incentives and Correction


110. Prosa: Rubric-Based Evaluation of LLMs on Real User Chats in Brazilian Portuguese


111. Where Do Prompt Perturbations Break Generation? A Segment-Level View of Robustness in LoRA-Tuned Language Models


112. KG-First, LLM-Fallback: A Hybrid Microservice for Grounded Skill Search and Explanation


113. Neuro-Symbolic Agents for Hallucination-Free Requirements Reuse


114. Automated Interpretability and Feature Discovery in Language Models with Agents


115. 6G Needs Agents: Toward Agentic AI-Native Networks for Autonomous Intelligence


116. FT-RAG: A Fine-grained Retrieval-Augmented Generation Framework for Complex Table Reasoning


117. Practical Limits of Autonomous Test Repair: A Multi-Agent Case Study with LLM-Driven Discovery and Self-Correction


118. VisInject: Disruption != Injection – A Dual-Dimension Evaluation of Universal Adversarial Attacks on Vision-Language Models


119. HepScript: A Dual-Use DSL for Human-AI Collaborative Data Analysis Workflows in High-Energy Physics


120. Medmarks: A Comprehensive Open-Source LLM Benchmark Suite for Medical Tasks


121. AMSnet-q: Unsupervised Circuit Identification and Performance Labeling for AMS Circuits


122. Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning


123. LiveFMBench: Unveiling the Power and Limits of Agentic Workflows in Specification Generation


124. Using LLMs in Software Design: An Empirical Study of GitHub and A Practitioner Survey


125. Focus on the Core: Empowering Diffusion Large Language Models by Self-Contrast


126. Model-Based Proactive Cost Generation for Learning Safe Policies Offline with Limited Violation Data


127. Active Reasoning Vision-Language Models via Sequential Experimental Design


128. GraphSculptor: Sculpting Pre-training Coreset for Graph Self-supervised Learning


129. Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation


130. Position: LLM Serving Needs Mathematical Optimization and Algorithmic Foundations, Not Just Heuristics


131. The Garden of Forking Paths: Narrative Arc-Conditioned Gameplay Planning


132. MindMelody: A Closed-Loop EEG-Driven System for Personalized Music Intervention


133. Minimizing Collateral Damage in Activation Steering


134. Component-Aware Self-Speculative Decoding in Hybrid Language Models


135. Interpretable Difficulty-Aware Knowledge Tracing in Tutor-Student Dialogues


136. A Sentence Relation-Based Approach to Sanitizing Malicious Instructions


137. LLM Ghostbusters: Surgical Hallucination Suppression via Adaptive Unlearning


138. EmoMM: Benchmarking and Steering MLLM for Multimodal Emotion Recognition under Conflict and Missingness


139. CLEAR: Revealing How Noise and Ambiguity Degrade Reliability in LLMs for Medicine


140. Model Organisms Are Leaky: Perplexity Differencing Often Reveals Finetuning Objectives



142. Seeking Information with RAG-Assistants: Does Model Size Matter in Human-AI Collaborations?


143. Ablation Study of Multimodal Perception, Language Grounding, and Control for Human-Robot Interaction in an Object Detection and Grasping Task


144. “I Don’t Know” – Towards Appropriate Trust with Certainty-Aware Retrieval Augmented Generation


145. E-MIA: Exam-Style Black-Box Membership Inference Attacks against RAG Systems


146. Co-Generative De Novo Functional Protein Design


147. StyleShield: Exposing the Fragility of AIGC Detectors through Continuous Controllable Style Transfer


148. TRIP-Evaluate: An Open Multimodal Benchmark for Evaluating Large Models in Transportation


149. Generalized Category Discovery under Domain Shifts: From Vision to Vision-Language Models


150. Retrieval-Guided Generation for Safer Histopathology Image Captioning


151. X2SAM: Any Segmentation in Images and Videos


152. OceanPile: A Large-Scale Multimodal Ocean Corpus for Foundation Models


153. BRITE: A Benchmark for Reliable and Interpretable T2V Evaluation on Implausible Scenarios


154. H-Probes: Extracting Hierarchical Structures From Latent Representations of Language Models


155. Graph Query Generation with Constraint-guided Large Language Agents


156. The Oracle’s Fingerprint: Correlated AI Forecasting Errors and the Limits of Bias Transmission


157. Agentopic: A Generative AI Agent Workflow for Explainable Topic Modeling


158. GhostServe: A Lightweight Checkpointing System in the Shadow for Fault-Tolerant LLM Serving


159. Separating Intelligence from Execution: A Workflow Engine for the Model Context Protocol