Key LLM-Related Papers - 2026-04-17

1. Generalization in LLM Problem Solving: The Case of the Shortest Path


2. Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations


3. RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography


4. Context Over Content: Exposing Evaluation Faking in Automated Judges


5. Meituan Merchant Business Diagnosis via Policy-Guided Dual-Process User Simulation


6. IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning


7. OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis


8. From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench


9. Autogenesis: A Self-Evolving Agent Protocol


10. Towards Faster Language Model Inference Using Mixture-of-Experts Flow Matching


11. COEVO: Co-Evolutionary Framework for Joint Functional Correctness and PPA Optimization in LLM-Based RTL Generation


12. Dr. RTL: Autonomous Agentic RTL Optimization through Tool-Grounded Self-Improvement


13. Discovering Novel LLM Experts via Task-Capability Coevolution


14. Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models


15. ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints


16. Governing Reflective Human-AI Collaboration: A Framework for Epistemic Scaffolding and Traceable Reasoning


17. The Missing Knowledge Layer in AI: A Framework for Stable Human-AI Reasoning


18. Beyond Literal Summarization: Redefining Hallucination for Medical SOAP Note Evaluation


19. The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows


20. MirrorBench: Evaluating Self-centric Intelligence in MLLMs by Introducing a Mirror


21. CoTEvol: Self-Evolving Chain-of-Thoughts for Data Synthesis in Mathematical Reasoning


22. Disentangle-then-Refine: LLM-Guided Decoupling and Structure-Aware Refinement for Graph Contrastive Learning


23. The Agentification of Scientific Research: A Physicist’s Perspective


24. SGA-MCTS: Decoupling Planning from Execution via Training-Free Atomic Experience Retrieval


25. HWE-Bench: Benchmarking LLM Agents on Real-World Hardware Bug Repair Tasks


26. CAMO: An Agentic Framework for Automated Causal Discovery from Micro Behaviors to Macro Emergence in LLM Agent Simulations


27. DR$^{3}$-Eval: Towards Realistic and Reproducible Deep Research Evaluation


28. Acceptance Dynamics Across Cognitive Domains in Speculative Decoding


29. Rethinking Patient Education as Multi-turn Multi-modal Interaction


30. Targeted Exploration via Unified Entropy Control for Reinforcement Learning


31. Learning to Draw ASCII Improves Spatial Reasoning in Language Models


32. El Agente Forjador: Task-Driven Agent Generation for Quantum Simulation


33. GDPR Auto-Formalization with AI Agents and Human Verification


34. Enhancing Mental Health Counseling Support in Bangladesh using Culturally-Grounded Knowledge


35. TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification


36. Dissecting Failure Dynamics in Large Language Model Reasoning


37. Quantifying Cross-Query Contradictions in Multi-Query LLM Reasoning


38. Pushing the Limits of On-Device Streaming ASR: A Compact, High-Accuracy English Model for Low-Latency Inference


39. Seeing Through Circuits: Faithful Mechanistic Interpretability for Vision Transformers


40. Evo-MedAgent: Beyond One-Shot Diagnosis with Agents That Remember, Reflect, and Improve


41. Response-Aware User Memory Selection for LLM Personalization


42. AIBuildAI: An AI Agent for Automatically Building AI Models


43. Credo: Declarative Control of LLM Pipelines via Beliefs and Policies


44. Seeing Through Experts' Eyes: A Foundational Vision Language Model Trained on Radiologists' Gaze and Reasoning


45. GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification


46. Simulating Human Cognition: Heartbeat-Driven Autonomous Thinking Activity Scheduling for LLM-based AI systems


47. NuHF Claw: A Risk Constrained Cognitive Agent Framework for Human Centered Procedure Support in Digital Nuclear Control Rooms


48. Why Do Vision Language Models Struggle To Recognize Human Emotions?


49. Prism: Symbolic Superoptimization of Tensor Programs


50. CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas


51. VisPCO: Visual Token Pruning Configuration Optimization via Budget-Aware Pareto-Frontier Learning for Vision-Language Models


52. Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines


53. Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models


54. IUQ: Interrogative Uncertainty Quantification for Long-Form Large Language Model Generation


55. Autonomous Evolution of EDA Tools: Multi-Agent Self-Evolved ABC


56. Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization


57. UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards


58. Calibration-Gated LLM Pseudo-Observations for Online Contextual Bandits


59. RaTA-Tool: Retrieval-based Tool Selection with Multimodal Large Language Models


60. Can LLMs Score Medical Diagnoses and Clinical Reasoning as well as Expert Panels?


61. Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models


62. RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding


63. Vibe-Coding: Feedback-Based Automated Verification with no Human Code Inspection, a Feasibility Study


64. MetaDent: Labeling Clinical Images for Vision-Language Models in Dentistry


65. Schema Key Wording as an Instruction Channel in Structured Generation under Constrained Decoding


66. ClimateCause: Complex and Implicit Causal Structures in Climate Reports


67. Zero-Shot Retail Theft Detection via Orchestrated Vision Models: A Model-Agnostic, Cost-Effective Alternative to Trained Single-Model Systems


68. Which bird does not have wings: Negative-constrained KGQA with Schema-guided Semantic Matching and Self-directed Refinement


69. Bounded Autonomy for Enterprise AI: Typed Action Contracts and Consumer-Side Execution


70. Fact4ac at the Financial Misinformation Detection Challenge Task: Reference-Free Financial Misinformation Detection via Fine-Tuning and Few-Shot Prompting of Large Language Models


71. StoryCoder: Narrative Reformulation for Structured Reasoning in LLM Code Generation


72. ELMoE-3D: Leveraging Intrinsic Elasticity of MoE for Hybrid-Bonding-Enabled Self-Speculative Decoding in On-Premises Serving


73. Retrieve, Then Classify: Corpus-Grounded Automation of Clinical Value Set Authoring


74. Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection


75. CausalDetox: Causal Head Selection and Intervention for Language Model Detoxification


76. Mechanistic Decoding of Cognitive Constructs in LLMs


77. AgileLog: A Forkable Shared Log for Agents on Data Streams


78. CPGRec+: A Balance-oriented Framework for Personalized Video Game Recommendations


79. Generative Augmented Inference


80. Don’t Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG


81. VeriGraphi: A Multi-Agent Framework of Hierarchical RTL Generation for Large Hardware Designs


82. Hierarchical vs. Flat Iteration in Shared-Weight Transformers


83. LLMs taking shortcuts in test generation: A study with SAP HANA and LevelDB


84. SpaceMind: A Modular and Self-Evolving Embodied Vision-Language Agent Framework for Autonomous On-orbit Servicing


85. Generating Concept Lexicalizations via Dictionary-Based Cross-Lingual Sense Projection


86. BiCon-Gate: Consistency-Gated De-colloquialisation for Dialogue Fact-Checking


87. Coalition Formation in LLM Agent Networks: Stability Analysis and Convergence Guarantees


88. Modular Continual Learning via Zero-Leakage Reconstruction Routing and Autonomous Task Discovery


89. SatBLIP: Context Understanding and Feature Identification from Satellite Imagery with Vision-Language Learning


90. The Cost of Language: Centroid Erasure Exposes and Exploits Modal Competition in Multimodal Language Models


91. APEX-MEM: Agentic Semi-Structured Memory with Temporal Reasoning for Long-Term Conversational AI


92. When PCOS Meets Eating Disorders: An Explainable AI Approach to Detecting the Hidden Triple Burden


93. Mamba-SSM with LLM Reasoning for Biomarker Discovery: Causal Feature Refinement via Chain-of-Thought Gene Evaluation


94. Faithfulness Serum: Mitigating the Faithfulness Gap in Textual Explanations of LLM Decisions via Attribution Guidance


95. Challenges and Future Directions in Agentic Reverse Engineering Systems


96. DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines


97. EuropeMedQA Study Protocol: A Multilingual, Multimodal Medical Examination Dataset for Language Model Evaluation


98. Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization


99. Reinforcement Learning via Value Gradient Flow


100. ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents


101. Evaluation of Agents under Simulated AI Marketplace Dynamics


102. FRESCO: Benchmarking and Optimizing Re-rankers for Evolving Semantic Conflict in Retrieval-Augmented Generation


103. TRACE: A Conversational Framework for Sustainable Tourism Recommendation with Agentic Counterfactual Explanations



105. PriHA: A RAG-Enhanced LLM Framework for Primary Healthcare Assistant in Hong Kong


106. CROP: Token-Efficient Reasoning in Large Language Models via Regularized Prompt Optimization


107. PolyBench: Benchmarking LLM Forecasting and Trading Capabilities on Live Prediction Market Data


108. MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining


109. The PICCO Framework for Large Language Model Prompting: A Taxonomy and Reference Architecture for Prompt Structure


110. Grading the Unspoken: Evaluating Tacit Reasoning in Quantum Field Theory and String Theory with LLMs


111. Internal Knowledge Without External Expression: Probing the Generalization Boundary of a Classical Chinese Language Model


112. An Underexplored Frontier: Large Language Models for Rare Disease Patient Education and Communication – A scoping review


113. Tug-of-War within A Decade: Conflict Resolution in Vulnerability Analysis via Teacher-Guided Retrieval-Augmented Generations


114. Benchmarking Linguistic Adaptation in Comparable-Sized LLMs: A Study of Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali


115. Stateful Evidence-Driven Retrieval-Augmented Generation with Iterative Reasoning


116. Chinese Essay Rhetoric Recognition Using LoRA, In-context Learning and Model Ensemble


117. SeaAlert: Critical Information Extraction From Maritime Distress Communications with Large Language Models


118. Can Large Language Models Detect Methodological Flaws? Evidence from Gesture Recognition for UAV-Based Rescue Operation Based on Deep Learning


119. HUOZIIME: An On-Device LLM-enhanced Input Method for Deep Personalization


120. MemGround: Long-Term Memory Evaluation Kit for Large Language Models in Gamified Scenarios