Key LLM-Related Papers - 2026-04-10

1. Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest


2. SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions


3. From Safety Risk to Design Principle: Peer-Preservation in Multi-Agent LLM Systems and Its Implications for Orchestrated Democratic Discourse Analysis


4. KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation


5. Learning Who Disagrees: Demographic Importance Weighting for Modeling Annotator Distributions with DiADEM


6. Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing


7. SkillClaw: Let Skills Evolve Collectively with Agentic Evolver


8. Don’t Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents


9. ASPECT: Analogical Semantic Policy Execution via Language Conditioned Transfer


10. ProMedical: Hierarchical Fine-Grained Criteria Modeling for Medical LLM Alignment via Explicit Injection


11. Neural-Symbolic Knowledge Tracing: Injecting Educational Knowledge into Deep Learning for Responsible Learner Modelling


12. Aligning Agents via Planning: A Benchmark for Trajectory-Level Reward Modeling



14. Revise: A Framework for Revising OCRed text in Practical Information Systems with Data Contamination Strategy


15. ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models


16. IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling


17. Wiring the ‘Why’: A Unified Taxonomy and Survey of Abductive Reasoning in LLMs


18. Are we still able to recognize pearls? Machine-driven peer review and the risk to creativity: An explainable RAG-XAI detection framework with markers extraction


19. WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models


20. MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems


21. Visual Perceptual to Conceptual First-Order Rule Learning Networks


22. DialBGM: A Benchmark for Background Music Recommendation from Everyday Multi-Turn Dialogues


23. SPARD: Self-Paced Curriculum for RL Alignment via Integrating Reward Dynamics and Data Utility


24. Silencing the Guardrails: Inference-Time Jailbreaking via Dynamic Contextual Representation Ablation


25. Automatic Generation of Executable BPMN Models from Medical Guidelines


26. Lightweight LLM Agent Memory with Small Language Models


27. The Cartesian Cut in Agentic AI


28. CivBench: Progress-Based Evaluation for LLMs’ Strategic Decision-Making in Civilization V


29. Emotion Concepts and their Function in a Large Language Model


30. Towards Knowledgeable Deep Research: Framework and Benchmark


31. IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures


32. Multi-Agent Orchestration for High-Throughput Materials Screening on a Leadership-Class System


33. From Debate to Decision: Conformal Social Choice for Safe Multi-Agent Deliberation


34. Bridging Natural Language and Interactive What-If Interfaces via LLM-Generated Declarative Specification


35. How Independent are Large Language Models? A Statistical Framework for Auditing Behavioral Entanglement and Reweighting Verifier Ensembles


36. Reasoning Graphs: Deterministic Agent Accuracy through Evidence-Centric Chain-of-Thought Feedback


37. Too long; didn’t solve


38. From Papers to Property Tables: A Priority-Based LLM Workflow for Materials Data Extraction


39. ReflectRM: Boosting Generative Reward Models via Self-Reflection within a Unified Judgment Framework


40. CLEAR: Context Augmentation from Contrastive Learning of Experience via Agentic Reflection


41. ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training


42. M-ArtAgent: Evidence-Based Multimodal Agent for Implicit Art Influence Discovery


43. Munkres’ General Topology Autoformalized in Isabelle/HOL


44. An Analysis of Artificial Intelligence Adoption in NIH-Funded Research


45. AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation


46. OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks


47. What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal


48. Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization


49. CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning


50. KV Cache Offloading for Context-Intensive Tasks


51. Synthetic Data for any Differentiable Target


52. Phantasia: Context-Adaptive Backdoors in Vision Language Models


53. TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs


54. A GAN and LLM-Driven Data Augmentation Framework for Dynamic Linguistic Pattern Modeling in Chinese Sarcasm Detection


55. PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models


56. Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models


57. Lost in the Hype: Revealing and Dissecting the Performance Degradation of Medical Multimodal Large Language Models in Image Classification


58. Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions


59. DMax: Aggressive Parallel Decoding for dLLMs


60. SeLaR: Selective Latent Reasoning in Large Language Models


61. Can Vision Language Models Judge Action Quality? An Empirical Evaluation


62. CIAO - Code In Architecture Out - Automated Software Architecture Documentation with Large Language Models


63. Distributed Multi-Layer Editing for Rule-Level Knowledge in Large Language Models


64. Behavior-Aware Item Modeling via Dynamic Procedural Solution Representations for Knowledge Tracing


65. HyperMem: Hypergraph Memory for Long-Term Conversations


66. EditCaption: Human-Aligned Instruction Synthesis for Image Editing via Supervised Fine-Tuning and Direct Preference Optimization


67. MedVR: Annotation-Free Medical Visual Reasoning via Agentic Reinforcement Learning


68. AT-ADD: All-Type Audio Deepfake Detection Challenge Evaluation Plan


69. ViVa: A Video-Generative Value Model for Robot Reinforcement Learning


70. Multimodal Reasoning with LLM for Encrypted Traffic Interpretation: A Benchmark


71. Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference


72. Small Vision-Language Models are Smart Compressors for Long Video Understanding


73. OV-Stitcher: A Global Context-Aware Framework for Training-Free Open-Vocabulary Semantic Segmentation


74. AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models


75. From Gaze to Guidance: Interpreting and Adapting to Users’ Cognitive Needs with Multimodal Gaze-Aware AI Assistants


76. 3DrawAgent: Teaching LLM to Draw in 3D with Early Contrastive Experience


77. LINE: LLM-based Iterative Neuron Explanations for Vision Models


78. LogAct: Enabling Agentic Reliability via Shared Logs


79. A Decomposition Perspective to Long-context Reasoning for LLMs


80. AtomEval: Atomic Evaluation of Adversarial Claims in Fact Verification


81. DSCA: Dynamic Subspace Concept Alignment for Lifelong VLM Editing


82. Rethinking Data Mixing from the Perspective of Large Language Models


83. TOOLCAD: Exploring Tool-Using Large Language Models in Text-to-CAD Generation with Reinforcement Learning


84. On-Policy Distillation of Language Models for Autonomous Vehicle Motion Planning


85. Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning


86. Same Outcomes, Different Journeys: A Trace-Level Framework for Comparing Human and GUI-Agent Behavior in Production Search Systems


87. Mitigating Entangled Steering in Large Vision-Language Models for Hallucination Reduction


88. Dynamic Attentional Context Scoping: Agent-Triggered Focus Sessions for Isolated Per-Agent Steering in Multi-Agent LLM Orchestration


89. TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation


90. Data Selection for Multi-turn Dialogue Instruction Tuning


91. PyVRP$^+$: LLM-Driven Metacognitive Heuristic Evolution for Hybrid Genetic Search in Vehicle Routing Problems


92. Networking-Aware Energy Efficiency in Agentic AI Inference: A Survey


93. QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training–Inference Mismatch


94. ReRec: Reasoning-Augmented LLM-based Recommendation Assistant via Reinforcement Fine-tuning


95. Filling the Gaps: Selective Knowledge Augmentation for LLM Recommenders


96. Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers


97. More Capable, Less Cooperative? When LLMs Fail At Zero-Cost Collaboration


98. PolicyLong: Towards On-Policy Context Extension


99. Latent Anomaly Knowledge Excavation: Unveiling Sparse Sensitive Neurons in Vision-Language Models


100. TEMPER: Testing Emotional Perturbation in Quantitative Reasoning


101. MIMIC-Py: An Extensible Tool for Personality-Driven Automated Game Testing with Large Language Models


102. Beyond Pedestrians: Caption-Guided CLIP Framework for High-Difficulty Video-based Person Re-Identification



104. Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization


105. An Imperfect Verifier is Good Enough: Learning with Noisy Rewards


106. DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification


107. DCD: Domain-Oriented Design for Controlled Retrieval-Augmented Generation


108. Don’t Measure Once: Measuring Visibility in AI Search (GEO)


109. Learning is Forgetting: LLM Training As Lossy Compression


110. Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs


111. Generative Experiences for Digital Mental Health Interventions: Evidence from a Randomized Study


112. TR-EduVSum: A Turkish-Focused Dataset and Consensus Framework for Educational Video Summarization


113. MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security


114. EMSDialog: Synthetic Multi-person Emergency Medical Service Dialogue Generation from Electronic Patient Care Reports via Multi-LLM Agents


115. The Shrinking Lifespan of LLMs in Science


116. SYN-DIGITS: A Synthetic Control Framework for Calibrated Digital Twin Simulation


117. Beyond Human-Readable: Rethinking Software Engineering Conventions for the Agentic Development Era


118. Triage: Routing Software Engineering Tasks to Cost-Effective LLM Tiers via Code Quality Signals


119. Enabling Intrinsic Reasoning over Dense Geospatial Embeddings with DFR-Gemma


120. Private Seeds, Public LLMs: Realistic and Privacy-Preserving Synthetic Data Generation


121. GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents


122. SubSearch: Intermediate Rewards for Unsupervised Guided Reasoning in Complex Retrieval


123. FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios


124. Breaking the Illusion of Identity in LLM Tooling


125. Self-Calibrating LLM-Based Analog Circuit Sizing with Interpretable Design Equations


126. Playing DOOM with 1.3M Parameters: Specialized Small Models vs Large Language Models for Real-Time Game Control


127. Latent Structure of Affective Representations in Large Language Models


128. The Role of Emotional Stimuli and Intensity in Shaping Large Language Model Behavior