Notable LLM-Related Papers - 2026-04-06

1. Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering with Vision Language Models


2. Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?


3. InfoSeeker: A Scalable Hierarchical Parallel Agent Framework for Web Information Seeking


4. AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents


5. Analysis of Optimality of Large Language Models on Planning Problems


6. Multi-Turn Reinforcement Learning for Tool-Calling Agents with Iterative Reward Calibration


7. ESL-Bench: An Event-Driven Synthetic Longitudinal Benchmark for Health Agents


8. CharTool: Tool-Integrated Visual Reasoning for Chart Understanding


9. Improving Role Consistency in Multi-Agent Collaboration via Quantitative Role Clarity


10. Aligning Progress and Feasibility: A Neuro-Symbolic Dual Memory Framework for Long-Horizon LLM Agents


11. DeltaLogic: Minimal Premise Edits Reveal Belief-Revision Failures in Logical Reasoning Models


12. Let’s Have a Conversation: Designing and Evaluating LLM Agents for Interactive Optimization


13. OntoKG: Ontology-Oriented Knowledge Graph Construction with Intrinsic-Relational Routing


14. AutoVerifier: An Agentic Automated Verification Framework Using Large Language Models


15. Do Audio-Visual Large Language Models Really See and Hear?


16. Mitigating LLM biases toward spurious social contexts using direct preference optimization


17. Competency Questions as Executable Plans: a Controlled RAG Architecture for Cultural Heritage Storytelling


18. I must delete the evidence: AI Agents Explicitly Cover up Fraud and Violent Crime


19. AIVV: Neuro-Symbolic LLM Agent-Integrated Verification and Validation for Trustworthy Autonomous Systems


20. Compositional Neuro-Symbolic Reasoning


21. Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation


22. Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web


23. Reliability Gated Multi-Teacher Distillation for Low Resource Abstractive Summarization


24. Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models


25. Beyond the Parameters: A Technical Survey of Contextual Enrichment in Large Language Models: From In-Context Prompting to Causal Retrieval-Augmented Generation


26. Valence-Arousal Subspace in LLMs: Circular Emotion Geometry and Multi-Behavioral Control


27. A Systematic Security Evaluation of OpenClaw and Its Variants


28. Domain-Adapted Retrieval for In-Context Annotation of Pedagogical Dialogue Acts


29. An Independent Safety Evaluation of Kimi K2.5


30. Co-Evolution of Policy and Internal Reward for Language Agents


31. Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems


32. Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study


33. Verbalizing LLMs’ assumptions to explain and control sycophancy


34. Querying Structured Data Through Natural Language Using Language Models


35. JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency


36. R2-Write: Reflection and Revision for Open-Ended Writing with Deep Reasoning


37. Prompt Compression in the Wild: Measuring Latency, Rate Adherence, and Quality for Faster LLM Inference


38. LogicPoison: Logical Attacks on Graph Retrieval-Augmented Generation


39. How Annotation Trains Annotators: Competence Development in Social Influence Recognition


40. Council Mode: Mitigating Hallucination and Bias in LLMs via Multi-Agent Consensus


41. Toward an Artificial General Teacher: Procedural Geometry Data Generation and Visual Grounding with Vision-Language Models


42. One Model to Translate Them All? A Journey to Mount Doom for Multilingual Model Merging


43. LLM+Graph@VLDB’2025 Workshop Summary


44. Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis


45. QAPruner: Quantization-Aware Vision Token Pruning for Multimodal Large Language Models


46. ChatSVA: Bridging SVA Generation for Hardware Verification via Task-Specific LLMs


47. PaveBench: A Versatile Benchmark for Pavement Distress Perception and Interactive Vision-Language Analysis


48. Rubrics to Tokens: Bridging Response-level Rubrics and Token-level Rewards in Instruction Following Tasks


49. SentinelAgent: Intent-Verified Delegation Chains for Securing Federal Multi-Agent AI Systems


50. Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs


51. IndustryCode: A Benchmark for Industry Code Generation


52. V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views


53. Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy


54. Trivial Vocabulary Bans Improve LLM Reasoning More Than Deep Linguistic Constraints


55. Efficient3D: A Unified Framework for Adaptive and Debiased Token Reduction in 3D MLLMs


56. Finding Belief Geometries with Sparse Autoencoders


57. Eligibility-Aware Evidence Synthesis: An Agentic Framework for Clinical Trial Meta-Analysis


58. Do Agent Societies Develop Intellectual Elites? The Hidden Power Laws of Collective Cognition in LLM Multi-Agent Systems


59. Too Polite to Disagree: Understanding Sycophancy Propagation in Multi-Agent Systems


60. Generalization Limits of Reinforcement Learning Alignment


61. GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers


62. Poison Once, Exploit Forever: Environment-Injected Memory Poisoning Attacks on Web Agents


63. Making Written Theorems Explorable by Grounding Them in Formal Representations


64. Moondream Segmentation: From Words to Masks


65. High Volatility and Action Bias Distinguish LLMs from Humans in Group Coordination


66. Understanding the Effects of Safety Unalignment on Large Language Models


67. Pragmatics Meets Culture: Culturally-adapted Artwork Description Generation and Evaluation


68. From Theory to Practice: Code Generation Using LLMs for CAPEC and CWE Frameworks


69. Jump Start or False Start? A Theoretical and Empirical Evaluation of LLM-initialized Bandits


70. Social Meaning in Large Language Models: Structure, Magnitude, and Pragmatic Prompting


71. An Explainable Vision-Language Model Framework with Adaptive PID-Tversky Loss for Lumbar Spinal Stenosis Diagnosis


72. Token-Efficient Multimodal Reasoning via Image Prompt Packaging


73. Automated Malware Family Classification using Weighted Hierarchical Ensembles of Large Language Models


74. VERTIGO: Visual Preference Optimization for Cinematic Camera Trajectory Generation


75. On the Geometric Structure of Layer Updates in Deep Language Models


76. When simulations look right but causal effects go wrong: Large language models as behavioral simulators


77. Do We Need Frontier Models to Verify Mathematical Proofs?


78. LumiVideo: An Intelligent Agentic System for Video Color Grading


79. Improving MPI Error Detection and Repair with Large Language Models and Bug References


80. Ambig-IaC: Multi-level Disambiguation for Interactive Cloud Infrastructure-as-Code Synthesis


81. Beyond Message Passing: Toward Semantically Aligned Agent Communication


82. Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis


83. DrugPlayGround: Benchmarking Large Language Models and Embeddings for Drug Discovery


84. Haiku to Opus in Just 10 bits: LLMs Unlock Massive Compression Gains


85. LLM Reasoning with Process Rewards for Outcome-Guided Steps


86. Empirical Sufficiency Lower Bounds for Language Modeling with Locally-Bootstrapped Semantic Structures


87. Reanalyzing L2 Preposition Learning with Bayesian Mixed Effects and a Pretrained Language Model