Key LLM Papers - 2026-03-13

1. Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training


2. TopoBench: Benchmarking LLMs on Hard Topological Reasoning


3. Increasing intelligence in AI agents can worsen collective outcomes


4. On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents


5. Can RL Improve Generalization of LLM Agents? An Empirical Study


6. LABSHIELD: A Multimodal Benchmark for Safety-Critical Reasoning and Planning in Scientific Laboratories


7. Learning Transferable Sensor Models via Language-Informed Pretraining


8. Prototype-Based Knowledge Guidance for Fine-Grained Structured Radiology Reporting


9. AdaFuse: Accelerating Dynamic Adapter Inference via Token-Level Pre-Gating and Fused Kernel Optimization


10. Automating Skill Acquisition through Large-Scale Mining of Open-Source Agentic Repositories: A Framework for Multi-Agent Procedural Knowledge Extraction


11. DocSage: An Information Structuring Agent for Multi-Doc Multi-Entity Question Answering


12. From Debate to Deliberation: Structured Collective Reasoning with Typed Epistemic Acts


13. Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework


14. Gender Bias in Generative AI-assisted Recruitment Processes


15. When OpenClaw Meets Hospital: Toward an Agentic Operating System for Dynamic Clinical Workflows


16. Scaling Laws for Educational AI Agents


17. Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks


18. LLMs can construct powerful representations and streamline sample-efficient supervised learning


19. VisDoT: Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought


20. See, Symbolize, Act: Grounding VLMs with Spatial Representations for Better Gameplay


21. Leveraging Large Language Models and Survival Analysis for Early Prediction of Chemotherapy Outcomes


22. AI Knows What’s Wrong But Cannot Fix It: Helicoid Dynamics in Frontier LLMs Under High-Stakes Decisions


23. Multi-Agent Collaboration for Automated Design Exploration on High Performance Computing Systems


24. Verified Multi-Agent Orchestration: A Plan-Execute-Verify-Replan Framework for Complex Query Resolution


25. GPT4o-Receipt: A Dataset and Human Study for AI-Generated Document Forensics


26. Speak or Stay Silent: Context-Aware Turn-Taking in Multi-Party Dialogue


27. Deactivating Refusal Triggers: Understanding and Mitigating Overrefusal in Safety Alignment


28. Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI


29. FinRule-Bench: A Benchmark for Joint Reasoning over Financial Tables and Principles


30. RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents


31. LLM-Augmented Digital Twin for Policy Evaluation in Short-Video Platforms


32. Counterweights and Complementarities: The Convergence of AI and Blockchain Powering a Decentralized Future


33. AI Psychometrics: Evaluating the Psychological Reasoning of Large Language Models with Psychometric Validities


34. COMPASS: The explainable agentic framework for Sovereignty, Sustainability, Compliance, and Ethics


35. The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning


36. Mind the Sim2Real Gap in User Simulation for Agentic Tasks


37. Reversible Lifelong Model Editing via Semantic Routing-Based LoRA


38. PACED: Distillation at the Frontier of Student Competence


39. A Survey of Reasoning in Autonomous Driving Systems: Open Challenges and Emerging Paradigms


40. Sparking Scientific Creativity via LLM-Driven Interdisciplinary Inspiration


41. BehaviorVLM: Unified Finetuning-Free Behavioral Understanding with Vision-Language Reasoning


42. IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL


43. SommBench: Assessing Sommelier Expertise of Language Models


44. Human-Centred LLM Privacy Audits: Findings and Frictions


45. Resource-Efficient Iterative LLM-Based NAS with Feedback Memory


46. LoV3D: Grounding Cognitive Prognosis Reasoning in Longitudinal 3D Brain MRI via Regional Volume Assessments


47. Cascade: Composing Software-Hardware Attack Gadgets for Adversarial Threat Amplification in Compound AI Systems


48. An Intent of Collaboration: On Agencies between Designers and Emerging (Intelligent) Technologies


49. BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs


50. HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios


51. MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?


52. Understanding LLM Behavior When Encountering User-Supplied Harmful Content in Harmless Tasks


53. Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models


54. Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language


55. ELISA: An Interpretable Hybrid Generative AI Agent for Expression-Grounded Discovery in Single-Cell Genomics


56. You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents


57. RADAR: Closed-Loop Robotic Data Generation via Semantic Planning and Autonomous Causal Environment Reset


58. Compression Favors Consistency, Not Truth: When and Why Language Models Prefer Correct Information


59. OSCBench: Benchmarking Object State Change in Text-to-Video Generation


60. SemBench: A Universal Semantic Framework for LLM Evaluation


61. Entropy-Preserving Reinforcement Learning


62. From Control to Foresight: Simulation as a New Paradigm for Human-Agent Collaboration


63. Tokenization Allows Multimodal Large Language Models to Understand, Generate and Edit Architectural Floor Plans


64. MedPruner: Training-Free Hierarchical Token Pruning for Efficient 3D Medical Image Understanding in Vision-Language Models


65. Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats


66. Performance Evaluation of Open-Source Large Language Models for Assisting Pathology Report Writing in Japanese


67. UtilityMax Prompting: A Formal Framework for Multi-Objective Large Language Model Optimization


68. ReHARK: Refined Hybrid Adaptive RBF Kernels for Robust One-Shot Vision-Language Adaptation


69. KEPo: Knowledge Evolution Poison on Graph-based Retrieval-Augmented Generation


70. INFACT: A Diagnostic Benchmark for Induced Faithfulness and Factuality Hallucinations in Video-LLMs


71. Stop Listening to Me! How Multi-turn Conversations Can Degrade Diagnostic Reasoning


72. Agentic AI for Embodied-enhanced Beam Prediction in Low-Altitude Economy Networks


73. Resolving Java Code Repository Issues with iSWE Agent


74. Novelty Adaptation Through Hybrid Large Language Model (LLM)-Symbolic Planning and LLM-guided Reinforcement Learning


75. Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover


76. Artificial Intelligence for Sentiment Analysis of Persian Poetry


77. Markovian Generation Chains in Large Language Models


78. MDER-DR: Multi-Hop Question Answering with Entity-Centric Summaries


79. A Simple Efficiency Incremental Learning Framework via Vision-Language Model with Nonlinear Multi-Adapters


80. WebWeaver: Breaking Topology Confidentiality in LLM Multi-Agent Systems with Stealthy Context-Based Inference


81. Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers


82. Graph Tokenization for Bridging Graphs and Transformers


83. The Attack and Defense Landscape of Agentic AI: A Comprehensive Survey


84. Quality-Driven Agentic Reasoning for LLM-Assisted Software Design: Questions-of-Thoughts (QoT) as a Time-Series Self-QA Chain


85. CR-Bench: Evaluating the Real-World Utility of AI Code Review Agents


86. From Phase Prediction to Phase Design: A ReAct Agent Framework for High-Entropy Alloy Discovery


87. Summarize Before You Speak with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Reallocation


88. Exploring Collatz Dynamics with Human-LLM Collaboration