Major LLM-Related Papers - 2026-03-11

1. Think Before You Lie: How Reasoning Improves Honesty


2. PathMem: Toward Cognition-Aligned Memory Transformation for Pathology MLLMs


3. MedMASLab: A Unified Orchestration Framework for Benchmarking Multimodal Medical Multi-Agent Systems


4. Influencing LLM Multi-Agent Dialogue via Policy-Parameterized Prompts


5. Quantifying the Necessity of Chain of Thought through Opaque Serial Depth


6. AutoAgent: Evolving Cognition and Elastic Memory Orchestration for Adaptive Agents


7. OOD-MMSafe: Advancing MLLM Safety from Harmful Intent to Hidden Consequences


8. EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages


9. MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants


10. PRECEPT: Planning Resilience via Experience, Context Engineering & Probing Trajectories: A Unified Framework for Test-Time Adaptation with Compositional Rule Learning and Pareto-Guided Prompt Evolution


11. Enhancing Debunking Effectiveness through LLM-based Personality Adaptation


12. GenePlan: Evolving Better Generalized PDDL Plans using Large Language Models


13. AI Act Evaluation Benchmark: An Open, Transparent, and Reproducible Evaluation Dataset for NLP and RAG Systems


14. Curveball Steering: The Right Direction To Steer Isn’t Always Linear


15. Rescaling Confidence: What Scale Design Reveals About LLM Metacognition


16. Logos: An evolvable reasoning engine for rational molecular design


17. Social-R1: Towards Human-like Social Reasoning in LLMs


18. Cognitively Layered Data Synthesis for Domain Adaptation of LLMs to Space Situational Awareness


19. PrivPRISM: Automatically Detecting Discrepancies Between Google Play Data Safety Declarations and Developer Privacy Policies


20. The Reasoning Trap – Logical Reasoning as a Mechanistic Pathway to Situational Awareness


21. Real-Time Trust Verification for Safe Agentic Actions using TrustBench


22. DataFactory: Collaborative Multi-Agent Framework for Advanced Table Question Answering


23. Deep Tabular Research via Continual Experience-Driven Execution


24. Chaotic Dynamics in Multi-LLM Deliberation


25. Time, Identity and Consciousness in Language Model Agents


26. MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games


27. Meissa: Multi-modal Medical Agentic Intelligence


28. A Consensus-Driven Multi-LLM Pipeline for Missing-Person Investigations


29. AgentOS: From Application Silos to a Natural Language-Driven Data Ecosystem


30. Interpretable Markov-Based Spatiotemporal Risk Surfaces for Missing-Child Search Planning with Reinforcement Learning and LLM-Based Quality Assurance



32. LDP: An Identity-Aware Protocol for Multi-Agent LLM Systems


33. MASEval: Extending Multi-Agent Evaluation from Models to Systems


34. From Data Statistics to Feature Geometry: How Correlations Shape Superposition


35. Understanding the Use of a Large Language Model-Powered Guide to Make Virtual Reality Accessible for Blind and Low Vision People


36. BEACON: Language-Conditioned Navigation Affordance Prediction under Occlusion


37. Towards a Neural Debugger for Python


38. MSSR: Memory-Aware Adaptive Replay for Continual LLM Fine-Tuning


39. SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases


40. Correction of Transformer-Based Models with Smoothing Pseudo-Projector


41. MITRA: An AI Assistant for Knowledge Retrieval in Physics Collaborations


42. Ego: Embedding-Guided Personalization of Vision-Language Models


43. EXPLORE-Bench: Egocentric Scene Prediction with Long-Horizon Reasoning


44. RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation


45. MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models


46. Mousse: Rectifying the Geometry of Muon with Curvature-Aware Preconditioning


47. ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning


48. ESAinsTOD: A Unified End-to-End Schema-Aware Instruction-Tuning Framework for Task-Oriented Dialog Modeling


49. AutoViVQA: A Large-Scale Automatically Constructed Dataset for Vietnamese Visual Question Answering


50. Automatic Cardiac Risk Management Classification using large-context Electronic Patients Health Records


51. MM-tau-p$^2$: Persona-Adaptive Prompting for Robust Multi-Modal Agent Evaluation in Dual-Control Settings


52. Grounding Synthetic Data Generation With Vision and Language Models


53. Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation


54. Evolving Prompt Adaptation for Vision-Language Models


55. Common Sense vs. Morality: The Curious Case of Narrative Focus Bias in LLMs


56. Open-World Motion Forecasting


57. Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health


58. TaSR-RAG: Taxonomy-guided Structured Reasoning for Retrieval-Augmented Generation


59. Beyond Scaling: Assessing Strategic Reasoning and Rapid Decision-Making Capability of LLMs in Zero-sum Environments


60. Reading the Mood Behind Words: Integrating Prosody-Derived Emotional Context into Socially Responsive VR Agents


61. Emotion is Not Just a Label: Latent Emotional Factors in LLM Processing


62. Latent-DARM: Bridging Discrete Diffusion And Autoregressive Models For Reasoning


63. DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization


64. Reinforced Generation of Combinatorial Structures: Ramsey Numbers


65. ZeroWBC: Learning Natural Visuomotor Humanoid Control Directly from Human Egocentric Video


66. Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL


67. RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning


68. QUSR: Quality-Aware and Uncertainty-Guided Image Super-Resolution Diffusion Model


69. VIVID-Med: LLM-Supervised Structured Pretraining for Deployable Medical ViTs


70. Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting


71. Automating Detection and Root-Cause Analysis of Flaky Tests in Quantum Software


72. The Missing Memory Hierarchy: Demand Paging for LLM Context Windows


73. Arbiter: Detecting Interference in LLM Agent System Prompts


74. BiCLIP: Domain Canonicalization via Structured Geometric Transformation


75. VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs


76. PathoScribe: Transforming Pathology Data into a Living Library with a Unified LLM-Driven Framework for Semantic Retrieval and Clinical Integration


77. Using Vision Language Foundation Models to Generate Plant Simulation Configurations via In-Context Learning


78. Scale-Plan: Scalable Language-Enabled Task Planning for Heterogeneous Multi-Robot Teams


79. Test-Driven AI Agent Definition (TDAD): Compiling Tool-Using Agents from Behavioral Specifications


80. Large Language Model-Assisted Superconducting Qubit Experiments


81. Turn: A Language for Agentic Computation


82. Hindsight Credit Assignment for Long-Horizon LLM Agents


83. Diagnosing FP4 inference: a layer-wise and block-wise sensitivity analysis of NVFP4 and MXFP4


84. Zipage: Maintain High Request Concurrency for LLM Reasoning through Compressed PagedAttention


85. ARKV: Adaptive and Resource-Efficient KV Cache Management under Limited Memory Budget for Long-Context Inference in LLMs


86. Alignment Is the Disease: Censorship Visibility and Alignment Constraint Complexity as Determinants of Collective Pathology in Multi-Agent LLM Systems


87. SiliconMind-V1: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation


88. CktEvo: Repository-Level RTL Code Benchmark for Design Evolution


89. Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction


90. Let’s Verify Math Questions Step by Step