Key LLM-Related Papers - 2026-03-02

1. DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science


2. Uncertainty Quantification for Multimodal Large Language Models with Incoherence-adjusted Semantic Volume


3. LemmaBench: A Live, Research-Level Benchmark to Evaluate LLM Capabilities in Mathematics



5. RUMAD: Reinforcement-Unifying Multi-Agent Debate


6. EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models


7. Reasoning-Driven Multimodal LLM for Domain Generalization


8. Unlocking Cognitive Capabilities and Analyzing the Perception-Logic Trade-off


9. The Auton Agentic AI Framework


10. ProductResearch: Training E-Commerce Deep Research Agents via Multi-Agent Synthetic Trajectory Distillation


11. From Flat Logs to Causal Graphs: Hierarchical Failure Attribution for LLM-based Multi-Agent Systems


12. ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference


13. PseudoAct: Leveraging Pseudocode Synthesis for Flexible Planning and Action Control in Large Language Model Agents


14. An Agentic LLM Framework for Adverse Media Screening in AML Compliance


15. Do LLMs Benefit From Their Own Words?


16. CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation


17. Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation


18. SafeGen-LLM: Enhancing Safety Generalization in Task Planning for Robotic Systems


19. Task-Centric Acceleration of Small-Language Models


20. ArgLLM-App: An Interactive System for Argumentative Reasoning with Large Language Models


21. Terminology Rarity Predicts Catastrophic Failure in LLM Translation of Low-Resource Ancient Languages: Evidence from Ancient Greek


22. Toward Guarantees for Clinical Reasoning in Vision Language Models via Formal Verification


23. ARGUS: Seeing the Influence of Narrative Features on Persuasion in Argumentative Texts


24. Preference Packing: Efficient Preference Optimization for Large Language Models


25. Task Complexity Matters: An Empirical Study of Reasoning in LLMs for Sentiment Analysis


26. Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization


27. Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving


28. RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models


29. Interpretable Debiasing of Vision-Language Models for Social Fairness


30. Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking


31. Ask don’t tell: Reducing sycophancy in large language models



33. PointCoT: A Multi-modal Benchmark for Explicit 3D Geometric Reasoning


34. Enhancing Continual Learning for Software Vulnerability Prediction: Addressing Catastrophic Forgetting via Hybrid-Confidence-Aware Selective Replay for Temporal LLM Fine-Tuning


35. See, Act, Adapt: Active Perception for Unsupervised Cross-Domain Visual Adaptation via Personalized VLM-Guided Agent


36. MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models


37. From Static Benchmarks to Dynamic Protocol: Agent-Centric Text Anomaly Detection for Evaluating LLM Reasoning


38. SLA-Aware Distributed LLM Inference Across Device-RAN-Cloud


39. SAGE-LLM: Towards Safe and Generalizable LLM Controller with Fuzzy-CBF Verification and Graph-Structured Knowledge Retrieval for UAV Decision


40. TRIZ-RAGNER: A Retrieval-Augmented Large Language Model for TRIZ-Aware Named Entity Recognition in Patent-Based Contradiction Mining


41. ProtoDCS: Towards Robust and Efficient Open-Set Test-Time Adaptation for Vision-Language Models


42. 3D Modality-Aware Pre-training for Vision-Language Model in MRI Multi-organ Abnormality Detection


43. AudioCapBench: Quick Evaluation on Audio Captioning across Sound, Music, and Speech


44. FedRot-LoRA: Mitigating Rotational Misalignment in Federated LoRA


45. FlexGuard: Continuous Risk Scoring for Strictness-Adaptive LLM Content Moderation


46. ReDON: Recurrent Diffractive Optical Neural Processor with Reconfigurable Self-Modulated Nonlinearity


47. LLM-Driven Multi-Turn Task-Oriented Dialogue Synthesis for Realistic Reasoning


48. LFQA-HP-1M: A Large-Scale Human Preference Dataset for Long-Form Question Answering


49. KEEP: A KV-Cache-Centric Memory Management System for Efficient Embodied Planning


50. Pseudo Contrastive Learning for Diagram Comprehension in Multimodal Models


51. Hyperdimensional Cross-Modal Alignment of Frozen Language and Image Models for Efficient Image Captioning


52. BRIDGE the Gap: Mitigating Bias Amplification in Automated Scoring of English Language Learners via Inter-group Data Augmentation


53. Rudder: Steering Prefetching in Distributed GNN Training using LLM Agents


54. Humans and LLMs Diverge on Probabilistic Inferences


55. Human Supervision as an Information Bottleneck: A Unified Theory of Error Floors in Human-Guided Learning


56. DesignSense: A Human Preference Dataset and Reward Modeling Framework for Graphic Layout Generation


57. Learning to Generate Secure Code via Token-Level Rewards


58. Hello-Chat: Towards Realistic Social Audio Interactions


59. Higress-RAG: A Holistic Optimization Framework for Enterprise Retrieval-Augmented Generation via Dual Hybrid Retrieval, Adaptive Routing, and CRAG


60. Democratizing GraphRAG: Linear, CPU-Only Graph Retrieval for Multi-Hop QA



62. Toward General Semantic Chunking: A Discriminative Framework for Ultra-Long Documents


63. Keyword search is all you need: Achieving RAG-Level Performance without vector databases using agentic tool use