All AI Papers - 2026-03-03

1. DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science


2. A Minimal Agent for Automated Theorem Proving


3. Uncertainty Quantification for Multimodal Large Language Models with Incoherence-adjusted Semantic Volume


4. Learning Flexible Job Shop Scheduling under Limited Buffers and Material Kitting Constraints


5. LemmaBench: A Live, Research-Level Benchmark to Evaluate LLM Capabilities in Mathematics


6. Recycling Failures: Salvaging Exploration in RLVR via Fine-Grained Off-Policy Guidance


7. Artificial Agency Program: Curiosity, compression, and communication in agents


8. Bi-level RL-Heuristic Optimization for Real-world Winter Road Maintenance


9. Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction


10. CIRCLE: A Framework for Evaluating AI from a Real-World Lens


11. Portfolio Reinforcement Learning with Scenario-Context Rollout


12. Pessimistic Auxiliary Policy for Offline Reinforcement Learning



14. RUMAD: Reinforcement-Unifying Multi-Agent Debate


15. EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models


16. Reasoning-Driven Multimodal LLM for Domain Generalization


17. Unlocking Cognitive Capabilities and Analyzing the Perception-Logic Trade-off


18. The Auton Agentic AI Framework


19. ProductResearch: Training E-Commerce Deep Research Agents via Multi-Agent Synthetic Trajectory Distillation


20. From Flat Logs to Causal Graphs: Hierarchical Failure Attribution for LLM-based Multi-Agent Systems


21. ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference


22. PseudoAct: Leveraging Pseudocode Synthesis for Flexible Planning and Action Control in Large Language Model Agents


23. AI Must Embrace Specialization via Superhuman Adaptable Intelligence


24. MMKG-RDS: Reasoning Data Synthesis via Deep Mining of Multimodal Knowledge Graphs


25. SleepLM: Natural-Language Intelligence for Human Sleep


26. Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem


27. Planning under Distribution Shifts with Causal POMDPs


28. Causal Identification from Counterfactual Data: Completeness and Bounding Results


29. An Agentic LLM Framework for Adverse Media Screening in AML Compliance


30. HumanMCP: A Human-Like Query Dataset for Evaluating MCP Tool Retrieval Performance


31. Do LLMs Benefit From Their Own Words?


32. CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation


33. Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation


34. Memory Caching: RNNs with Growing Memory


35. Resources for Automated Evaluation of Assistive RAG Systems that Help Readers with News Trustworthiness Assessment


36. Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification


37. FaultXformer: A Transformer-Encoder Based Fault Classification and Location Identification model in PMU-Integrated Active Electrical Distribution System


38. SafeGen-LLM: Enhancing Safety Generalization in Task Planning for Robotic Systems


39. Controllable Reasoning Models Are Private Thinkers


40. An Efficient Unsupervised Federated Learning Approach for Anomaly Detection in Heterogeneous IoT Networks


41. Resilient Strategies for Stochastic Systems: How Much Does It Take to Break a Winning Strategy?


42. A Mixed Diet Makes DINO An Omnivorous Vision Encoder


43. Task-Centric Acceleration of Small-Language Models


44. ArgLLM-App: An Interactive System for Argumentative Reasoning with Large Language Models


45. CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning


46. Multimodal Optimal Transport for Unsupervised Temporal Segmentation in Surgical Robotics


47. Terminology Rarity Predicts Catastrophic Failure in LLM Translation of Low-Resource Ancient Languages: Evidence from Ancient Greek


48. Toward Guarantees for Clinical Reasoning in Vision Language Models via Formal Verification


49. ARGUS: Seeing the Influence of Narrative Features on Persuasion in Argumentative Texts


50. DiffusionHarmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer


51. Preference Packing: Efficient Preference Optimization for Large Language Models


52. Adaptive Correlation-Weighted Intrinsic Rewards for Reinforcement Learning


53. Task Complexity Matters: An Empirical Study of Reasoning in LLMs for Sentiment Analysis


54. Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization


55. Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving


56. RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models


57. Interpretable Debiasing of Vision-Language Models for Social Fairness


58. Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking


59. Foundation World Models for Agents that Learn, Verify, and Adapt Reliably Beyond Static Environments


60. MINT: Multimodal Imaging-to-Speech Knowledge Transfer for Early Alzheimer’s Screening


61. Intrinsic Lorentz Neural Network


62. Ask don’t tell: Reducing sycophancy in large language models


63. SHINE: Sequential Hierarchical Integration Network for EEG and MEG


64. Micro-expression Recognition Based on Dual-branch Feature Extraction and Fusion



66. Hierarchical Concept-based Interpretable Models


67. PointCoT: A Multi-modal Benchmark for Explicit 3D Geometric Reasoning


68. Green or Fast? Learning to Balance Cold Starts and Idle Carbon in Serverless Computing


69. The Geometry of Transfer: Unlocking Medical Vision Manifolds for Training-Free Model Ranking


70. Experience-Guided Self-Adaptive Cascaded Agents for Breast Cancer Screening and Diagnosis with Reduced Biopsy Referrals


71. Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks


72. Uncovering sustainable personal care ingredient combinations using scientific modelling


73. Exploring Robust Intrusion Detection: A Benchmark Study of Feature Transferability in IoT Botnet Attack Detection


74. MI$^2$DAS: A Multi-Layer Intrusion Detection Framework with Incremental Learning for Securing Industrial IoT Networks


75. Enhancing Continual Learning for Software Vulnerability Prediction: Addressing Catastrophic Forgetting via Hybrid-Confidence-Aware Selective Replay for Temporal LLM Fine-Tuning


76. FedNSAM: Consistency of Local and Global Flatness for Federated Learning


77. Learning to maintain safety through expert demonstrations in settings with unknown constraints: A Q-learning perspective


78. Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parametric Policies


79. See, Act, Adapt: Active Perception for Unsupervised Cross-Domain Visual Adaptation via Personalized VLM-Guided Agent


80. Operationalizing Longitudinal Causal Discovery Under Real-World Workflow Constraints


81. MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models


82. UPath: Universal Planner Across Topological Heterogeneity For Grid-Based Pathfinding


83. TradeFM: A Generative Foundation Model for Trade-flow and Market Microstructure


84. Bridging Dynamics Gaps via Diffusion Schrödinger Bridge for Cross-Domain Reinforcement Learning


85. From Static Benchmarks to Dynamic Protocol: Agent-Centric Text Anomaly Detection for Evaluating LLM Reasoning


86. SLA-Aware Distributed LLM Inference Across Device-RAN-Cloud


87. SAGE-LLM: Towards Safe and Generalizable LLM Controller with Fuzzy-CBF Verification and Graph-Structured Knowledge Retrieval for UAV Decision


88. Optimizer-Induced Low-Dimensional Drift and Transverse Dynamics in Transformer Training


89. Interpretable Multimodal Gesture Recognition for Drone and Mobile Robot Teleoperation via Log-Likelihood Ratio Fusion


90. The Compulsory Imaginary: AGI and Corporate Authority


91. Blockchain-Enabled Routing for Zero-Trust Low-Altitude Intelligent Networks


92. TRIZ-RAGNER: A Retrieval-Augmented Large Language Model for TRIZ-Aware Named Entity Recognition in Patent-Based Contradiction Mining


93. ProtoDCS: Towards Robust and Efficient Open-Set Test-Time Adaptation for Vision-Language Models


94. 3D Modality-Aware Pre-training for Vision-Language Model in MRI Multi-organ Abnormality Detection


95. AudioCapBench: Quick Evaluation on Audio Captioning across Sound, Music, and Speech


96. FedRot-LoRA: Mitigating Rotational Misalignment in Federated LoRA


97. FlexGuard: Continuous Risk Scoring for Strictness-Adaptive LLM Content Moderation


98. DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model


99. ReDON: Recurrent Diffractive Optical Neural Processor with Reconfigurable Self-Modulated Nonlinearity


100. When Does Multimodal Learning Help in Healthcare? A Benchmark on EHR and Chest X-Ray Fusion


101. LLM-Driven Multi-Turn Task-Oriented Dialogue Synthesis for Realistic Reasoning


102. LFQA-HP-1M: A Large-Scale Human Preference Dataset for Long-Form Question Answering


103. KEEP: A KV-Cache-Centric Memory Management System for Efficient Embodied Planning


104. Pseudo Contrastive Learning for Diagram Comprehension in Multimodal Models


105. Hyperdimensional Cross-Modal Alignment of Frozen Language and Image Models for Efficient Image Captioning


106. SDMixer: Sparse Dual-Mixer for Time Series Forecasting


107. BRIDGE the Gap: Mitigating Bias Amplification in Automated Scoring of English Language Learners via Inter-group Data Augmentation


108. CycleBEV: Regularizing View Transformation Networks via View Cycle Consistency for Bird’s-Eye-View Semantic Segmentation


109. Evidential Neural Radiance Fields


110. Flowette: Flow Matching with Graphette Priors for Graph Generation


111. Hierarchical Multi-Scale Graph Learning with Knowledge-Guided Attention for Whole-Slide Image Survival Analysis


112. Rudder: Steering Prefetching in Distributed GNN Training using LLM Agents


113. Humans and LLMs Diverge on Probabilistic Inferences


114. Modelling and Simulation of Neuromorphic Datasets for Anomaly Detection in Computer Vision


115. SegReg: Latent Space Regularization for Improved Medical Image Segmentation


116. FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments


117. TaCarla: A comprehensive benchmarking dataset for end-to-end autonomous driving


118. Optimization of Edge Directions and Weights for Mixed Guidance Graphs in Lifelong Multi-Agent Path Finding


119. BiKA: Kolmogorov-Arnold-Network-inspired Ultra Lightweight Neural Network Hardware Accelerator


120. SALIENT: Frequency-Aware Paired Diffusion for Controllable Long-Tail CT Detection


121. Human Supervision as an Information Bottleneck: A Unified Theory of Error Floors in Human-Guided Learning


122. DesignSense: A Human Preference Dataset and Reward Modeling Framework for Graphic Layout Generation


123. Brain-OF: An Omnifunctional Foundation Model for fMRI, EEG and MEG


124. Long Range Frequency Tuning for QML


125. Learning to Generate Secure Code via Token-Level Rewards


126. Task-Lens: Cross-Task Utility Based Speech Dataset Profiling for Low-Resource Indian Languages


127. Hello-Chat: Towards Realistic Social Audio Interactions


128. Now You See Me: Designing Responsible AI Dashboards for Early-Stage Health Innovation


129. Higress-RAG: A Holistic Optimization Framework for Enterprise Retrieval-Augmented Generation via Dual Hybrid Retrieval, Adaptive Routing, and CRAG


130. Democratizing GraphRAG: Linear, CPU-Only Graph Retrieval for Multi-Hop QA



132. Toward General Semantic Chunking: A Discriminative Framework for Ultra-Long Documents


133. Reason to Contrast: A Cascaded Multimodal Retrieval Framework


134. Keyword search is all you need: Achieving RAG-Level Performance without vector databases using agentic tool use


135. Let There Be Claws: An Early Social Network Analysis of AI Agents on Moltbook


136. QD-MAPPER: A Quality Diversity Framework to Automatically Evaluate Multi-Agent Path Finding Algorithms in Diverse Maps