All AI Papers - 2026-03-11

1. Think Before You Lie: How Reasoning Improves Honesty


2. The Confidence Gate Theorem: When Should Ranked Decision Systems Abstain?


3. PathMem: Toward Cognition-Aligned Memory Transformation for Pathology MLLMs


4. MedMASLab: A Unified Orchestration Framework for Benchmarking Multimodal Medical Multi-Agent Systems


5. Influencing LLM Multi-Agent Dialogue via Policy-Parameterized Prompts


6. LCA: Local Classifier Alignment for Continual Learning


7. Quantifying the Necessity of Chain of Thought through Opaque Serial Depth


8. World2Mind: Cognition Toolkit for Allocentric Spatial Reasoning in Foundation Models


9. AutoAgent: Evolving Cognition and Elastic Memory Orchestration for Adaptive Agents


10. Does the Question Really Matter? Training-Free Data Selection for Vision-Language SFT


11. OOD-MMSafe: Advancing MLLM Safety from Harmful Intent to Hidden Consequences


12. EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages


13. Logics-Parsing-Omni Technical Report


14. MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants


15. PRECEPT: Planning Resilience via Experience, Context Engineering & Probing Trajectories: A Unified Framework for Test-Time Adaptation with Compositional Rule Learning and Pareto-Guided Prompt Evolution


16. Context Engineering: From Prompts to Corporate Multi-Agent Architecture


17. Enhancing Debunking Effectiveness through LLM-based Personality Adaptation


18. Vibe-Creation: The Epistemology of Human-AI Emergent Cognition


19. GenePlan: Evolving Better Generalized PDDL Plans using Large Language Models


20. Telogenesis: Goal Is All U Need


21. An Empirical Study and Theoretical Explanation on Task-Level Model-Merging Collapse


22. AI Act Evaluation Benchmark: An Open, Transparent, and Reproducible Evaluation Dataset for NLP and RAG Systems


23. Robust Regularized Policy Iteration under Transition Uncertainty


24. Curveball Steering: The Right Direction To Steer Isn’t Always Linear


25. Rescaling Confidence: What Scale Design Reveals About LLM Metacognition


26. Logos: An evolvable reasoning engine for rational molecular design


27. Social-R1: Towards Human-like Social Reasoning in LLMs


28. Cognitively Layered Data Synthesis for Domain Adaptation of LLMs to Space Situational Awareness


29. PrivPRISM: Automatically Detecting Discrepancies Between Google Play Data Safety Declarations and Developer Privacy Policies


30. Abundant Intelligence and Deficient Demand: A Macro-Financial Stress Test of Rapid AI Adoption


31. Evaluate-as-Action: Self-Evaluated Process Rewards for Retrieval-Augmented Agents


32. The Reasoning Trap – Logical Reasoning as a Mechanistic Pathway to Situational Awareness


33. Explainable Innovation Engine: Dual-Tree Agent-RAG with Methods-as-Nodes and Verifiable Write-Back


34. Real-Time Trust Verification for Safe Agentic Actions using TrustBench


35. DataFactory: Collaborative Multi-Agent Framework for Advanced Table Question Answering


36. Deep Tabular Research via Continual Experience-Driven Execution


37. Chaotic Dynamics in Multi-LLM Deliberation


38. From Days to Minutes: An Autonomous AI Agent Achieves Reliable Clinical Triage in Remote Patient Monitoring


39. EPOCH: An Agentic Protocol for Multi-Round System Optimization


40. Time, Identity and Consciousness in Language Model Agents


41. MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games


42. Meissa: Multi-modal Medical Agentic Intelligence


43. The FABRIC Strategy for Verifying Neural Feedback Systems


44. A Consensus-Driven Multi-LLM Pipeline for Missing-Person Investigations


45. AgentOS: From Application Silos to a Natural Language-Driven Data Ecosystem


46. Interpretable Markov-Based Spatiotemporal Risk Surfaces for Missing-Child Search Planning with Reinforcement Learning and LLM-Based Quality Assurance



48. LDP: An Identity-Aware Protocol for Multi-Agent LLM Systems


49. MASEval: Extending Multi-Agent Evaluation from Models to Systems


50. From Data Statistics to Feature Geometry: How Correlations Shape Superposition


51. Understanding the Use of a Large Language Model-Powered Guide to Make Virtual Reality Accessible for Blind and Low Vision People


52. Emotional Modulation in Swarm Decision Dynamics


53. BEACON: Language-Conditioned Navigation Affordance Prediction under Occlusion


54. Towards a Neural Debugger for Python


55. When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic


56. No Image, No Problem: End-to-End Multi-Task Cardiac Analysis from Undersampled k-Space


57. Towards Flexible Spectrum Access: Data-Driven Insights into Spectrum Demand


58. Adaptive Clinical-Aware Latent Diffusion for Multimodal Brain Image Generation and Missing Modality Imputation


59. AI-Enabled Data-driven Intelligence for Spectrum Demand Estimation


60. MSSR: Memory-Aware Adaptive Replay for Continual LLM Fine-Tuning


61. Emerging Extrinsic Dexterity in Cluttered Scenes via Dynamics-aware Policy Learning


62. A Graph-Based Approach to Spectrum Demand Prediction Using Hierarchical Attention Networks


63. SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases


64. MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents


65. Correction of Transformer-Based Models with Smoothing Pseudo-Projector


66. MITRA: An AI Assistant for Knowledge Retrieval in Physics Collaborations


67. Exploiting Label-Aware Channel Scoring for Adaptive Channel Pruning in Split Learning


68. A Hybrid Quantum-Classical Framework for Financial Volatility Forecasting Based on Quantum Circuit Born Machines


69. First Estimation of Model Parameters for Neutrino-Induced Nucleon Knockout Using Simulation-Based Inference


70. Ego: Embedding-Guided Personalization of Vision-Language Models


71. EXPLORE-Bench: Egocentric Scene Prediction with Long-Horizon Reasoning


72. RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation


73. MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models


74. Mousse: Rectifying the Geometry of Muon with Curvature-Aware Preconditioning


75. ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning


76. ESAinsTOD: A Unified End-to-End Schema-Aware Instruction-Tuning Framework for Task-Oriented Dialog Modeling


77. AutoViVQA: A Large-Scale Automatically Constructed Dataset for Vietnamese Visual Question Answering


78. Automatic Cardiac Risk Management Classification using large-context Electronic Patients Health Records


79. GNNs for Time Series Anomaly Detection: An Open-Source Framework and a Critical Evaluation


80. When to Lock Attention: Training-Free KV Control in Video Diffusion


81. MM-tau-p$^2$: Persona-Adaptive Prompting for Robust Multi-Modal Agent Evaluation in Dual-Control Settings


82. Grounding Synthetic Data Generation With Vision and Language Models


83. A Variational Latent Equilibrium for Learning in Cortex


84. Routing without Forgetting


85. Compiler-First State Space Duality and Portable $O(1)$ Autoregressive Caching for Inference


86. Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation


87. Evolving Prompt Adaptation for Vision-Language Models


88. Temporal-Conditioned Normalizing Flows for Multivariate Time Series Anomaly Detection


89. EvoDriveVLA: Evolving Autonomous Driving Vision-Language-Action Model via Collaborative Perception-Planning Distillation


90. Declarative Scenario-based Testing with RoadLogic


91. Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers


92. A Guideline-Aware AI Agent for Zero-Shot Target Volume Auto-Delineation


93. Common Sense vs. Morality: The Curious Case of Narrative Focus Bias in LLMs


94. CERES: A Probabilistic Early Warning System for Acute Food Insecurity


95. Open-World Motion Forecasting


96. Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health


97. From Flow to One Step: Real-Time Multi-Modal Trajectory Policies via Implicit Maximum Likelihood Estimation-based Distribution Distillation


98. PromptDLA: A Domain-aware Prompt Document Layout Analysis Framework with Descriptive Knowledge as a Cue


99. Reviving ConvNeXt for Efficient Convolutional Diffusion Models


100. ICDAR 2025 Competition on End-to-End Document Image Machine Translation Towards Complex Layouts


101. Physics-Informed Neural Engine Sound Modeling with Differentiable Pulse-Train Synthesis


102. SPAARS: Safer RL Policy Alignment through Abstract Exploration and Refined Exploitation of Action Space


103. MIL-PF: Multiple Instance Learning on Precomputed Features for Mammography Classification


104. M3GCLR: Multi-View Mini-Max Infinite Skeleton-Data Game Contrastive Learning For Skeleton-Based Action Recognition


105. Democratising Clinical AI through Dataset Condensation for Classical Clinical Models


106. TA-GGAD: Testing-time Adaptive Graph Model for Generalist Graph Anomaly Detection


107. TaSR-RAG: Taxonomy-guided Structured Reasoning for Retrieval-Augmented Generation


108. Beyond Scaling: Assessing Strategic Reasoning and Rapid Decision-Making Capability of LLMs in Zero-sum Environments


109. TimberAgent: Gram-Guided Retrieval for Executable Music Effect Control


110. Reading the Mood Behind Words: Integrating Prosody-Derived Emotional Context into Socially Responsive VR Agents


111. SpaceSense-Bench: A Large-Scale Multi-Modal Benchmark for Spacecraft Perception and Pose Estimation


112. CLoE: Expert Consistency Learning for Missing Modality Segmentation


113. DenoiseSplat: Feed-Forward Gaussian Splatting for Noisy 3D Scene Reconstruction


114. DendroNN: Dendrocentric Neural Networks for Energy-Efficient Classification of Event-Based Data


115. Multi-model approach for autonomous driving: A comprehensive study on traffic sign-, vehicle- and lane detection and behavioral cloning


116. BridgeDiff: Bridging Human Observations and Flat-Garment Synthesis for Virtual Try-Off


117. Embodied Human Simulation for Quantitative Design and Analysis of Interactive Robotics


118. Emotion is Not Just a Label: Latent Emotional Factors in LLM Processing


119. Latent-DARM: Bridging Discrete Diffusion And Autoregressive Models For Reasoning


120. DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization


121. Differentiable Stochastic Traffic Dynamics: Physics-Informed Generative Modelling in Transportation


122. Reinforced Generation of Combinatorial Structures: Ramsey Numbers


123. ZeroWBC: Learning Natural Visuomotor Humanoid Control Directly from Human Egocentric Video


124. GIAT: A Geologically-Informed Attention Transformer for Lithology Identification


125. Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL


126. RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning


127. Causally Sufficient and Necessary Feature Expansion for Class-Incremental Learning


128. QUSR: Quality-Aware and Uncertainty-Guided Image Super-Resolution Diffusion Model


129. DexHiL: A Human-in-the-Loop Framework for Vision-Language-Action Model Post-Training in Dexterous Manipulation


130. PM-Nav: Priori-Map Guided Embodied Navigation in Functional Buildings


131. VIVID-Med: LLM-Supervised Structured Pretraining for Deployable Medical ViTs


132. Composed Vision-Language Retrieval for Skin Cancer Case Search via Joint Alignment of Global and Local Representations


133. Latent World Models for Automated Driving: A Unified Taxonomy, Evaluation Framework, and Open Challenges


134. Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting


135. GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models


136. A Text-Native Interface for Generative Video Authoring


137. Sim2Act: Robust Simulation-to-Decision Learning via Adversarial Calibration and Group-Relative Perturbation


138. WS-Net: Weak-Signal Representation Learning and Gated Abundance Reconstruction for Hyperspectral Unmixing via State-Space and Weak Signal Attention Fusion


139. PlayWorld: Learning Robot World Models from Autonomous Play


140. Automating Detection and Root-Cause Analysis of Flaky Tests in Quantum Software


141. The Missing Memory Hierarchy: Demand Paging for LLM Context Windows


142. AI Phenomenology for Understanding Human-AI Experiences Across Eras


143. Improving through Interaction: Searching Behavioral Representation Spaces with CMA-ES-IG


144. Gender Fairness in Audio Deepfake Detection: Performance and Disparity Analysis


145. Security Considerations for Multi-agent Systems


146. Arbiter: Detecting Interference in LLM Agent System Prompts


147. Semantic Level of Detail: Multi-Scale Knowledge Representation via Heat Kernel Diffusion on Hyperbolic Manifolds


148. Automated Tensor-Relational Decomposition for Large-Scale Sparse Tensor Computation


149. BiCLIP: Domain Canonicalization via Structured Geometric Transformation


150. VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs


151. PathoScribe: Transforming Pathology Data into a Living Library with a Unified LLM-Driven Framework for Semantic Retrieval and Clinical Integration


152. Using Vision Language Foundation Models to Generate Plant Simulation Configurations via In-Context Learning


153. Quantifying Uncertainty in AI Visibility: A Statistical Framework for Generative Search Measurement


154. Uncovering a Winning Lottery Ticket with Continuously Relaxed Bernoulli Gates


155. FedLECC: Cluster- and Loss-Guided Client Selection for Federated Learning under Non-IID Data


156. Cross-Domain Uncertainty Quantification for Selective Prediction: A Comprehensive Bound Ablation with Transfer-Informed Betting


157. NetDiffuser: Deceiving DNN-Based Network Attack Detection Systems with Diffusion-Generated Adversarial Traffic


158. A New Modeling to Feature Selection Based on the Fuzzy Rough Set Theory in Normal and Optimistic States on Hybrid Information Systems


159. Unpacking Interpretability: Human-Centered Criteria for Optimal Combinatorial Solutions


160. A Lightweight Multi-Cancer Tumor Localization Framework for Deployable Digital Pathology


161. Are Expressive Encoders Necessary for Discrete Graph Generation?


162. Fish Audio S2 Technical Report


163. Beyond Relevance: On the Relationship Between Retrieval and RAG Information Coverage


164. Scale-Plan: Scalable Language-Enabled Task Planning for Heterogeneous Multi-Robot Teams


165. Test-Driven AI Agent Definition (TDAD): Compiling Tool-Using Agents from Behavioral Specifications


166. Large Language Model-Assisted Superconducting Qubit Experiments


167. Multi-level meta-reinforcement learning with skill-based curriculum


168. Clear, Compelling Arguments: Rethinking the Foundations of Frontier AI Safety Cases


169. EDMFormer: Genre-Specific Self-Supervised Learning for Music Structure Segmentation


170. Generalized Reduction to the Isotropy for Flexible Equivariant Neural Fields


171. Turn: A Language for Agentic Computation


172. Hindsight Credit Assignment for Long-Horizon LLM Agents


173. Permutation-Equivariant 2D State Space Models: Theory and Canonical Architecture for Multivariate Time Series


174. Diagnosing FP4 inference: a layer-wise and block-wise sensitivity analysis of NVFP4 and MXFP4


175. Zipage: Maintain High Request Concurrency for LLM Reasoning through Compressed PagedAttention


176. Architectural Design and Performance Analysis of FPGA based AI Accelerators: A Comprehensive Review


177. Sensitivity-Guided Framework for Pruned and Quantized Reservoir Computing Accelerators


178. Autonomous Edge-Deployed AI Agents for Electric Vehicle Charging Infrastructure Management


179. Benchmarking Federated Learning in Edge Computing Environments: A Systematic Review and Performance Evaluation


180. Measurement-Free Ancilla Recycling via Blind Reset: A Cross-Platform Study on Superconducting and Trapped-Ion Processors


181. ARKV: Adaptive and Resource-Efficient KV Cache Management under Limited Memory Budget for Long-Context Inference in LLMs


182. PhD Thesis Summary: Methods for Reliability Assessment and Enhancement of Deep Neural Network Hardware Accelerators


183. Alignment Is the Disease: Censorship Visibility and Alignment Constraint Complexity as Determinants of Collective Pathology in Multi-Agent LLM Systems


184. ALADIN: Accuracy-Latency-Aware Design-space Inference Analysis for Embedded AI Accelerators


185. SiliconMind-V1: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation


186. CktEvo: Repository-Level RTL Code Benchmark for Design Evolution


187. Design Conductor: An agent autonomously builds a 1.5 GHz Linux-capable RISC-V CPU


188. Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction


189. Let’s Verify Math Questions Step by Step