All AI Papers - 2026-05-21

1. DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation


2. AiraXiv: An AI-Driven Open-Access Platform for Human and AI Scientists


3. Mind the Sim-to-Real Gap & Think Like a Scientist


4. PALS: Power-Aware LLM Serving for Mixture-of-Experts Models


5. Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work


6. Towards Resilient and Autonomous Networks: A BlueSky Vision on AI-Native 6G


7. Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents


8. ScenePilot: Controllable Boundary-Driven Critical Scenario Generation for Autonomous Driving


9. AutoRPA: Efficient GUI Automation through LLM-Driven Code Synthesis from Interactions


10. Playing Devil’s Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy


11. For How Long Should We Be Punching? Learning Action Duration in Fighting Games


12. Governance by Construction for Generalist Agents


13. PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models


14. Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment


15. Interaction Locality in Hierarchical Recursive Reasoning


16. Conflict-Aware Additive Guidance for Flow Models under Compositional Rewards


17. VBFDD-Agent for Electric Vehicle Battery Fault Detection and Diagnosis: Descriptive Text Modeling of Battery Digital Signals


18. Declarative Data Services: Structured Agentic Discovery for Composing Data Systems


19. Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines


20. COAgents: Multi-Agent Framework to Learn and Navigate Routing Problems Search Space


21. From Automated to Autonomous: Hierarchical Agent-native Network Architecture (HANA)


22. Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX


23. Personality Engineering with AI Agents: A New Methodology for Negotiation Research


24. AgentAtlas: Beyond Outcome Leaderboards for LLM Agents


25. Open-World Evaluations for Measuring Frontier AI Capabilities


26. \ECUAS{n}: A family of metrics for principled evaluation of uncertainty-augmented systems


27. High Quality Embeddings for Horn Logic Reasoning


28. AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows


29. OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind


30. Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration


31. SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation


32. Variance Reduction for Expectations with Diffusion Teachers


33. Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate


34. WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata


35. Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling


36. Mem-π: Adaptive Memory through Learning When and What to Generate


37. HITL-D: Human In The Loop Diffusion Assisted Shared Control


38. Quality and Security Signals in AI-Generated Python Refactoring Pull Requests


39. Approximation Theory for Neural Networks: Old and New


40. Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs


41. TempGlitch: Evaluating Vision-Language Models for Temporal Glitch Detection in Gameplay Videos


42. torchtune: PyTorch native post-training library


43. HiRes: Inspectable Precedent Memory for Reaction Condition Recommendation


44. FedCritic: Serverless Federated Critic Learning-based Resource Allocation for Multi-Cell OFDMA in 6G


45. Ordering Matters: Rank-Aware Selective Fusion for Blended Emotion Recognition


46. Stdlib or Third-Party? Empirical Performance and Correctness of LLM-Assisted Zero-Dependency Python Libraries


47. Open-source LLMs administer maximum electric shocks in a Milgram-like obedience experiment


48. Designing Conversations with the Dead: How People Engage with Generative Ghosts


49. On the Regularity and Generalization of One-Step Wasserstein-guided Generative Models for PDE-Induced Measures


50. SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents


51. How to Build Marcus’s Algebraic Mind: Algebro-Deterministic Substrate over Galois Fields


52. Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training


53. Data-Efficient Neural Operator Training via Physics-Based Active Learning


54. SymbolicLight V1: Spike-Gated Dual-Path Language Modeling with High Activation Sparsity and Sub-Billion-Scale Pre-Training Evidence


55. TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization


56. Frontier: Towards Comprehensive and Accurate LLM Inference Simulation


57. DeCoR: Design and Control Co-Optimization for Urban Streets Using Reinforcement Learning


58. Deformba: Vision State Space Model with Adaptive State Fusion


59. From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach


60. Tracing the ongoing emergence of human-like reasoning in Large Language Models


61. TimeSRL: Generalizable Time-Series Behavioral Modeling via Semantic RL-Tuned LLMs – A Case Study in Mental Health


62. Large-Step Training Dynamics of a Two-Factor Linear Transformer Model


63. Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent


64. MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset


65. How Much Online RL is Enough? Informative Rollouts for Offline Preference Optimization in RLVR


66. Learning Structural Latent Points for Efficient Visual Representations in Robotic Manipulation


67. APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents


68. RePCM: Region-Specific and Phenotype-Adaptive Bi-Ventricular Cardiac Motion Synthesis


69. OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization


70. PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment


71. Artificial Intelligence Reshapes Microwave Photonics


72. Behavior-Consistent Deep Reinforcement Learning


73. Enhanced Reinforcement Learning-based Process Synthesis via Quantum Computing


74. SURGE: An Event-Centric Social Media Sentiment Time Series Benchmark with Interaction Structure


75. SAM-Sode: Towards Faithful Explanations for Tiny Bacteria Detection


76. Manga109-v2026: Revisiting Manga109 Annotations for Modern Manga Understanding


77. Comparative Analysis of Military Detection Using Drone Imagery Across Multiple Visual Spectrums


78. Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models


79. Detecting Trojaned DNNs via Spectral Regression Analysis


80. On the Complexity of Entailment for Cumulative Propositional Dependence Logics


81. Efficient Learning of Deep State Space Models via Importance Smoothing


82. ACL-Verbatim: hallucination-free question answering for research


83. Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints


84. Fine-grained Claim-level RAG Benchmark for Law


85. Grounding Driving VLA via Inverse Kinematics


86. Divide et Calibra: Multiclass Local Calibration via Vector Quantization


87. DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation


88. Beyond Text-to-SQL: An Agentic LLM System for Governed Enterprise Analytics APIs


89. Single-Pass, Depth-Selective Reading for Multi-Aspect Sentiment Analysis


90. Hybrid Machine Learning Model for Forest Height Estimation from TanDEM-X and Landsat Data


91. Towards Context-Invariant Safety Alignment for Large Language Models


92. A Sharper Picture of Generalization in Transformers


93. Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory


94. Comparative Evaluation of Deep Learning Models for Fake Image Detection


95. Finding the Correct Visual Evidence Without Forgetting: Mitigating Hallucination in LVLMs via Inter-Layer Visual Attention Discrepancy


96. Focus-then-Context: Subject-Centric Progressive Visual Token Reduction for Vision-Language Models


97. DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU


98. Strategy-Induct: Task-Level Strategy Induction for Instruction Generation


99. Causal Past Logic for Runtime Verification of Distributed LLM Agent Workflows


100. Winfree Oscillatory Neural Network


101. Sutra: Tensor-Op RNNs as a Compilation Target for Vector Symbolic Architectures


102. Calibration vs Decision Making: Revisiting the Reliability Paradox in Unlearned Language Models


103. VISTA: Technical Report for the Ego4D Short-Term Object Interaction Anticipation at EgoVis 2026


104. GenAI-Driven Threat Detection with Microsoft Security Copilot


105. Terminal-World: Scaling Terminal-Agent Environments via Agent Skills


106. CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation


107. Runtime-Certified Bounded-Error Quantized Attention


108. Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards


109. DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation


110. USV: Towards Understanding the User-generated Short-form Videos


111. ArchSIBench: Benchmarking the Architectural Spatial Intelligence of Vision-Language Models


112. GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval


113. Tunable MAGMAX: Preference-Aware Model Merging for Continual Learning


114. ELSA: An ELastic SNN Inference Architecture for Efficient Neuromorphic Computing


115. Correcting Stochastic Update Bias in Preconditioned Language Model Optimizers


116. PACD-Net: Pseudo-Augmented Contrastive Distillation for Glycemic Control Estimation from SMBG


117. The Devil is in the Condition Numbers: Why is GLU Better than non-GLU Structure?


118. The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering


119. Hack-Verifiable Environments: Towards Evaluating Reward Hacking at Scale


120. Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression


121. An Application-Layer Multi-Modal Covert-Channel Reference Monitor for LLM Agent Egress


122. TASTE: A Designer-Annotated Multi-Dimensional Preference Dataset for AI-Generated Graphic Design


123. Distributional Alignment as a Criterion for Designing Task Vectors in In-Context Learning


124. AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback


125. SAVER: Selective As-Needed Vision Evidence for Multimodal Information Extraction


126. SCRIBE: Diagnostic Evaluation and Rich Transcription Models for Indic ASR


127. Rethinking Cross-Layer Information Routing in Diffusion Transformers


128. Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU


129. Heartbeat-Bound Hierarchical Credentials: Cryptographic Revocation for AI Agent Swarms


130. Interpretable Discriminative Text Representations via Agreement and Label Disentanglement


131. DIVE: Embedding Compression via Self-Limiting Gradient Updates


132. Dynamic TMoE: A Drift-Aware Dynamic Mixture of Experts Framework for Non-Stationary Time Series Forecasting


133. On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists


134. REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak


135. AMAR: Lightweight Attention-Based Multi-User Activity Recognition from Wi-Fi CSI


136. Jointly Learning Predicates and Actions Enables Zero-Shot Skill Composition


137. Design for Manufacturing: A Manufacturability Knowledge-Integrated Reinforcement Learning Framework for Free-Form Pipe Routing in Aeroengines


138. AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals


139. Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs


140. Pareto-Enhanced Portrait Generation: Vision-Aligned Text Supervision for Alignment, Realism, and Aesthetics


141. Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task


142. Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models


143. Lower Bounds for Advection-Diffusion Equations: An Exploration with AI-Generated Proofs


144. Beyond Routing: Characterising Expert Tuning and Representation in Vision Mixture-of-Experts


145. Self-Training Doesn’t Flatten Language – It Restructures It: Surface Markers Amplify While Deep Syntax Dies


146. Multi-agent Collaboration with State Management


147. Complementing reinforcement learning with SFT through logit averaging in the post training of LLMs


148. Faster or Stronger: Towards Flexible Visual Place Recognition via Weighted Aggregation and Token Pruning


149. Latent Process Generator Matching


150. Axiomatizing Neural Networks via Pursuit of Subspaces


151. Collocational bootstrapping: A hypothesis about the learning of subject-verb agreement in humans and neural networks


152. NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding


153. Machine-Learning-Enhanced Non-Invasive Testing for MASLD Fibrosis: Shallow-Deep Neural Networks Versus FIB-4, Tabular Foundation Models, and Large Language Models


154. Codec-Robust Attacks on Audio LLMs


155. ShadeBench: A Benchmark Dataset for Building Shade Simulation in Sustainable Society


156. Tippett-minimum Fusion of Representation-space Diffusion Models for Multi-Encoder Out-of-Distribution Detection


157. Training Language Agents to Learn from Experience


158. Code Generation by Differential Test Time Scaling


159. EPC-3D-Diff: Equivariant Physics Consistent Conditional 3D Latent Diffusion for CBCT to CT Synthesis


160. Pixel Wised Lesion Prediction on COVID-19 CT Imagery: A Comparative Analysis of Automated Image Segmentation Architectures


161. Agentic Agile-V: From Vibe Coding to Verified Engineering in Software and Hardware Development


162. LLM Pretraining Shapes a Generalizable Manifold: Insights into Cross-Modal Transfer to Time Series


163. A Comprehensive Comparison of Deep Learning Architectures for COVID-19 Classification on CT & X-ray Imagery


164. Modeling Emotional Dynamics in Agent-to-Agent Interactions on Moltbook


165. Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics


166. Group-Algebraic Tensors: Provably-optimal Equivariant Learning and Physical Symmetry Discovery


167. Mechanics of Bias and Reasoning: Interpreting the Impact of Chain-of-Thought Prompting on Gender Bias in LLMs


168. Disentangling Sampling from Training Budget in Class-Imbalanced CT Body Composition Segmentation


169. Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor


170. STELLAR: Scaling 3D Perception Large Models for Autonomous Driving


171. Nonlocal operator learning for fMRI encoding and decoding tasks


172. ConceptSeg-R1: Segment Any Concept via Meta-Reinforcement Learning


173. Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs


174. SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework


175. Latent Space Guided Scenario Sampling for Multimodal Segmentation Under Missing Modalities


176. DEL: Digit Entropy Loss for Numerical Learning of Large Language Models


177. Security Document Classification with a Fine-Tuned Local Large Language Model: Benchmark Data and an Open-Source System


178. Consistently Informative Soft-Label Temperature for Knowledge Distillation


179. Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models


180. Causal Unlearning in Collaborative Optimization: Exact and Approximate Influence Reversal under Adversarial Contributions


181. Targeting Clause Type Distributions: a Picklock for Random Satisfiability Problems


182. Representability-Aware Neural Networks for Reduced Density Matrices: Application to Fractional Chern Insulators


183. FullFlow: Upgrading Text-to-Image Flow Matching Models for Bidirectional Vision–Language Generation


184. Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases


185. Tiny-Engram: Trigger-Indexed Concept Tables for Generative Vision


186. SDM: A Powerful Tool for Evaluating Model Robustness


187. Co-Fusion4D: Spatio-temporal Collaborative Fusion for Robust 3D Object Detection


188. Robust Subspace-Constrained Quadratic Models for Low-Dimensional Structure Learning


189. Mechanisms of Misgeneralization in Physical Sequence Modeling


190. Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining


191. Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization


192. Closed-form predictive coding via hierarchical Gaussian filters


193. Plug-and-Play Spiking Operators: Breaking the Nonlinearity Bottleneck in Spiking Transformers


194. FusionCell: Cross-Attentive Fusion of Layout Geometry and Netlist Topology for Standard-Cell Performance Prediction


195. Introspective X Training: Feedback Conditioning Improves Scaling Across all LLM Training Stages


196. JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA


197. Can Vision Models Truly Forget? Mirage: Representation-Level Certification of Visual Unlearning


198. ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison


199. Regulating Anatomy-Aware Rewards via Trajectory-Integral Feedback for Volumetric Computed Tomography Analysis


200. You Don’t Need Attention: Gated Convolutional Modeling for Watch-Based Fall Detection


201. PolycubeNet: A Dual-latent Diffusion Model for Polycube-Based Hexahedral Mesh Generation


202. Modality-Decoupled Online Recursive Editing


203. Smaller Abstract State Spaces Enable Cross-Scale Generalization in Reinforcement Learning


204. Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs


205. Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity


206. Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding


207. Generation of Heterogeneous PET Images from Uniform Organ Activity Maps Using a Pretrained Domain-Adapted Diffusion Model


208. Residual Paving: Diagnosing the Routing Bottleneck in Selective Refusal Editing


209. It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs



211. FBOS-RL: Feedback-Driven Bi-Objective Synergistic Reinforcement Learning


212. Multi-Agent Reinforcement Learning for Safe Autonomous Driving Under Pedestrian Behavioral Uncertainty


213. Efficient Table QA via TableGrid Navigation and Progressive Inference Prompting


214. ProcBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents


215. Automated Kernel Discovery Towards Understanding High-dimensional Bayesian Optimization


216. CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning


217. GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents



219. LEAP: A closed-loop framework for perovskite precursor additive discovery


220. Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry


221. Provably Learning Diffusion Models under the Manifold Hypothesis: Collapse and Refine


222. TabPFN-MT: A Natively Multitask In-Context Learner for Tabular Data


223. AI-Assisted Competency Assessment from Egocentric Video in Simulation-Based Nursing Education


224. Network-Based Interventions for HIV Prevention via Cascade-Aware Suppression of Transmission


225. Leveraging Vision-Language Models to Detect Attention in Educational Videos


226. Governance by Design: Architecting Agentic AI for Organizational Learning and Scalable Autonomy


227. PrivacyAkinator: Articulating Key Privacy Design Decisions by Answering LLM-Generated Multiple-choice Questions


228. RealUserSim: Bridging the Reality Gap in Agent Benchmarking via Grounded User Simulation


229. GrandGuard: Taxonomy, Benchmark, and Safeguards for Elderly-Chatbot Interaction Safety


230. Under Pressure: Emotional Framing Induces Measurable Behavioral Shifts and Structured Internal Geometry in Small Language Models


231. Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning


232. Evaluating multimodal emotion recognition in proactive conversational agents: A user study


233. FlowLM: Few-Step Language Modeling via Diffusion-to-Flow Adaptation


234. Data Scaling as Progressive Coverage of a Predictive Contribution Spectrum


235. Pseudo-Siamese Network for Planning in Target-Oriented Proactive Dialogues


236. Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction


237. Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification


238. GraphDiffMed: Knowledge-Constrained Differential Attention with Pharmacological Graph Priors for Medication Recommendation


239. Neural Estimation of Pairwise Mutual Information in Masked Discrete Sequence Models


240. Diverge to Induce Prompting: Multi-Rationale Induction for Zero-Shot Reasoning