All AI Papers - 2026-05-12

1. Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace


2. Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory


3. BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD


4. The Generalized Turing Test: A Foundation for Comparing Intelligence


5. From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World


6. The First Drop of Ink: Nonlinear Impact of Misleading Information in Long-Context Reasoning


7. MaD Physics: Evaluating information seeking under constraints in physical environments


8. CLEF: EEG Foundation Model for Learning Clinical Semantics


9. Probing Cross-modal Information Hubs in Audio-Visual LLMs


10. NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation


11. Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge


12. New AI-Driven Tools for Enhancing Campus Well-being: A Prevention and Intervention Approach


13. Interpretable Machine Learning for Football Performance Analysis: Evidence of Limited Transferability from Elite Leagues to University Competition


14. PathISE: Learning Informative Path Supervision for Knowledge Graph Question Answering


15. ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox


16. TrajPrism: A Multi-Task Benchmark for Language-Grounded Urban Trajectory Understanding


17. MATRA: Modeling the Attack Surface of Agentic AI Systems – OpenClaw Case Study


18. The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents


19. GESR: A Genetic Programming-Based Symbolic Regression Method with Gene Editing


20. Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents


21. diffGHOST: Diffusion based Generative Hedged Oblivious Synthetic Trajectories


22. Navigating the Sea of LLM Evaluation: Investigating Bias in Toxicity Benchmarks


23. Teacher-Aware Evolution of Heuristic Programs from Learned Optimization Policies


24. Hierarchical Causal Abduction: A Foundation Framework for Explainable Model Predictive Control


25. PRISM: Generation-Time Detection and Mitigation of Secret Leakage in Multi-Agent LLM Pipelines


26. The Open-Box Fallacy: Why AI Deployment Needs a Calibrated Verification Regime


27. Budget-Efficient Automatic Algorithm Design via Code Graph


28. LLARS: Enabling Domain Expert & Developer Collaboration for LLM Prompting, Generation and Evaluation


29. A Resilient Solution for Sewer Overflow Monitoring across Cloud and Edge


30. LLM Jaggedness Unlocks Scientific Creativity


31. Deep Arguing


32. Agent-First Tool API: A Semantic Interface Paradigm for Enterprise AI Agent Systems


33. Bridging Sequence and Graph Structure for Epigenetic Age Prediction


34. A Reflective Storytelling Agent for Older Adults: Integrating Argumentation Schemes and Argument Mining in LLM-Based Personalised Narratives


35. PrimeKG-CL: A Continual Graph Learning Benchmark on Evolving Biomedical Knowledge Graphs


36. Consistency as a Testable Property: Statistical Methods to Evaluate AI Agent Reliability


37. SLASH the Sink: Sharpening Structural Attention Inside LLMs


38. SkillEvolver: Skill Learning as a Meta-Skill


39. ASIA: an Autonomous System Identification Agent


40. Can Agent Benchmarks Support Their Scores? Evidence-Supported Bounds for Interactive-Agent Evaluation


41. LLM4Branch: Large Language Model for Discovering Efficient Branching Policies of Integer Programs


42. GuardAD: Safeguarding Autonomous Driving MLLMs via Markovian Safety Logic


43. Agentic Performance at the Edge: Insights from Benchmarking


44. Agent-X: Full Pipeline Acceleration of On-device AI Agents


45. Autonomous FAIR Digital Objects: From Passive Assertions to Active Knowledge


46. EGL-SCA: Structural Credit Assignment for Co-Evolving Instructions and Tools in Graph Reasoning Agents


47. Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values


48. How Mobile World Model Guides GUI Agents?


49. TMAS: Scaling Test-Time Compute via Multi-Agent Synergy


50. PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents


51. CORTEG: Foundation Models Enable Cross-Modality Representation Transfer from Scalp to Intracranial Brain Recordings


52. EmbodiSkill: Skill-Aware Reflection for Self-Evolving Embodied Agents


53. Verifiable Process Rewards for Agentic Reasoning


54. Positive Alignment: Artificial Intelligence for Human Flourishing


55. AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks


56. IndustryBench: Probing the Industrial Knowledge Boundaries of LLMs


57. E-TCAV: Formalizing Penultimate Proxies for Efficient Concept Based Interpretability


58. Towards Autonomous Railway Operations: A Semi-Hierarchical Deep Reinforcement Learning Approach to the Vehicle Rescheduling Problem


59. SciIntegrity-Bench: A Benchmark for Evaluating Academic Integrity in AI Scientist Systems


60. Hypothesis-Driven Deep Research with Large Language Models: A Structured Methodology for Automated Knowledge Discovery


61. Beyond Autonomy: A Dynamic Tiered AgentRunner Framework for Governable and Resilient Enterprise AI Execution


62. TRACE: Distilling Where It Matters via Token-Routed Self On-Policy Alignment


63. Automated Approach for Solving Infinite-state Polynomial Reachability Games


64. Benchmarking Safety Risks of Knowledge-Intensive Reasoning under Malicious Knowledge Editing


65. FormalRewardBench: A Benchmark for Formal Theorem Proving Reward Models


66. Useful for Exploration, Risky for Precision: Evaluating AI Tools in Academic Research


67. Rethinking Constraint Awareness for Efficient State Embedding of Neural Routing Solver


68. Arcane: An Assertion Reduction Framework through Semantic Clustering and MCTS-Guided Rule Exploring


69. Active Testing of Large Language Models via Approximate Neyman Allocation


70. MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs


71. Strategic Exploitation in LLM Agent Markets: A Simulation Framework for E-Commerce Trust


72. Route by State, Recover from Trace: STAR with Failure-Aware Markov Routing for Multi-Agent Spatiotemporal Reasoning


73. TimeClaw: A Time-Series AI Agent with Exploratory Execution Learning


74. From Single-Step Edit Response to Multi-Step Molecular Optimization


75. Optimizer-Induced Mode Connectivity: From AdamW to Muon


76. Prospective Compression in Human Abstraction Learning


77. Learning the Interaction Prior for Protein-Protein Interaction Prediction: A Model-Agnostic Approach


78. LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models


79. HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution


80. expo: Exploration-prioritized policy optimization via adaptive kl regulation and gaussian curriculum sampling


81. RADAR: Redundancy-Aware Diffusion for Multi-Agent Communication Structure Generation


82. Separate First, Fuse Later: Mitigating Cross-Modal Interference in Audio-Visual LLMs Reasoning with Modality-Specific Chain-of-Thought


83. The Gordian Knot for VLMs: Diagrammatic Knot Reasoning as a Hard Benchmark


84. M2A: Synergizing Mathematical and Agentic Reasoning in Large Language Models


85. Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations


86. When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning


87. Fairness of Explanations in Artificial Intelligence (AI): A Unifying Framework, Axioms, and Future Direction toward Responsible AI


88. The Metacognitive Probe: Five Behavioural Calibration Diagnostics for LLMs


89. Yield Curve Forecasting using Machine Learning and Econometrics: A Comparative Analysis


90. EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents


91. Attribution-based Explanations for Markov Decision Processes


92. Marrying Generative Model of Healthcare Events with Digital Twin of Social Determinants of Health for Disease Reasoning


93. UTS at PsyDefDetect: Multi-Agent Councils and Absence-Based Reasoning for Defense Mechanism Classification


94. Primal-Dual Guided Decoding for Constrained Discrete Diffusion


95. Medical Model Synthesis Architectures: A Case Study


96. Ambig-DS: A Benchmark for Task-Framing Ambiguity in Data-Science Agents


97. Unpredictability dissociates from structured control in language agents


98. Absurd World: A Simple Yet Powerful Method to Absurdify the Real-world for Probing LLM Reasoning Capabilities


99. CodeClinic: Evaluating Automation of Coding Skills for Clinical Reasoning Agents


100. Workspace Optimization: How to Train Your Agent


101. PDEAgent-Bench: A Multi-Metric, Multi-Library Benchmark for PDE Solver Generation


102. TIDE-Bench: Task-Aware and Diagnostic Evaluation of Tool-Integrated Reasoning


103. LLM-Guided Monte Carlo Tree Search over Knowledge Graphs: Composing Mechanistic Explanations for Drug-Disease Pairs


104. Cplus2ASP: Computing Action Language C+ in Answer Set Programming


105. Functional Stable Model Semantics and Answer Set Programming Modulo Theories


106. Weighted Rules under the Stable Model Semantics


107. A Game Theoretic Free Energy Analysis of Higher Order Synergy in Attention Heads of Large Language Models


108. WindINR: Latent-State INR for Fast Local Wind Query and Correction in Complex Terrain


109. EpiGraph: A Knowledge Graph and Benchmark for Evidence-Intensive Reasoning in Epilepsy


110. Don’t Click That: Teaching Web Agents to Resist Deceptive Interfaces


111. VulTriage: Triple-Path Context Augmentation for LLM-Based Vulnerability Detection


112. SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning


113. From Passive Reuse to Active Reasoning: Grounding Large Language Models for Neuro-Symbolic Experience Replay


114. Strategic commitments shape collective cybersecurity under AI inequality


115. Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning


116. Do Linear Probes Generalize Better in Persona Coordinates?


117. NEXUS: Continual Learning of Symbolic Constraints for Safe and Robust Embodied Planning


118. Explainable Knowledge Tracing via Probabilistic Embeddings and Pattern-based Reasoning


119. Towards a Virtual Neuroscientist: Autonomous Neuroimaging Analysis via Multi-Agent Collaboration


120. Position: Avoid Overstretching LLMs for every Enterprise Task


121. The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence?


122. CHAINTRIX: A multi-pipeline LLM-augmented framework for automated smart-contract security auditing


123. Dsat: A Native SAT Solver for Discrete Logic


124. SKG-VLA: Scene Knowledge Graph Priors for Structured Scene Semantics and Multimodal Reasoning for Decision Making


125. Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation


126. How LLMs Are Persuaded: A Few Attention Heads, Rerouted


127. Beyond ESG Scores: Learning Dynamic Constraints for Sequential Portfolio Optimization


128. Beyond Accuracy: Evaluating Strategy Diversity in LLM Mathematical Reasoning


129. PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning


130. A Prompt-Aware Structuring Framework for Reliable Reuse of AI-Generated Content in the Agentic Web


131. EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium


132. Towards Conversational Medical AI with Eyes, Ears and a Voice


133. Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding


134. SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning


135. How Much is Brain Data Worth for Machine Learning?


136. Learning the Preferences of a Learning Agent


137. The Geometry of Forgetting: Temporal Knowledge Drift as an Independent Axis in LLM Representations


138. Evidence Over Plans: Online Trajectory Verification for Skill Distillation


139. Emergent Semantic Role Understanding in Language Models


140. Agentic MIP Research: Accelerated Constraint Handler Generation


141. Open Ontologies: Tool-Augmented Ontology Engineering with Stable Matching Alignment


142. CIVeX: Causal Intervention Verification for Language Agents


143. FORTIS: Benchmarking Over-Privilege in Agent Skills


144. Do LLMs Experience an Internal Polylogue? Investigating Reasoning through the Lens of Personas


145. BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models


146. MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments


147. Data-driven Circuit Discovery for Interpretability of Language Models


148. When (and How) to Trust the Expert: Diagnosing Query-Time Expert-Guided Reinforcement Learning


149. Token Economics for LLM Agents: A Dual-View Study from Computing and Economics


150. Constant-Target Energy Matching: A Unified Framework for Continuous and Discrete Density Estimation


151. CauSim: Scaling Causal Reasoning with Increasingly Complex Causal Simulators


152. Containment Verification: AI Safety Guarantees Independent of Alignment


153. UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence


154. SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks


155. CATO: Charted Attention for Neural PDE Operators


156. Re$^2$Math: Benchmarking Theorem Retrieval in Research-Level Mathematics


157. Sufficient conditions for a Heuristic Rating Estimation Method application


158. Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization


159. Latency Analysis and Optimization of Alpamayo 1 via Efficient Trajectory Generation


160. Agentic AI Scientists Are Not Built For Autonomous Scientific Discovery


161. MDGYM: Benchmarking AI Agents on Molecular Simulations


162. Can We Formally Verify Neural PDE Surrogates? SMT Compilation of Small Fourier Neural Operators


163. Self-ReSET: Learning to Self-Recover from Unsafe Reasoning Trajectories


164. PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting


165. Internalizing Safety Understanding in Large Reasoning Models via Verification


166. Forge: Quality-Aware Reinforcement Learning for NP-Hard Optimization in LLMs


167. OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces


168. Ace-Skill: Bootstrapping Multimodal Agents with Prioritized and Clustered Evolution


169. M$^3$: Reframing Training Measures for Discretized Physical Simulations


170. SynerDiff: Synergetic Continuous Batching for Fast and Parallel Diffusion Model Inference


171. FRACTAL: SSM with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences


172. When Agents Overtrust Environmental Evidence: An Extensible Agentic Framework for Benchmarking Evidence-Grounding Defects in LLM Agents


173. Mental Health AI Safety Claims Must Preserve Temporal Evidence


174. How You Begin is How You Reason: Driving Exploration in RLVR via Prefix-Tuned Priors


175. Mirror, Mirror on the Wall: Can VLM Agents Tell Who They Are at All?


176. Not All Turns Matter: Credit Assignment for Multi-Turn Jailbreaking


177. Reasoning Compression with Mixed-Policy Distillation


178. EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems


179. From Holo Pockets to Electron Density: GPT-style Drug Design with Density


180. AHD Agent: Agentic Reinforcement Learning for Automatic Heuristic Design


181. Value-Decomposed Reinforcement Learning Framework for Taxiway Routing with Hierarchical Conflict-Aware Observations


182. Done, But Not Sure: Disentangling World Completion from Self-Termination in Embodied Agents


183. Bias by Necessity: Impossibility Theorems for Sequential Processing with Convergent AI and Human Validation


184. When Can Human-AI Teams Outperform Individuals? Tight Bounds with Impossibility Guarantees


185. AgentPSO: Evolving Agent Reasoning Skill via Multi-agent Particle Swarm Optimization


186. RewardHarness: Self-Evolving Agentic Post-Training


187. MBP-KT: Learning Global Collaborative Information from Meta-Behavioral Pattern for Enhanced Knowledge Tracing


188. SkillMaster: Toward Autonomous Skill Mastery in LLM Agents


189. Reconciling Consistency-Based Diagnosis with Actual-Causality-Based Explanations


190. Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs


191. MIND-Skill: Quality-Guaranteed Skill Generation via Multi-Agent Induction and Deduction


192. C2L-Net: A Data-Driven Model for State-of-Charge Estimation of Lithium-Ion Batteries During Discharge


193. DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules


194. Generalization Bounds of Emergent Communications for Agentic AI Networking


195. The Echo Amplifies the Knowledge: Somatic Marker Analogues in Language Models via Emotion Vector Re-Injection


196. What Will Happen Next: Large Models-Driven Deduction for Emergency Instances


197. Biological Plausibility and Representational Alignment of Feedback Alignment in Convolutional Networks


198. Why Retrying Fails: Context Contamination in LLM Agent Pipelines


199. Evaluating Developmental Cognition Capabilities of LLMs


200. Log analysis is necessary for credible evaluation of AI agents


201. Human-Inspired Memory Architecture for LLM Agents


202. Human-LLM Dialogue Improves Diagnostic Accuracy in Emergency Care


203. Results and Retrospective Analysis of the CODS 2025 AssetOpsBench Challenge


204. OracleTSC: Oracle-Informed Reward Hurdle and Uncertainty Regularization for Traffic Signal Control


205. Latent Personality Alignment: Improving Harmlessness Without Mentioning Harms


206. AI-Care: A Conversational Agentic System for Task Coordination in Alzheimer’s Disease Care


207. Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models


208. Behavioral Determinants of Deployed AI Agents in Social Networks: A Multi-Factor Study of Personality, Model, and Guardrail Specification


209. LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification


210. Measuring What Matters: Benchmarking Generative, Multimodal, and Agentic AI in Healthcare


211. The Attacker in the Mirror: Breaking Self-Consistency in Safety via Anchored Bipolicy Self-Play


212. Alignment as Jurisprudence


213. Political Plasticity: An Analysis of Ideological Adaptability in Large Language Models


214. Playing games with knowledge: AI-Induced delusions need game theoretic interventions


215. Belief or Circuitry? Causal Evidence for In-Context Graph Learning


216. CoCoDA: Co-evolving Compositional DAG for Tool-Augmented Agents


217. PLACO: A Multi-Stage Framework for Cost-Effective Performance in Human-AI Teams


218. SkillLens: Adaptive Multi-Granularity Skill Reuse for Cost-Efficient LLM Agents


219. MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs


220. On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective


221. Embeddings for Preferences, Not Semantics


222. Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria


223. Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction


224. Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits


225. ELF: Embedded Language Flows


226. Variational Inference for Lévy Process-Driven SDEs via Neural Tilting


227. Confidence-Guided Diffusion Augmentation for Enhanced Bangla Compound Character Recognition


228. Engineering Robustness into Personal Agents with the AI Workflow Store


229. DataMaster: Towards Autonomous Data Engineering for Machine Learning


230. Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why


231. Shields to Guarantee Probabilistic Safety in MDPs


232. LoKA: Low-precision Kernel Applications for Recommendation Models At Scale


233. AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents


234. CADBench: A Multimodal Benchmark for AI-Assisted CAD Program Generation


235. Attractor-Vascular Coupling Theory: Formal Grounding and Empirical Validation for AAMI-Standard Cuffless Blood Pressure Estimation from Smartphone Photoplethysmography


236. BEACON: A Multimodal Dataset for Learning Behavioral Fingerprints from Gameplay Data


237. Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?


238. Training-Free Cultural Alignment of Large Language Models via Persona Disagreement


239. Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories


240. MMVIAD: Multi-view Multi-task Video Understanding for Industrial Anomaly Detection


241. SLIM: Sparse Latent Steering for Interpretable and Property-Directed LLM-Based Molecular Editing


242. ALAM: Algebraically Consistent Latent Transitions for Vision-Language-Action Models


243. Policy Gradient Methods for Non-Markovian Reinforcement Learning


244. Switching-Geometry Analysis of Deflated Q-Value Iteration


245. Threat Modelling using Domain-Adapted Language Models: Empirical Evaluation and Insights


246. PhyGround: Benchmarking Physical Reasoning in Generative World Models


247. The Last Word Often Wins: A Format Confound in Chain-of-Thought Corruption Studies


248. Can You Keep a Secret? Involuntary Information Leakage in Language Model Writing


249. Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenizatio


250. Towards a Large Language-Vision Question Answering Model for MSTAR Automatic Target Recognition


251. MPerS: Dynamic MLLM MixExperts Perception-Guided Remote Sensing Scene Segmentation


252. Dynamic Cross-Modal Prompt Generation for Multimodal Continual Instruction Tuning


253. Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization


254. GridProbe: Posterior-Probing for Adaptive Test-Time Compute in Long-Video VLMs


255. Provable Sparse Inversion and Token Relabel Enhanced One-shot Federated Learning with ViTs


256. Geospatial-Temporal Sensemaking of Remote Sensing Activity Detections with Multimodal Large Language Model


257. iPay: Integrated Payment Action Recognition via Multimodal Networks and Adaptive Spatial Prior Learning


258. AllocMV: Optimal Resource Allocation for Music Video Generation via Structured Persistent State


259. An Uncertainty-Aware Resilience Micro-Agent for Causal Observability in the Computing Continuum


260. Why Low-Resource NLP Needs More Than Cross-Lingual Transfer: Lessons Learned from Luxembourgish


261. The Bystander Effect in Multi-Agent Reasoning: Quantifying Cognitive Loafing in Collaborative Interactions


262. Is Data Shapley Not Better than Random in Data Selection? Ask NASH


263. Step Rejection Fine-Tuning: A Practical Distillation Recipe


264. Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions


265. bViT: Investigating Single-Block Recurrence in Vision Transformers for Image Recognition


266. When Can Digital Personas Reliably Approximate Human Survey Findings?


267. Active Learning for Gaussian Process Regression Under Self-Induced Boltzmann Weights


268. A Recursive Decomposition Framework for Causal Structure Learning in the Presence of Latent Variables


269. LLaVA-CKD: Bottom-Up Cascaded Knowledge Distillation for Vision-Language Models


270. Towards Understanding Continual Factual Knowledge Acquisition of Language Models: From Theory to Algorithm


271. Intrinsic Guardrails: How Semantic Geometry of Personality Interacts with Emergent Misalignment in LLMs


272. Interpretable Coreference Resolution Evaluation Using Explicit Semantics


273. Re-Triggering Safeguards within LLMs for Jailbreak Detection


274. Measuring Embedding Sensitivity to Authorial Style in French: Comparing Literary Texts with Language Model Rewritings


275. Fairness vs Performance: Characterizing the Pareto Frontier of Algorithmic Decision Systems


276. CrackMeBench: Binary Reverse Engineering for Agents


277. An agentic framework for gravitational-wave counterpart association in the multi-messenger era


278. Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing


279. SenseBench: A Benchmark for Remote Sensing Low-Level Visual Perception and Description in Large Vision-Language Models


280. Acceptance Cards: A Four-Diagnostic Standard for Safe Fine-Tuning Defense Claims


281. ThreatCore: A Benchmark for Explicit and Implicit Threat Detection


282. HH-SAE: Discovering and Steering Hierarchical Knowledge of Complex Manifolds


283. DuetFair: Coupling Inter- and Intra-Subgroup Robustness for Fair Medical Image Segmentation


284. Infinite Mask Diffusion for Few-Step Distillation


285. SoK: A Systematic Bidirectional Literature Review of AI & DLT Convergence


286. CMKL: Modality-Aware Continual Learning for Evolving Biomedical Knowledge Graphs


287. Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data


288. Multi-layer attentive probing improves transfer of audio representations for bioacoustics


289. DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning


290. Formally Verifying Analog Neural Networks Under Process Variations Using Polynomial Zonotopes


291. Cavity-Enhanced Collective Quantum Processing with Polarization-Encoded Qubits


292. Statistical Model Checking of the Keynes+Schumpeter Model: A Transient Sensitivity Analysis of a Macroeconomic ABM


293. StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs


294. Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation


295. Physical probes expose and alleviate chemical-environment collapse in molecular representations


296. CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving


297. Toward an Engineering of Science: Rebalancing Generation and Verification in the Age of AI


298. Can Language Models Analyze Data? Evaluating Large Language Models for Question Answering over Datasets


299. Every finite group admits a just finite presentation


300. AnomalyClaw: A Universal Visual Anomaly Detection Agent via Tool-Grounded Refutation


301. Phoenix-VL 1.5 Medium Technical Report


302. RW-Post: Auditable Evidence-Grounded Multimodal Fact-Checking in the Wild


303. Portable Active Learning for Object Detection


304. EvoStreaming: Your Offline Video Model Is a Natively Streaming Assistant


305. PowerStep: Memory-Efficient Adaptive Optimization via $\ell_p$-Norm Steepest Descent


306. SCALAR: A Neurosymbolic Framework for Automated Conjecture and Reasoning in Quantum Circuit Analysis


307. Relations Are Channels: Knowledge Graph Embedding via Kraus Decompositions


308. Active Tabular Augmentation via Policy-Guided Diffusion Inpainting


309. Qwen Goes Brrr: Off-the-Shelf RAG for Ukrainian Multi-Domain Document Understanding


310. Robust Probabilistic Shielding for Safe Offline Reinforcement Learning


311. LeapTS: Rethinking Time Series Forecasting as Adaptive Multi-Horizon Scheduling


312. Generative AI Fuels Solo Entrepreneurship, but Teams Still Lead at the Top


313. Drum Synthesis from Expressive Drum Grids via Neural Audio Codecs


314. DP-LAC: Lightweight Adaptive Clipping for Differentially Private Federated Fine-tuning of Language Models


315. MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading


316. A Cold Diffusion Approach for Percussive Dereverberation


317. Knowledge Poisoning Attacks on Medical Multi-Modal Retrieval-Augmented Generation


318. When Normality Shifts: Risk-Aware Test-Time Adaptation for Unsupervised Tabular Anomaly Detection


319. When Does Non-Uniform Replay Matter in Reinforcement Learning?


320. To Redact, or not to Redact? A Local LLM Approach to Deliberative Process Privilege Classification


321. HeteroGenManip: Generalizable Manipulation For Heterogeneous Object Interactions


322. Empty SPACE: Cross-Attention Sparsity for Concept Erasure in Diffusion Models


323. ProteinOPD: Towards Effective and Efficient Preference Alignment for Protein Design



325. DynGhost: Temporally-Modelled Transformer for Dynamic Ghost Imaging with Quantum Detectors


326. Developing a foundation model for high-resolution remote sensing data of the Netherlands


327. A Comparative Study of Machine Learning and Deep Learning for Out-of-Distribution Detection


328. One-Step Graph-Structured Neural Flows for Irregular Multivariate Time Series Classification


329. MTA-RL: Robust Urban Driving via Multi-modal Transformer-based 3D Affordances and Reinforcement Learning


330. When Prompts Become Payloads: A Framework for Mitigating SQL Injection Attacks in Large Language Model-Driven Applications


331. When Reviews Disagree: Fine-Grained Contradiction Analysis in Scientific Peer Reviews


332. Task-Agnostic Noisy Label Detection via Standardized Loss Aggregation


333. Coarsening Linear Non-Gaussian Causal Models with Cycles


334. Scaling Vision Models Does Not Consistently Improve Localisation-Based Explanation Quality


335. Explainability of Recurrent Neural Networks for Enhancing P300-based Brain-Computer Interfaces


336. MicroWorld: Empowering Multimodal Large Language Models to Bridge the Microscopic Domain Gap with Multimodal Attribute Graph


337. Think as Needed: Geometry-Driven Adaptive Perception for Autonomous Driving


338. CFSPMNet: Cross-subject Fourier-guided Spatial-Patch Mamba Network for EEG Motor Imagery Decoding in Stroke Patients


339. ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models


340. HYPERPOSE: Hyperbolic Kinematic Phase-Space Attention for 3D Human Pose Estimation


341. Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs


342. PoDAR: Power-Disentangled Audio Representation for Generative Modeling


343. Metis: Learning to Jailbreak LLMs via Self-Evolving Metacognitive Policy Optimization


344. NCO: A Versatile Plug-in for Handling Negative Constraints in Decoding


345. Not-So-Strange Love: Language Models and Generative Linguistic Theories are More Compatible than They Appear


346. Swarm Skills: A Portable, Self-Evolving Multi-Agent System Specification for Coordination Engineering


347. Guided Streaming Stochastic Interpolant Policy


348. Rethinking Loss Reweighting for Imbalance Learning as an Inverse Problem: A Neural Collapse Point of View


349. Adaptive Action Chunking via Multi-Chunk Q Value Estimation


350. Personalizing LLMs with Binary Feedback: A Preference-Corrected Optimization Framework


351. Bridging the Cognitive Gap: A Unified Memory Paradigm for 6G Agentic AI-RAN


352. Speech-based Psychological Crisis Assessment using LLMs


353. Medical Incident Causal Factors and Preventive Measures Generation Using Tag-based Example Selection in Few-shot Learning


354. The two clocks and the innovation window: When and how generative models learn rules


355. Combining Mechanical and Agentic Specification Inference for Move


356. Continual Harness: Online Adaptation for Self-Improving Foundation Agents


357. Attention Drift: What Autoregressive Speculative Decoding Models Learn


358. Geometric 4D Stitching for Grounded 4D Generation


359. Yeti: A compact protein structure tokenizer for reconstruction and multi-modal generation


360. GLiNER2-PII: A Multilingual Model for Personally Identifiable Information Extraction


361. HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation


362. G-Zero: Self-Play for Open-Ended Generation from Zero Data


363. SDTalk: Structured Facial Priors and Dual-Branch Motion Fields for Generalizable Gaussian Talking Head Synthesis


364. Novel GPU Boruta algorithms for feature selection from high-dimensional data


365. PruneTIR: Inference-Time Tool Call Pruning for Effective yet Efficient Tool-Integrated Reasoning


366. Team-Based Self-Play With Dual Adaptive Weighting for Fine-Tuning LLMs


367. Verifier-Free RL for LLMs via Intrinsic Gradient-Norm Reward


368. NaiAD: Initiate Data-Driven Research for LLM Advertising


369. Position: Academic Conferences are Potentially Facing Denominator Gaming Caused by Fully Automated Scientific Agents


370. Voice Biomarkers for Depression and Anxiety


371. Rethinking Random Transformers as Adaptive Sequence Smoothers for Sleep Staging


372. Hyperbolic Distillation: Geometry-Guided Cross-Modal Transfer for Robust 3D Object Detection


373. Pseudo-Deliberation in Language Models: When Reasoning Fails to Align Values and Actions


374. The Geometric Wall: Manifold Structure Predicts Layerwise Sparse Autoencoder Scaling Laws


375. The Cartesian Shortcut: Re-evaluate Vision Reasoning in Polar Coordinate Space


376. Key-Value Means


377. EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding


378. Intervention-Based Time Series Causal Discovery via Simulator-Generated Interventional Distributions


379. Continuous Latent Contexts Enable Efficient Online Learning in Transformers


380. Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents


381. UFO: A Unified Flow-Oriented Framework for Robust Continual Graph Learning


382. Flag Varieties: A Geometric Framework for Deep Network Alignment


383. MoPO: Incorporating Motion Prior for Occluded Human Mesh Recovery


384. Probing Routing-Conditional Calibration in Attention-Residual Transformers


385. ChladniSonify: A Visual-Acoustic Mapping Method for Chladni Patterns in New Media Art Creation


386. Free Energy Manifold: Score-Based Inference for Hybrid Bayesian Networks


387. Fashion Florence: Fine-Tuning Florence-2 for Structured Fashion Attribute Extraction


388. Pretraining large language models with MXFP4


389. CalBench: Evaluating Coordination-Privacy Trade-offs in Multi-Agent LLMs


390. Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning


391. LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models


392. Insight: Enhancing Mobile Accessibility for Blind and Visually Impaired Users with LLMs


393. CrossVL: Complexity-Aware Feature Routing and Paired Curriculum for Cross-View Vision-Language Detection


394. Multi-Tier Labeling and Physics-Informed Learning for Orbital Anomaly Detection at Scale


395. Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution


396. EvoPref: Multi-Objective Evolutionary Optimization Discovers Diverse LLM Alignments Beyond Gradient Descent


397. Exploitation Without Deception: Dark Triad Feature Steering Reveals Separable Antisocial Circuits in Language Models


398. WISTERIA: Learning Clinical Representations from Noisy Supervision via Multi-View Consistency in Electronic Health Records



400. Sequential Feature Selection for Efficient Landslide Segmentation from Multi-Spectral Data


401. Entropy-informed Decoding: Adaptive Information-Driven Branching


402. TIDES: Implicit Time-Awareness in Selective State Space Models


403. The Silent Vote: Improving Zero-Shot LLM Reliability by Aggregating Semantic Neighborhoods


404. KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving


405. Trajectory Supervision for Continual Tool-Use Learning in LLMs


406. One for All: A Non-Linear Transformer can Enable Cross-Domain Generalization for In-Context Reinforcement Learning


407. Security Risks in Tool-Enabled AI Agents: A Systematic Analysis of Privileged Execution Environments


408. Distilling 3D Spatial Reasoning into a Lightweight Vision-Language Model with CoT


409. Metal-Sci: A Scientific Compute Benchmark for Evolutionary LLM Kernel Search on Apple Silicon


410. Adaptive Data Harvesting for Efficient Neural Network Learning with Universal Constraints


411. Do multimodal models imagine electric sheep?


412. Learning Unified Representations of Normalcy for Time Series Anomaly Detection


413. MonitoringBench: Semi-Automated Red-Teaming for Agent Monitoring


414. DeepTumorVQA: A Hierarchical 3D CT Benchmark for Stage-Wise Evaluation of Medical VLMs and Tool-Augmented Agents


415. ChaosNetBench: Benchmarking Spatio-Temporal Graph Neural Networks on Chaotic Lattice Dynamics


416. S2P-Net: A Spectral-Spatial Polar Network for Rotation-Invariant Object Recognition in Low-Data Regimes


417. Rethinking Evaluation of Multiple Sclerosis (MS) Lesion Segmentation Models


418. Learning Multi-Indicator Weights for Data Selection: A Joint Task-Model Adaptation Framework with Efficient Proxies


419. Causal Parametric Drift Simulation: A Digital Twin Framework for Classifier Robustness Evaluation


420. MedMeta: A Benchmark for LLMs in Synthesizing Meta-Analysis Conclusion from Medical Studies


421. RDEx-CASK: Cauchy Mutation, Archive, and Stagnation Kick for RDEx-CSOP


422. Adaptive DNN Partitioning and Offloading in Heterogeneous Edge-Cloud Continuum


423. Any2Any 3D Diffusion Models with Knowledge Transfer: A Radiotherapy Planning Study


424. SmartEval: A Benchmark for Evaluating LLM-Generated Smart Contracts from Natural Language Specifications


425. Efficient Ensemble Selection from Binary and Pairwise Feedback


426. CLR-voyance: Reinforcing Open-Ended Reasoning for Inpatient Clinical Decision Support with Outcome-Aware Rubrics


427. Biosignal Fingerprinting: A Cross-Modal PPG-ECG Foundation Model


428. KAN Text to Vision? The Exploration of Kolmogorov-Arnold Networks for Multi-Scale Sequence-Based Pose Animation from Sign Language Notation


429. PhysHanDI: Physics-Based Reconstruction of Hand-Deformable Object Interactions


430. TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM


431. Governing AI-Assisted Security Operations: A Design Science Framework for Operational Decision Support


432. Assessment of RAG and Fine-Tuning for Industrial Question-Answering-Applications


433. Mixture of Layers with Hybrid Attention


434. Position: AI Security Policy Should Target Systems, Not Models


435. Hidden Error Awareness in Chain-of-Thought Reasoning: The Signal Is Diagnostic, Not Causal


436. Spectral Transformer Neural Processes


437. LASSA Architecture-Based Autonomous Fault-Tolerant Control of Unmanned Underwater Vehicles


438. APCD: Adaptive Path-Contrastive Decoding for Reliable Large Language Model Generation


439. CTQWformer: A CTQW-based Transformer for Graph Classification


440. A Cognitively Grounded Bayesian Framework for Misinformation Susceptibility


441. Outlier-Robust Diffusion Solvers for Inverse Problems


442. Align and Shine: Building High-Quality Sentence-Aligned Corpora for Multilingual Text Simplification


443. When Few Steps Are Enough: Training-Free Acceleration of Identity-Preserved Generation


444. RAwR: Role-Aware Rewiring via Approximate Equitable Partition


445. SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation


446. Key Coverage Matters: Semi-Structured Extraction of OCR Clinical Reports


447. Evading Visual Aphasia: Contrastive Adaptive Semantic Token Pruning for Vision-Language Models


448. AtteConDA: Attention-Based Conflict Suppression in Multi-Condition Diffusion Models and Synthetic Data Augmentation


449. Relational Retrieval: Leveraging Known-Novel Interactions for Generalized Category Discovery


450. RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models


451. Sparsity Moves Computation: How FFN Architecture Reshapes Attention in Small Transformers


452. Kinetic-Optimal Scheduling with Moment Correction for Metric-Induced Discrete Flow Matching in Zero-Shot Text-to-Speech


453. LiteMedCoT-VL: Parameter-Efficient Adaptation for Medical Visual Question Answering


454. EduStory: A Unified Framework for Pedagogically-Consistent Multi-Shot STEM Instructional Video Generation


455. From Detection to Recovery: Operational Analysis on LLM Pre-training with 504 GPUs


456. Your Simulation Runs but Solves the Wrong Physics: PDE-Grounded Intent Verification for LLM-Generated Multiphysics Simulation Code


457. Skill-R1: Agent Skill Evolution via Reinforcement Learning


458. HOME-KGQA: A Benchmark Dataset for Multimodal Knowledge Graph Question Answering on Household Daily Activities


459. RuPLaR: Efficient Latent Compression of LLM Reasoning Chains with Rule-Based Priors From Multi-Step to One-Step


460. Perceptual Asymmetry Between Hue Categories: Evidence from Human Color Categorization


461. The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory


462. Neural Information Causality


463. Teaching Molecular Dynamics to a Non-Autoregressive Ionic Transport Predictor


464. Hierarchical Attention-based Graph Neural Network with Relevance-driven Pruning


465. Neural Cluster First, Route Second: One-Shot Capacitated Vehicle Routing via Differentiable Optimal Transport


466. Micro-Defects Expose Macro-Fakes: Detecting AI-Generated Images via Local Distributional Shifts


467. Towards Effective Theory of LLMs: A Representation Learning Approach


468. MC$^2$: Monte Carlo Correction for Fast Elliptic PDE Solving


469. Semi-Supervised Neural Super-Resolution for Mesh-Based Simulations


470. Memorize Theorems, Not Instances: Probing SFT Generalization through Mathematical Reasoning


471. Beyond Continuity: Challenges of Context Switching in Multi-Turn Dialogue with LLMs


472. Remix the Timbre: Diffusion-Based Style Transfer Across Polyphonic Stems


473. Monocular Biomechanical Tracking of Fingers with Inverse Kinematics to Foundation Models


474. Improving Generalization by Permutation Routing Across Model Copies


475. Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation


476. Sub-JEPA: Subspace Gaussian Regularization for Stable End-to-End World Models


477. Intrinsic Muon: Spectral Optimization on Riemannian Matrix Manifolds


478. Matching Meaning at Scale: Evaluating Semantic Search for 18th-Century Intellectual History through the Case of Locke


479. On Variance Reduction in Learning Mean Flows


480. Towards Robust Sequential Decomposition for Complex Image Editing


481. ProactBench: Beyond What The User Asked For


482. The Art of the Jailbreak: Formulating Jailbreak Attacks for LLM Security Beyond Binary Scoring


483. Detect, Localize, and Explain: Interactive Hierarchical Log Anomaly Analytics with LLM Augmentation


484. The Pokémon Theorem and other Fairness Impossibility Results


485. Flame3D: Zero-shot Compositional Reasoning of 3D Scenes with Agentic Language Models


486. Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability


487. Select-then-differentiate: Solving Bilevel Optimization with Manifold Lower-level Solution Sets


488. RigidFormer: Learning Rigid Dynamics using Transformers


489. DARE: Difficulty-Adaptive Reinforcement Learning with Co-Evolved Difficulty Estimation


490. Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers


491. WavesFM: Hierarchical Representation Learning for Longitudinal Wearable Sensor Waveforms


492. Prediction Bottlenecks Don’t Discover Causal Structure (But Here’s What They Actually Do)


493. WorldSpeech: A Multilingual Speech Corpus from Around the World


494. Revisiting Mixture Policies in Entropy-Regularized Actor-Critic


495. Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan


496. Beyond Self-Play: Hierarchical Reasoning for Continuous Motion in Closed-Loop Traffic Simulation


497. From Traditional Taggers to LLMs: A Comparative Study of POS Tagging for Medieval Romance Languages


498. Internal vs. External: Comparing Deliberation and Evolution for Multi-Agent Constitutional Design


499. A Communication-Theoretic Framework for LLM Agents: Cost-Aware Adaptive Reliability


500. Personalized Alignment Revisited: The Necessity and Sufficiency of User Diversity