All AI Papers - 2026-05-13

1. ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents


2. Reward Hacking in Rubric-Based Reinforcement Learning


3. Towards Affordable Energy: A Gymnasium Environment for Electric Utility Demand-Response Programs


4. CAAFC: Chronological Actionable Automated Fact-Checker for misinformation / non-factual hallucination detection and correction


5. Formalize, Don’t Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers


6. Semantic Reward Collapse and the Preservation of Epistemic Integrity in Adaptive AI Systems


7. ProfiliTable: Profiling-Driven Tabular Data Processing via Agentic Workflows


8. Classifier Context Rot: Monitor Performance Degrades with Context Length


9. δ-mem: Efficient Online Memory for Large Language Models


10. Reinforcing VLAs in Task-Agnostic World Models


11. Towards Automated Air Traffic Safety Assessment Around Non-Towered Airports Using Large Language Models


12. LISA: Cognitive Arbitration for Signal-Free Autonomous Intersection Management


13. Executable Agentic Memory for GUI Agent


14. NARA: Anchor-Conditioned Relation-Aware Contextualization of Heterogeneous Geoentities


15. How Useful Is Cross-Domain Generalization for Training LLM Monitors?


16. Missingness-MDPs: Bridging the Theory of Missing Data and POMDPs


17. Why Conclusions Diverge from the Same Observations: Formalizing World-Model Non-Identifiability via an Inference


18. No Action Without a NOD: A Heterogeneous Multi-Agent Architecture for Reliable Service Agents


19. Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems


20. MolDeTox: Evaluating Language Model’s Stepwise Fragment Editing for Molecular Detoxification


21. Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics


22. ALGOGEN: Tool-Generated Verifiable Traces for Reliable Algorithm Visualization


23. MM-OptBench: A Solver-Grounded Benchmark for Multimodal Optimization Modeling


24. BoolXLLM: LLM-Assisted Explainability for Boolean Models


25. Rollout Cards: A Reproducibility Standard for Agent Research


26. To Whom Do Language Models Align? Measuring Principal Hierarchies Under High-Stakes Competing Demands


27. Adaptive Multi-Round Allocation with Stochastic Arrivals


28. Large Language Models as Amortized Pareto-Front Generators for Constrained Bi-Objective Convex Optimization


29. Autonomy and Agency in Agentic AI: Architectural Tactics for Regulated Contexts


30. Intermediate Artifacts as First-Class Citizens: A Data Model for Durable Intermediate Artifacts in Agentic Systems


31. SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory


32. OmniRefine: Alignment-Aware Cooperative Compression for Efficient Omnimodal Large Language Models


33. LLMs and the ZPD



35. BadSKP: Backdoor Attacks on Knowledge Graph-Enhanced LLMs with Soft Prompts


36. Random-Set Graph Neural Networks


37. On the Limitations of Large Language Models for Conceptual Database Modeling


38. Assessing and Mitigating Miscalibration in LLM-Based Social Science Measurement


39. Counterfactual Trace Auditing of LLM Agent Skills


40. From Noise to Diversity: Random Embedding Injection in LLM Reasoning


41. When Simulation Lies: A Sim-to-Real Benchmark and Domain-Randomized RL Recipe for Tool-Use Agents


42. Domain Restriction via Multi SAE Layer Transitions


43. Rethinking Positional Encoding for Neural Vehicle Routing


44. Rethinking Supervision Granularity: Segment-Level Learning for LLM-Based Theorem Proving


45. Toward Modeling Player-Specific Chess Behaviors


46. From Clever Hans to Scientific Discovery: Interpreting EEG Foundational Transformers with LRP


47. On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment


48. MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare


49. Automated Reformulation of Robust Optimization via Memory-Augmented Large Language Models


50. Beyond World-Frame Action Heads: Motion-Centric Action Frames for Vision-Language-Action Models


51. Why Users Go There: World Knowledge-Augmented Generative Next POI Recommendation


52. Beyond Inefficiency: Systemic Costs of Incivility in Multi-Agent Monte Carlo Simulations


53. Towards Visually Grounded Multimodal Summarization via Cross-Modal Transformer and Gated Attention


54. When Reasoning Traces Become Performative: Step-Level Evidence that Chain-of-Thought Is an Imperfect Oversight Channel


55. OptArgus: A Multi-Agent System to Detect Hallucinations in LLM-based Optimization Modeling


56. Allegory of the Cave: Measurement-Grounded Vision-Language Learning


57. SafeSteer: A Decoding-level Defense Mechanism for Multimodal Large Language Models


58. Toward Stable Value Alignment: Introducing Independent Modules for Consistent Value Guidance


59. Measuring What Matters Beyond Text: Evaluating Multimodal Summaries by Quality, Alignment, and Diversity


60. Persistent and Conversational Multi-Method Explainability for Trustworthy Financial AI


61. Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion


62. OOM-Free Alpamayo via CPU-GPU Memory Swapping for Vision-Language-Action Models


63. A CAP-like Trilemma for Large Language Models: Correctness, Non-bias, and Utility under Semantic Underdetermination


64. Seirênes: Adversarial Self-Play with Evolving Distractions for LLM Reasoning


65. Can LLM Agents Respond to Disasters? Benchmarking Heterogeneous Geospatial Reasoning in Emergency Operations


66. Nice Fold or Hero Call: Learning Budget-Efficient Thinking for Adaptive Reasoning


67. CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG


68. GAR: Carbon-Aware Routing for LLM Inference via Constrained Optimization


69. Native Explainability for Bayesian Confidence Propagation Neural Networks: A Framework for Trusted Brain-Like AI


70. Dual-Temporal LSTM with Hybrid Attention for Airline Passenger Load Factor Forecasting: Integrating Intra-Flight and Inter-Flight Booking Dynamics


71. Hindsight Hint Distillation: Scaffolded Reasoning for SWE Agents from CoT-free Answers


72. Optimal LTLf Synthesis


73. Read, Grep, and Synthesize: Diagnosing Cross-Domain Seed Exposure for LLM Research Ideation


74. Controllable User Simulation


75. AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration – Learning from Cheap, Optimizing Expensive


76. Hierarchical LLM-Driven Control for HAPS-Assisted UAV Networks: Joint Optimization of Flight and Connectivity


77. Selective Off-Policy Reference Tuning with Plan Guidance


78. The Evaluation Differential: When Frontier AI Models Recognise They Are Being Tested


79. Engagement Process: Rethinking the Temporal Interface of Action and Observation


80. FibQuant: Universal Vector Quantization for Random-Access KV-Cache Compression


81. TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing


82. CAMPA: Efficient and Aligned Multimodal Graph Learning via Decoupled Propagation and Aggregation


83. Breaking Winner-Takes-All: Cooperative Policy Optimization Improves Diverse LLM Reasoning


84. Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning


85. A Mechanistic Investigation of Supervised Fine Tuning


86. Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry


87. What Do EEG Foundation Models Capture from Human Brain Signals?


88. Attributing Emergence in Million-Agent Systems


89. AcuityBench: Evaluating Clinical Acuity Identification and Uncertainty Alignment


90. Transformer Interpretability from Perspective of Attention and Gradient


91. Revisiting Privacy Preservation in Brain-Computer Interfaces: Conceptual Boundaries, Risk Pathways, and a Protection-Strength Grading Framework


92. LLM-X: A Scalable Negotiation-Oriented Exchange for Communication Among Personal LLM Agents


93. Causal Algorithmic Recourse: Foundations and Methods


94. Causal Bias Detection in Generative Artificial Intelligence


95. CVEvolve: Autonomous Algorithm Discovery for Unstructured Scientific Data Processing


96. CPEMH: An Agentic Framework for Prompt-Driven Behavior Evaluation and Assurance in Foundation-Model Systems for Mental Health Screening


97. Rethinking Evaluation for LLM Hallucination Detection: A Desiderata, A New RAG-based Benchmark, New Insights


98. Constraint-Data-Value-Maximization: Utilizing Data Attribution for Effective Data Pruning in Low-Data Environments


99. LatentRouter: Can We Choose the Right Multimodal Model Before Seeing Its Answer?


100. Template-as-Ontology: Configurable Synthetic Data Infrastructure for Cross-Domain Manufacturing AI Validation


101. Unlocking LLM Creativity in Science through Analogical Reasoning


102. The Semantic Training Gap: Ontology-Grounded Tool Architectures for Industrial AI Agent Systems


103. Rethinking LLMOps for Fraud and AML: Building a Compliance-Grade LLM Serving Stack


104. PIVOT: Bridging Planning and Execution in LLM Agents via Trajectory Refinement


105. Do Vision-Language-Models show human-like logical problem-solving capability in point and click puzzle games?


106. Don’t Look at the Numbers: Visual Anchoring Bias and Layer-wise Representation in VLMs


107. The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes


108. OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents


109. RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking


110. EVOCHAMBER: Test-Time Co-evolution of Multi-Agent System at Individual, Team, and Population Scales


111. A Cascaded Generative Approach for e-Commerce Recommendations


112. AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward


113. Learning, Fast and Slow: Towards LLMs That Adapt Continually


114. Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training


115. OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation


116. KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference


117. Solve the Loop: Attractor Models for Language and Reasoning


118. Enabling AI-Native Mobility in 6G: A Real-World Dataset for Handover, Beam Management, and Timing Advance


119. The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events


120. A Causal Language Modeling Detour Improves Encoder Continued Pretraining


121. Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space


122. Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling


123. OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning


124. Detecting overfitting in Neural Networks during long-horizon grokking using Random Matrix Theory


125. SEMIR: Semantic Minor-Induced Representation Learning on Graphs for Visual Segmentation


126. Scalable Token-Level Hallucination Detection in Large Language Models


127. Trust the Batch, On- or Off-Policy: Adaptive Policy Optimization for RL Post-Training


128. Discrete Flow Matching for Offline-to-Online Reinforcement Learning


129. Agent-Based Post-Hoc Correction of Agricultural Yield Forecasts


130. Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models


131. QAP-Router: Tackling Qubit Routing as Dynamic Quadratic Assignment with Reinforcement Learning


132. A Family of Quaternion-Valued Differential Evolution Algorithms for Numerical Function Optimization


133. MedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question Answering


134. A New Technique for AI Explainability using Feature Association Map


135. BSO: Safety Alignment Is Density Ratio Matching


136. Manifold Sampling via Entropy Maximization


137. EHR-RAGp: Retrieval-Augmented Prototype-Guided Foundation Model for Electronic Health Records


138. Transferable Delay-Aware Reinforcement Learning via Implicit Causal Graph Modeling


139. KAN-CL: Per-Knot Importance Regularization for Continual Learning with Kolmogorov-Arnold Networks


140. PriorZero: Bridging Language Priors and World Models for Decision Making


141. TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching


142. Set-Aggregated Genome Embeddings for Microbiome Abundance Prediction


143. Iterative Audit Convergence in LLM-Managed Multi-Agent Systems: A Case Study in Prompt Engineering Quality Assurance


144. Reconnecting Fragmented Citation Networks with Semantic Augmentation


145. Mind the Pause: Disfluency-Aware Objective Tuning for Multilingual Speech Correction with LLMs


146. Pretraining Strategies and Scaling for ECG Foundation Models: A Systematic Study


147. Harness Engineering as Categorical Architecture


148. TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning


149. No More, No Less: Task Alignment in Terminal Agents


150. TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion


151. Heterogeneous SoC Integrating an Open-Source Recurrent SNN Accelerator for Neuromorphic Edge Computing on FPGA


152. Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification


153. Not How Many, But Which: Parameter Placement in Low-Rank Adaptation


154. Uncertainty Quantification for LLM-based Code Generation


155. Overtrained, Not Misaligned


156. Mitigating Context-Memory Conflicts in LLMs through Dynamic Cognitive Reconciliation Decoding


157. DriftXpress: Faster Drifting Models via Projected RKHS Fields


158. A Deep Learning-based Receiver for Asynchronous Grant-Free Random Access in Control-to-Control Networks


159. Premover: Fast Vision-Language-Action Control by Acting Before Instructions Are Complete


160. CIDR: A Large-Scale Industrial Source Code Dataset for Software Engineering Research


161. It’s Not the Size: Harness Design Determines Operational Stability in Small Language Models


162. Disentangled Sparse Representations for Concept-Separated Diffusion Unlearning


163. Learning What Matters: Adaptive Information-Theoretic Objectives for Robot Exploration


164. Property-Level Reconstructability of Agent Decisions: An Anchor-Level Pilot Across Vendor SDK Adapter Regimes


165. The Missing GAP: From Solving Square Jigsaw Puzzles to Handling Real World Archaeological Fragments


166. The Deepfakes We Missed: We Built Detectors for a Threat That Didn’t Arrive


167. Clausal Deletion Backdoors for QBF: a Parameterized Complexity Approach


168. Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction


169. Anomaly-Aware Vision-Language Adapters for Zero-Shot Anomaly Detection


170. Hölder Policy Optimisation


171. Scaling Laws and Tradeoffs in Recurrent Networks of Expressive Neurons


172. Rethink the Role of Neural Decoders in Quantum Error Correction


173. Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation


174. Spectral Vision Transformer for Efficient Tokenization with Limited Data


175. Efficient and Adaptive Human Activity Recognition via LLM Backbones


176. SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces


177. L2P: Unlocking Latent Potential for Pixel Generation


178. CR^2: Cost-Aware Risk-Controlled Routing for Wireless Device-Edge LLM Inference


179. The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures


180. A Transfer Learning Evaluation of Deep Neural Networks for Image Classification


181. High-lift Wing Separation Control via Bayesian Optimization and Deep Reinforcement Learning


182. Cooperative Robotics Reinforced by Collective Perception for Traffic Moderation


183. Beyond Point-wise Neural Collapse: A Topology-Aware Hierarchical Classifier for Class-Incremental Learning


184. AccLock: Unlocking Identity with Heartbeat Using In-Ear Accelerometers


185. Proteus: A Self-Evolving Red Team for Agent Skill Ecosystems


186. Incentivizing Truthfulness and Collaborative Fairness in Bayesian Learning


187. Modulation Consistency-based Contrastive Learning for Self-Supervised Automatic Modulation Classification


188. IPI-proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents Against Indirect Prompt Injection


189. Very Efficient Listwise Multimodal Reranking for Long Documents


190. EvoNav: Evolutionary Reward Function Design for Robot Navigation with Large Language Models


191. Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications


192. GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation


193. Martingale-Consistent Self-Supervised Learning


194. Minimax Rates and Spectral Distillation for Tree Ensembles


195. Trade-offs in Decentralized Agentic AI Discovery Across the Compute Continuum


196. Multi-Timescale Conductance Spiking Networks: A Sparse, Gradient-Trainable Framework with Rich Firing Dynamics for Enhanced Temporal Processing


197. REFNet++: Multi-Task Efficient Fusion of Camera and Radar Sensor Data in Bird’s-Eye Polar View


198. OTT-Vid: Optimal Transport Temporal Token Compression for Video Large Language Models


199. Crash Assessment via Mesh-Based Graph Neural Networks and Physics-Aware Attention


200. Is Monotonic Sampling Necessary in Diffusion Models?


201. Behavioral Integrity Verification for AI Agent Skills


202. Focusable Monocular Depth Estimation


203. DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies


204. CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating


205. A Research Agenda on Agents and Software Engineering: Outcomes from the Rio A2SE Seminar


206. Self-organized MT Direction Maps Emerge from Spatiotemporal Contrastive Optimization


207. Debiased Model-based Representations for Sample-efficient Continuous Control


208. WildRelight: A Real-World Benchmark and Physics-Guided Adaptation for Single-Image Relighting


209. Emergent Communication between Heterogeneous Visual Agents through Decentralized Learning


210. Shaping Zero-Shot Coordination via State Blocking


211. Cochise: A Reference Harness for Autonomous Penetration Testing


212. Evolutionary Task Discovery: Advancing Reasoning Frontiers via Skill Composition and Complexity Scaling


213. Reviving In-domain Fine-tuning Methods for Source-Free Cross-domain Few-shot Learning


214. Weather-Robust Cross-View Geo-Localization via Prototype-Based Semantic Part Discovery


215. Every Bit, Everywhere, All at Once: A Binomial Multibit LLM Watermark


216. Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation


217. Unlocking UML Class Diagram Understanding in Vision Language Models


218. Enhancing Multilingual Counterfactual Generation through Alignment-as-Preference Optimization


219. From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation


220. When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models


221. Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information


222. PRISM: A Geometric Risk Bound that Decomposes Drift into Scale, Shape, and Head


223. Exact Stiefel Optimization for Probabilistic PLS: Closed-Form Updates, Error Bounds, and Calibrated Uncertainty


224. Keep What Audio Cannot Say: Context-Preserving Token Pruning for Omni-LLMs


225. DiffScore: Text Evaluation Beyond Autoregressive Likelihood


226. EpiCastBench: Datasets and Benchmarks for Multivariate Epidemic Forecasting


227. SoK: Unlearnability and Unlearning for Model Dememorization


228. NexOP: Joint Optimization of NEX-Aware k-space Sampling and Image Reconstruction for Low-Field MRI


229. Three Regimes of Context-Parametric Conflict: A Predictive Framework and Empirical Validation


230. TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles


231. When Looking Is Not Enough: Visual Attention Structure Reveals Hallucination in MLLMs


232. Sharpen Your Flow: Sharpness-Aware Sampling for Flow Matching


233. Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting


234. Efficient and provably convergent end-to-end training of deep neural networks with linear constraints


235. PointGS: Semantic-Consistent Unsupervised 3D Point Cloud Segmentation with 3D Gaussian Splatting


236. A Study on Hidden Layer Distillation for Large Language Model Pre-Training



238. Understanding and Preventing Entropy Collapse in RLVR with On-Policy Entropy Flow Optimization


239. Digital Identity for Agentic Systems: Toward a Portable Authorization Standard for Autonomous Agents


240. Offline Policy Evaluation for Manipulation Policies via Discounted Liveness Formulation


241. Drop the Act: Probe-Filtered RL for Faithful Chain-of-Thought Reasoning


242. SpatialForge: Bootstrapping 3D-Aware Spatial Reasoning from Open-World 2D Images


243. Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models


244. Predictive Maps of Multi-Agent Reasoning: A Successor-Representation Spectrum for LLM Communication Topologies


245. Deep Minds and Shallow Probes


246. Conditional Memory Enhanced Item Representation for Generative Recommendation


247. Can a Single Message Paralyze the AI Infrastructure? The Rise of AbO-DDoS Attacks through Targeted Mobius Injection


248. Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty


249. Diabetic Retinopathy Classification using Downscaling Algorithms and Deep Learning


250. Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer


251. MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification


252. fg-expo: Frontier-Guided Exploration-Prioritized Policy Optimization via Adaptive KL and Gaussian Curriculum


253. Spatial Adapter: Structured Spatial Decomposition and Closed-Form Covariance for Frozen Predictors


254. Deep Reasoning in General Purpose Agents via Structured Meta-Cognition


255. TRACE: Temporal Routing with Autoregressive Cross-channel Experts for EEG Representation Learning


256. LPDP: Inference-Time Reward Control for Variable-Length DNA Generation with Edit Flows


257. Causal Fairness for Survival Analysis



259. Human-AI Productivity Paradoxes: Modeling the Interplay of Skill, Effort, and AI Assistance


260. Large Language Models for Causal Relations Extraction in Social Media: A Validation Framework for Disaster Intelligence


261. Gradient-Free Noise Optimization for Reward Alignment in Generative Models


262. Physics-Informed Teacher-Student Ensemble Learning for Traffic State Estimation with a Varying Speed Limit Scenario


263. Much of Geospatial Web Search Is Beyond Traditional GIS


264. Epistemic Uncertainty for Test-Time Discovery


265. Beyond Similarity Search: Tenure and the Case for Structured Belief State in LLM Memory


266. SOMA: Efficient Multi-turn LLM Serving via Small Language Model


267. Natural Language based Specification and Verification


268. Quantifying Rodda and Graham Gait Classification from 3D Markerless Kinematics derived from a Single-view Video in a Heterogeneous Pediatric Clinical Cohort


269. A Theory of Time-Sensitive Language Generation: Sparse Hallucination Beats Mode Collapse


270. ReAD: Reinforcement-Guided Capability Distillation for Large Language Models


271. Beyond Similarity: Temporal Operator Attention for Time Series Analysis


272. Rethinking external validation for the target population: Capturing patient-level similarity with a generative model


273. Discovery of Interpretable Surrogates via Agentic AI: Application to Gravitational Waves


274. Generative AI for Visualizing Highway Construction Hazards Through Synthetic Images and Temporal Sequences


275. Localization Boosting for Growth Markets: Mitigating Cross-Locale Behavioral Bias in Learning-to-Rank


276. gwBenchmarks: Stress-Testing LLM Agents on High-Precision Gravitational Wave Astronomy


277. DenseTRF: Texture-Aware Unsupervised Representation Adaptation for Surgical Scene Dense Prediction


278. Curriculum Learning-Guided Progressive Distillation in Large Language Models


279. RETUYT-INCO at BEA 2026 Shared Task 2: Meta-prompting in Rubric-based Scoring for German


280. Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning


281. LiBaGS: Lightweight Boundary Gap Synthesis for Targeted Synthetic Data Selection


282. Comment and Control: Hijacking Agentic Workflows via Context-Grounded Evolution


283. ABRA: Agent Benchmark for Radiology Applications


284. Leveraging RAG for Training-Free Alignment of LLMs


285. ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload


286. The Scaling Law of Evaluation Failure: Why Simple Averaging Collapses Under Data Sparsity and Item Difficulty Gaps, and How Item Response Theory Recovers Ground Truth Across Domains


287. Continuous Discovery of Vulnerabilities in LLM Serving Systems with Fuzzing


288. Exploring Token-Space Manipulation in Latent Audio Tokenizers


289. Adversarial SQL Injection Generation with LLM-Based Architectures


290. CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration


291. Muon is Not That Special: Random or Inverted Spectra Work Just as Well


292. Oversmoothing as Representation Degeneracy in Neural Sheaf Diffusion


293. The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models


294. Benchmarking LLM-Based Static Analysis for Secure Smart Contract Development: Reliability, Limitations, and Potential Hybrid Solutions


295. Interpretability Can Be Actionable


296. The Price of Proportional Representation in Temporal Voting


297. Quantifying the Reconstructability of Astrophysical Methods with Large Language Models and Information Theory: A Case Study in Spectral Reconstruction


298. ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV


299. Control Charts for Multi-agent Systems


300. Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training


301. HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series


302. SEVO: Semantic-Enhanced Virtual Observation for Robust VLA Manipulation via Active Illumination and Data-Centric Collection


303. Deploying Self-Supervised Learning for Real Seismic Data Denoising


304. Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs


305. Newton’s Lantern: A Reinforcement Learning Framework for Finetuning AC Power Flow Warm Start Models


306. Enabling Performant and Flexible Model-Internal Observability for LLM Inference


307. ASD-Bench: A Four-Axis Comprehensive Benchmark of AI Models for Autism Spectrum Disorder


308. ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?


309. MCPShield: Content-Aware Attack Detection for LLM Agent Tool-Call Traffic


310. On Problems of Implicit Context Compression for Software Engineering Agents


311. ForceFlow: Learning to Feel and Act via Contact-Driven Flow Matching


312. Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw


313. Read, Extract, Classify: A Tool for Smarter Requirements Engineering


314. Towards Model-Free Learning in Dynamic Population Games: An Application to Karma Economies


315. The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck


316. Sequential Behavioral Watermarking for LLM Agents


317. MambaNetBurst: Direct Byte-level Network Traffic Classification without Tokenization or Pretraining


318. Portable Agent Memory: A Protocol for Cryptographically-Verified Memory Transfer Across Heterogeneous AI Agents


319. An Executable Benchmarking Suite for Tool-Using Agents


320. FragBench: Cross-Session Attacks Hidden in Benign-Looking Fragments


321. From Code-Centric to Intent-Centric Software Engineering: A Reflexive Thematic Analysis of Generative AI, Agentic Systems, and Engineering Accountability


322. SCOPE: Siamese Contrastive Operon Pair Embeddings for Functional Sequence Representation and Classification


323. Trust Region Inverse Reinforcement Learning: Explicit Dual Ascent using Local Policy Updates


324. Efficient LLM Reasoning via Variational Posterior Guidance with Efficiency Awareness


325. Simpson’s Paradox in Behavioral Curves: How Aggregation Distorts Parametric Models of User Dynamics


326. DCVD: Dual-Channel Cross-Modal Fusion for Joint Vulnerability Detection and Localization


327. Backbone-Equated Diffusion OOD via Sparse Internal Snapshots


328. LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models


329. When and How to Canonize: A Generalization Perspective


330. RT-Transformer: The Transformer Block as a Spherical State Estimator


331. An Execution-Verified Multi-Language Benchmark for Code Semantic Reasoning


332. DisagMoE: Computation-Communication overlapped MoE Training via Disaggregated AF-Pipe Parallelism


333. The Authorization-Execution Gap Is a Major Safety and Security Problem in Open-World Agents


334. MT-JailBench: A Modular Benchmark for Understanding Multi-Turn Jailbreak Attacks


335. SkillGen: Verified Inference-Time Agent Skill Synthesis


336. Few-Shot Truly Benign DPO Attack for Jailbreaking LLMs


337. Towards Scalable Persistence-Based Topological Optimization


338. Test-Time Personalization: A Diagnostic Framework and Probabilistic Fix for Scaling Failures


339. Skill Drift Is Contract Violation: Proactive Maintenance for LLM Agent Skill Libraries


340. SURGE: Surrogate Gradient Adaptation in Binary Neural Networks


341. Seeing the Needle in the Haystack: Towards Weakly-Supervised Log Instance Anomaly Localization via Counterfactual Perturbation


342. AESOP: Adversarial Execution-path Selection to Overload Deep Learning Pipelines


343. Structural Interpretations of Protein Language Model Representations via Differentiable Graph Partitioning


344. TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment


345. ξ-DPO: Direct Preference Optimization via Ratio Reward Margin


346. LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection


347. PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks


348. Hierarchical Multi-Scale Graph Neural Networks: Scalable Heterophilous Learning with Oversmoothing and Oversquashing Mitigation


349. Vertex-Softmax: Tight Transformer Verification via Exact Softmax Optimization


350. Rotation-Preserving Supervised Fine-Tuning


351. Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models


352. Context-Gated Associative Retrieval: From Theory to Transformers


353. MMTB: Evaluating Terminal Agents on Multimedia-File Tasks


354. Two Hebrew folk meteorological proverbs tested: rainfall on Rosh Chodesh and Shabbat Mevarechim as predictors of monthly precipitation (Israel, 1950-2024)


355. QuIDE: Mastering the Quantized Intelligence Trade-off via Active Optimization


356. Multi-Fidelity Emulation of Atmospheric Correction Coefficients with Physics-Guided Kolmogorov-Arnold Networks


357. Controlled Steering-Based State Preparation for Adversarial-Robust Quantum Machine Learning


358. Continuous Flood Nowcasting in South Asia: A Multi-Sensor Ensemble Remote Sensing Framework for Flood Extent


359. AlphaEarth Satellite Embeddings for Modelling Climate Sensitive Diseases Towards Global Health Resilience


360. Breaking QAOA’s Fixed Target Hamiltonian Barrier: A Fully Connected Quantum Boltzmann Machine via Bilevel Optimization


361. MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media


362. Measuring Accuracy and Energy-to-Solution of Quantum Fine-Tuning of Foundational AI Models


363. Stargazer: A Scalable Model-Fitting Benchmark Environment for AI Agents under Astrophysical Constraints