All AI Papers - 2026-02-13

1. Agentic Test-Time Scaling for WebAgents


2. CM2: Reinforcement Learning with Checklist Rewards for Multi-Turn and Multi-Step Agentic Tool Use


3. Think like a Scientist: Physics-guided LLM Agent for Equation Discovery


4. “Sorry, I Didn’t Catch That”: How Speech Models Miss What Matters Most


5. SAM3-LiteText: An Anatomical Study of the SAM3 Text Encoder for Efficient Vision-Language Segmentation


6. Pedagogically-Inspired Data Synthesis for Language Model Knowledge Distillation


7. Statistical Parsing for Logical Information Retrieval


8. Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision


9. GPT-4o Lacks Core Features of Theory of Mind


10. Seq2Seq2Seq: Lossless Data Compression via Discrete Latent Transformers and Reinforcement Learning


11. STAR: Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction


12. Value Alignment Tax: Measuring Value Trade-offs in LLM Alignment


13. Neutral Prompts, Non-Neutral People: Quantifying Gender and Skin-Tone Bias in Gemini Flash 2.5 Image and GPT Image 1.5


14. HLA: Hadamard Linear Attention


15. Commencing-Student Enrolment Forecasting Under Data Sparsity with Time Series Foundation Models


16. Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty


17. The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context


18. Differentiable Modal Logic for Multi-Agent Diagnosis, Orchestration and Communication


19. Tiny Recursive Reasoning with Mamba-2 Attention Hybrid



21. Multi UAVs Preflight Planning in a Shared and Dynamic Airspace


22. InjectRBP: Steering Large Language Model Reasoning Behavior via Pattern Injection


23. CSEval: A Framework for Evaluating Clinical Semantics in Text-to-Image Generation


24. Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments


25. MEME: Modeling the Evolutionary Modes of Financial Markets


26. AlphaPROBE: Alpha Mining via Principled Retrieval and On-graph biased evolution


27. When Should LLMs Be Less Specific? Selective Abstraction for Reliable Long-Form Text Generation


28. From Atoms to Trees: Building a Structured Feature Forest with Hierarchical Sparse Autoencoders


29. Intelligent AI Delegation


30. Talk2DM: Enabling Natural Language Querying and Commonsense Reasoning for Vehicle-Road-Cloud Integrated Dynamic Maps with Large Language Models


31. Prototype Transformer: Towards Language Model Architectures Interpretable by Design


32. Revis: Sparse Latent Steering to Mitigate Object Hallucination in Large Vision-Language Models


33. Predicting LLM Output Length via Entropy-Guided Representations


34. PuYun-LDM: A Latent Diffusion Model for High-Resolution Ensemble Weather Forecasts


35. Hi-SAM: A Hierarchical Structure-Aware Multi-modal Framework for Large-Scale Recommendation


36. Detecting RLVR Training Data via Structural Convergence of Reasoning


37. Beyond End-to-End Video Models: An LLM-Based Multi-Agent System for Educational Video Generation


38. FlowMind: Execute-Summarize for Structured Workflow Generation from LLM Reasoning


39. RELATE: A Reinforcement Learning-Enhanced LLM Framework for Advertising Text Generation


40. How to Optimize Multispecies Set Predictions in Presence-Absence Modeling?


41. TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents


42. AIR: Improving Agent Safety through Incident Response


43. Text2GQL-Bench: A Text to Graph Query Language Benchmark [Experiment, Analysis & Benchmark]


44. Cross-Architecture Model Diffing with Crosscoders: Unsupervised Discovery of Differences Between LLMs


45. Beyond Parameter Arithmetic: Sparse Complementary Fusion for Distribution-Aware Model Merging


46. ThinkRouter: Efficient Reasoning via Routing Thinking between Latent and Discrete Spaces


47. Beyond Pixels: Vector-to-Graph Transformation for Reliable Schematic Auditing


48. Right for the Wrong Reasons: Epistemic Regret Minimization for Causal Rung Collapse in LLMs


49. Benchmark Health Index: A Systematic Framework for Benchmarking the Benchmarks of LLMs


50. PhyNiKCE: A Neurosymbolic Agentic Framework for Autonomous Computational Fluid Dynamics


51. Quark Medical Alignment: A Holistic Multi-Dimensional Alignment and Collaborative Optimization Paradigm


52. Do MLLMs Really Understand Space? A Mathematical Reasoning Evaluation


53. Neuro-Symbolic Multitasking: A Unified Framework for Discovering Generalizable Solutions to PDE Families


54. When Agents Disagree With Themselves: Measuring Behavioral Consistency in LLM-Based Agents


55. scPilot: Large Language Model Reasoning Toward Automated Single-Cell Analysis and Discovery


56. MAPLE: Modality-Aware Post-training and Learning Ecosystem


57. The Five Ws of Multi-Agent Communication: Who Talks to Whom, When, What, and Why – A Survey from MARL to Emergent Language and LLMs


58. Learning to Configure Agentic AI Systems


59. SemaPop: Semantic-Persona Conditioned Population Synthesis


60. Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use


61. CausalAgent: A Conversational Multi-Agent System for End-to-End Causal Inference


62. Human-Inspired Continuous Learning of Internal Reasoning Processes: Learning How to Think for Adaptive AI Systems


63. AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems


64. Credit Where It is Due: Cross-Modality Connectivity Drives Precise Reinforcement Learning for MLLM Reasoning


65. Distributionally Robust Cooperative Multi-Agent Reinforcement Learning via Robust Value Factorization


66. TRACER: Trajectory Risk Aggregation for Critical Episodes in Agentic Reasoning


67. GHOST: Unmasking Phantom States in Mamba2 via Grouped Hidden-state Output-aware Selection & Truncation


68. Causal-JEPA: Learning World Models through Object-Level Latent Interventions


69. ReplicatorBench: Benchmarking LLM Agents for Replicability in Social and Behavioral Sciences


70. Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization


71. AgentNoiseBench: Benchmarking Robustness of Tool-Using LLM Agents Under Noisy Condition


72. Bi-Level Prompt Optimization for Multimodal LLM-as-a-Judge


73. Dissecting Subjectivity and the “Ground Truth” Illusion in Data Annotation


74. The PBSAI Governance Ecosystem: A Multi-Agent AI Reference Architecture for Securing Enterprise AI Estates


75. Voxtral Realtime


76. On Decision-Valued Maps and Representational Dependence


77. Latent Generative Solvers for Generalizable Long-Term Physics Simulation


78. Explaining AI Without Code: A User Study on Explainable AI


79. Scaling Verification Can Be More Effective than Scaling Policy Learning for Vision-Language-Action Alignment


80. UniT: Unified Multimodal Chain-of-Thought Test-time Scaling


81. AttentionRetriever: Attention Layers are Secretly Long Document Retrievers


82. Creative Ownership in the Age of AI


83. On the implicit regularization of Langevin dynamics with projected noise


84. A technical curriculum on language-oriented artificial intelligence in translation and specialised communication


85. ExtractBench: A Benchmark and Evaluation Methodology for Complex Structured Extraction


86. Intrinsic-Energy Joint Embedding Predictive Architectures Induce Quasimetric Spaces


87. Olmix: A Framework for Data Mixing Throughout LM Development


88. Energy-Aware Spike Budgeting for Continual Learning in Spiking Neural Networks for Neuromorphic Vision


89. Bandit Learning in Matching Markets with Interviews


90. Towards On-Policy SFT: Distribution Discriminant Theory and its Applications in LLM Training


91. The Observer Effect in World Models: Invasive Adaptation Corrupts Latent Physics


92. VIRENA: Virtual Arena for Research, Education, and Democratic Innovation


93. DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing


94. Visual Reasoning Benchmark: Evaluating Multimodal LLMs on Classroom-Authentic Visual Problems from Primary Education


95. SAGEO Arena: A Realistic Environment for Evaluating Search-Augmented Generative Engine Optimization


96. 3DGSNav: Enhancing Vision-Language Model Reasoning for Object Navigation via Active 3D Gaussian Splatting


97. dVoting: Fast Voting for dLLMs


98. On the Adoption of AI Coding Agents in Open-source Android and iOS Development


99. Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation


100. Meta-Sel: Efficient Demonstration Selection for In-Context Learning via Supervised Meta-Learning


101. KAN-FIF: Spline-Parameterized Lightweight Physics-based Tropical Cyclone Estimation on Meteorological Satellite


102. On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage


103. Multi Graph Search for High-Dimensional Robot Motion Planning


104. DeepSight: An All-in-One LM Safety Toolkit


105. Choose Your Agent: Tradeoffs in Adopting AI Advisors, Coaches, and Delegates in Multi-Party Negotiation


106. ModelWisdom: An Integrated Toolkit for TLA+ Model Visualization, Digest and Repair


107. Fourier Transformers for Latent Crystallographic Diffusion and Generative Modeling


108. An Empirical Study of the Imbalance Issue in Software Vulnerability Detection


109. On the Sensitivity of Firing Rate-Based Federated Spiking Neural Networks to Differential Privacy


110. Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?


111. Accelerating Robotic Reinforcement Learning with Agent Guidance


112. Manifold-Aware Temporal Domain Generalization for Large Language Models


113. TAVAE: A VAE with Adaptable Priors Explains Contextual Modulation in the Visual Cortex


114. Towards Performance-Enhanced Model-Contrastive Federated Learning using Historical Information in Heterogeneous Scenarios


115. Synthesis of Late Gadolinium Enhancement Images via Implicit Neural Representations for Cardiac Scar Segmentation


116. IncompeBench: A Permissively Licensed, Fine-Grained Benchmark for Music Information Retrieval


117. AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection


118. Who Does What? Archetypes of Roles Assigned to LLMs During Human-AI Decision-Making


119. DynaHOI: Benchmarking Hand-Object Interaction for Dynamic Target


120. Leveraging LLMs to support co-evolution between definitions and instances of textual DSLs: A Systematic Evaluation


121. Mitigating Mismatch within Reference-based Preference Optimization


122. Agentic AI for Cybersecurity: A Meta-Cognitive Architecture for Governable Autonomy


123. Where Bits Matter in World Model Planning: A Paired Mixed-Bit Study for Efficient Spatial Reasoning


124. SynthRAR: Ring Artifacts Reduction in CT with Unrolled Network and Synthetic Data Training


125. Towards Fair and Comprehensive Evaluation of Routers in Collaborative LLM Systems


126. Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception


127. Resource-Aware Deployment Optimization for Collaborative Intrusion Detection in Layered Networks


128. Improving Neural Retrieval with Attribution-Guided Query Rewriting


129. ULTRA: Urdu Language Transformer-based Recommendation Architecture


130. Evaluating LLM Safety Under Repeated Inference via Accelerated Prompt Stress Testing


131. Safe Fairness Guarantees Without Demographics in Classification: Spectral Uncertainty Set Perspective


132. MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling


133. Cooperation Breakdown in LLM Agents Under Communication Delays


134. AmbiBench: Benchmarking Mobile GUI Agents Beyond One-Shot Instructions in the Wild


135. Adapting Vision-Language Models for E-commerce Understanding at Scale


136. LLM-Driven 3D Scene Generation of Agricultural Simulation Environments


137. Semantically Conditioned Diffusion Models for Cerebral DSA Synthesis


138. TabSieve: Explicit In-Table Evidence Selection for Tabular Prediction


139. OMEGA-Avatar: One-shot Modeling of 360° Gaussian Avatars


140. ANML: Attribution-Native Machine Learning with Guaranteed Robustness


141. DRACO: a Cross-Domain Benchmark for Deep Research Accuracy, Completeness, and Objectivity


142. PatientHub: A Unified Framework for Patient Simulation


143. Provable Offline Reinforcement Learning for Structured Cyclic MDPs


144. SToRM: Supervised Token Reduction for Multi-modal LLMs toward efficient end-to-end autonomous driving


145. LoRA-based Parameter-Efficient LLMs for Continuous Learning in Edge-based Malware Detection


146. DMind-3: A Sovereign Edge–Local–Cloud AI System with Controlled Deliberation and Correction-Based Tuning for Safe, Low-Latency Transaction Execution


147. Brain Tumor Classifiers Under Attack: Robustness of ResNet Variants Against Transferable FGSM and PGD Attacks


148. ViTaS: Visual Tactile Soft Fusion Contrastive Learning for Visuomotor Learning


149. Variation-aware Flexible 3D Gaussian Editing


150. ScalSelect: Scalable Training-Free Multimodal Data Selection for Efficient Visual Instruction Tuning


151. ArGEnT: Arbitrary Geometry-encoded Transformer for Operator Learning


152. PLOT-CT: Pre-log Voronoi Decomposition Assisted Generation for Low-dose CT Reconstruction


153. ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation


154. Gradient Compression May Hurt Generalization: A Remedy by Synthetic Data Guided Sharpness Aware Minimization



156. ReaDy-Go: Real-to-Sim Dynamic 3D Gaussian Splatting Simulation for Environment-Specific Visual Navigation with Moving Obstacles


157. Perception-based Image Denoising via Generative Compression


158. TS-Memory: Plug-and-Play Memory for Time Series Foundation Models


159. Native Reasoning Models: Training Language Models to Reason on Unverifiable Data


160. Krause Synchronization Transformers


161. AltTS: A Dual-Path Framework with Alternating Optimization for Multivariate Time Series Forecasting


162. Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs


163. Adaptive Milestone Reward for GUI Agents


164. Locally Interpretable Individualized Treatment Rules for Black-Box Decision Models


165. How Smart Is Your GUI Agent? A Framework for the Future of Software Interaction


166. Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt


167. Multimodal Fact-Level Attribution for Verifiable Reasoning


168. RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis


169. Understanding Persuasive Interactions between Generative Social Agents and Humans: The Knowledge-based Persuasion Model (KPM)


170. Compiler-Guided Inference-Time Adaptation: Improving GPT-5 Programming Performance in Idris


171. EM-Aware Physical Synthesis: Neural Inductor Modeling and Intelligent Placement & Routing for RF Circuits


172. From Noise to Order: Learning to Rank via Denoising Diffusion


173. Enhanced Portable Ultra Low-Field Diffusion Tensor Imaging with Bayesian Artifact Correction and Deep Learning-Based Super-Resolution


174. Towards Reliable Machine Translation: Scaling LLMs for Critical Error Detection and Safety


175. Fighting MRI Anisotropy: Learning Multiple Cardiac Shapes From a Single Implicit Neural Representation


176. Gradients Must Earn Their Influence: Unifying SFT with Generalized Entropic Objectives


177. When Visibility Outpaces Verification: Delayed Verification and Narrative Lock-in in Agentic AI Discourse


178. Can We Really Learn One Representation to Optimize All Rewards?


179. General and Efficient Steering of Unconditional Diffusion


180. Retrieval-Aware Distillation for Transformer-SSM Hybrids


181. The Manifold of the Absolute: Religious Perennialism as Generative Inference


182. The Energy of Falsehood: Detecting Hallucinations via Diffusion Model Likelihoods


183. Finding the Cracks: Improving LLMs Reasoning with Paraphrastic Probing and Consistency Verification


184. Bootstrapping-based Regularisation for Reducing Individual Prediction Instability in Clinical Risk Prediction Models


185. When Models Examine Themselves: Vocabulary-Activation Correspondence in Self-Referential Processing


186. Divide and Learn: Multi-Objective Combinatorial Optimization at Scale


187. Situated, Dynamic, and Subjective: Envisioning the Design of Theory-of-Mind-Enabled Everyday AI with Industry Practitioners


188. MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation


189. Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP


190. Predictive Associative Memory: Retrieval Beyond Similarity Through Temporal Co-occurrence


191. CryptoAnalystBench: Failures in Multi-Tool Long-Form LLM Analysis


192. HiFloat4 Format for Language Model Inference


193. DeepRed: an architecture for redshift estimation


194. How Many Features Can a Language Model Store Under the Linear Representation Hypothesis?


195. Toward Reliable Tea Leaf Disease Diagnosis Using Deep Learning Model: Enhancing Robustness With Explainable AI and Adversarial Training


196. AI-Driven Clinical Decision Support System for Enhanced Diabetes Diagnosis and Management


197. Credal Concept Bottleneck Models: Structural Separation of Epistemic and Aleatoric Uncertainty


198. SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents


199. UltraLIF: Fully Differentiable Spiking Neural Networks via Ultradiscretization and Max-Plus Algebra


200. Zero-Sacrifice Persistent-Robustness Adversarial Defense for Pre-Trained Encoders


201. interwhen: A Generalizable Framework for Verifiable Reasoning with Test-time Monitors


202. DDL2PropBank Agent: Benchmarking Multi-Agent Frameworks’ Developer Experience Through a Novel Relational Schema Mapping Task


203. Hybrid operator learning of wave scattering maps in high-contrast media


204. Position-Aware Self-supervised Representation Learning for Cross-mode Radar Signal Recognition


205. MELINOE: Fine-Tuning Enables Memory-Efficient Inference for Mixture-of-Experts Models


206. Time-TK: A Multi-Offset Temporal Interaction Framework Combining Transformer and Kolmogorov-Arnold Networks for Time Series Forecasting


207. MuCO: Generative Peptide Cyclization Empowered by Multi-stage Conformation Optimization


208. TDPNavigator-Placer: Thermal- and Wirelength-Aware Chiplet Placement in 2.5D Systems Through Multi-Agent Reinforcement Learning


209. Spectra: Rethinking Optimizers for LLMs Under Spectral Anisotropy


210. KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models


211. From Instruction to Output: The Role of Prompting in Modern NLG


212. What Do LLMs Know About Alzheimer’s Disease? Fine-Tuning, Probing, and Data Synthesis for AD Detection


213. Evaluating Few-Shot Temporal Reasoning of LLMs for Human Activity Prediction in Smart Environments


214. The Script Tax: Measuring Tokenization-Driven Efficiency and Latency Disparities in Multilingual Language Models


215. Efficient Hyper-Parameter Search for LoRA via Language-aided Bayesian Optimization


216. Disentangling Direction and Magnitude in Transformer Representations: A Double Dissociation Through L2-Matched Perturbation Analysis


217. Enhancing SDG-Text Classification with Combinatorial Fusion Analysis and Generative AI


218. Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering


219. Small Updates, Big Doubts: Does Parameter-Efficient Fine-tuning Enhance Hallucination Detection?


220. Assessing LLM Reliability on Temporally Recent Open-Domain Questions


221. Automated Optimization Modeling via a Localizable Error-Driven Perspective


222. Nested Named Entity Recognition in Plasma Physics Research Articles


223. BIRD: A Museum Open Dataset Combining Behavior Patterns and Identity Types to Better Model Visitors’ Experience


224. Methodological Variation in Studying Staff and Student Perceptions of AI


225. HybridRAG: A Practical LLM-based ChatBot Framework based on Pre-Generated Q&A over Raw Unstructured Documents


226. Improving Medical Visual Reinforcement Fine-Tuning via Perception and Reasoning Augmentation