All AI Papers - 2026-02-12

1. FormalJudge: A Neuro-Symbolic Paradigm for Agentic Oversight


2. GameDevBench: Evaluating Agentic Capabilities Through Game Development


3. CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion


4. Can LLMs Cook Jamaican Couscous? A Study of Cultural Novelty in Recipe Generation


5. Reinforcing Chain-of-Thought Reasoning with Self-Evolving Rubrics


6. SynergyKGC: Reconciling Topological Heterogeneity in Knowledge Graph Completion via Topology-Aware Synergy


7. See, Plan, Snap: Evaluating Multimodal GUI Agents in Scratch


8. Integrating Generative AI-enhanced Cognitive Systems in Higher Education: From Stakeholder Perceptions to a Conceptual Framework considering the EU AI Act


9. Spend Search Where It Pays: Value-Guided Structured Sampling and Optimization for Generative Recommendation


10. OmniSapiens: A Foundation Model for Social Behavior Processing via Heterogeneity-Aware Relative Policy Optimization


11. To Think or Not To Think, That is The Question for Large Reasoning Models in Theory of Mind Tasks


12. Neuro-symbolic Action Masking for Deep Reinforcement Learning


13. Flow of Spans: Generalizing Language Models to Dynamic Span-Vocabulary via GFlowNets


14. Abstraction Generation for Generalized Planning with Pretrained Large Language Models


15. MERIT Feedback Elicits Better Bargaining in LLM Negotiators


16. Found-RL: foundation model-enhanced reinforcement learning for autonomous driving


17. LiveMedBench: A Contamination-Free Medical Benchmark for LLMs with Automated Rubric Evaluation


18. Discovering Differences in Strategic Behavior Between Humans and LLMs


19. Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling


20. GENIUS: Generative Fluid Intelligence Evaluation Suite


21. Data-Efficient Hierarchical Goal-Conditioned Reinforcement Learning via Normalizing Flows


22. Weight Decay Improves Language Model Plasticity


23. Learning to Compose for Cross-domain Agentic Workflow Generation


24. Safety Recovery in Reasoning Models Is Only a Few Early Steering Steps Away


25. Direct Learning of Calibration-Aware Uncertainty for Neural PDE Surrogates


26. DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning


27. General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies


28. GRASP: group-Shapley feature selection for patients


29. SteuerLLM: Local specialized large language model for German tax law analysis


30. In-the-Wild Model Organisms: Mitigating Undesirable Emergent Behaviors in Production LLM Post-Training via Data Attribution


31. Interpretable Attention-Based Multi-Agent PPO for Latency Spike Resolution in 6G RAN Slicing


32. Chatting with Images for Introspective Visual Thinking


33. Conversational Behavior Modeling Foundation Model With Multi-Level Perception


34. GraphSeek: Next-Generation Graph Analytics with LLMs


35. Language Model Inversion through End-to-End Differentiation


36. Linguistic Indicators of Early Cognitive Decline in the DementiaBank Pitt Corpus: A Statistical and Machine Learning Study


37. Chain-of-Look Spatial Reasoning for Dense Surgical Instrument Counting


38. ContactGaussian-WM: Learning Physics-Grounded World Model from Videos


39. OSIL: Learning Offline Safe Imitation Policies with Safety Inferred from Non-preferred Trajectories


40. From Buffers to Registers: Unlocking Fine-Grained FlashAttention with Hybrid-Bonded 3D NPU Co-Design


41. CVPL: A Geometric Framework for Post-Hoc Linkage Risk Assessment in Protected Tabular Data


42. ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression


43. Enhancing Predictability of Multi-Tenant DNN Inference for Autonomous Vehicles’ Perception


44. Fine-Tuning GPT-5 for GPU Kernel Generation


45. LoRA-Squeeze: Simple and Effective Post-Tuning and In-Tuning Compression of LoRA Modules


46. RiemannGL: Riemannian Geometry Changes Graph Deep Learning


47. FeatureBench: Benchmarking Agentic Coding for Complex Feature Development


48. Healthy Harvests: A Comparative Look at Guava Disease Classification Using InceptionV3


49. Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers


50. Search or Accelerate: Confidence-Switched Position Beam Search for Diffusion Language Models


51. Computational Phenomenology of Temporal Experience in Autism: Quantifying the Emotional and Narrative Characteristics of Lived Unpredictability


52. What do people want to fact-check?


53. Traceable, Enforceable, and Compensable Participation: A Participation Ledger for People-Centered AI Governance


54. Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System


55. Resource-Efficient Model-Free Reinforcement Learning for Board Games



57. The CLEF-2026 FinMMEval Lab: Multilingual and Multimodal Evaluation of Financial AI Systems


58. Diagnosing Structural Failures in LLM-Based Evidence Extraction for Meta-Analysis


59. FedPS: Federated data Preprocessing via aggregated Statistics


60. ICA: Information-Aware Credit Assignment for Visually Grounded Long-Horizon Information-Seeking Agents


61. Time Series Foundation Models for Energy Load Forecasting on Consumer Hardware: A Multi-Dimensional Zero-Shot Benchmark


62. Enhancing Multivariate Time Series Forecasting with Global Temporal Retrieval


63. Flow caching for autoregressive video generation


64. Beyond Confidence: The Rhythms of Reasoning in Generative Models


65. PELLI: Framework to effectively integrate LLMs for quality software generation


66. RSHallu: Dual-Mode Hallucination Evaluation for Remote-Sensing Multimodal Large Language Models with Domain-Tailored Mitigation


67. Transport, Don’t Generate: Deterministic Geometric Flows for Combinatorial Optimization


68. VulReaD: Knowledge-Graph-guided Software Vulnerability Reasoning and Detection


69. Kill it with FIRE: On Leveraging Latent Space Directions for Runtime Backdoor Mitigation in Deep Neural Networks


70. LOREN: Low Rank-Based Code-Rate Adaptation in Neural Receivers


71. Exploring the impact of adaptive rewiring in Graph Neural Networks


72. SecureScan: An AI-Driven Multi-Layer Framework for Malware and Phishing Detection Using Logistic Regression and Threat Intelligence Integration


73. Self-Supervised Image Super-Resolution Quality Assessment based on Content-Free Multi-Model Oriented Representation Learning


74. Calliope: A TTS-based Narrated E-book Creator Ensuring Exact Synchronization, Privacy, and Layout Fidelity


75. A Diffusion-Based Generative Prior Approach to Sparse-view Computed Tomography


76. Locomo-Plus: Beyond-Factual Cognitive Memory Evaluation Framework for LLM Agents


77. Cross-Sectional Asset Retrieval via Future-Aligned Soft Contrastive Learning


78. Interpretable Graph-Level Anomaly Detection via Contrast with Normal Prototypes


79. AugVLA-3D: Depth-Driven Feature Augmentation for Vision-Language-Action Models


80. VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training


81. OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL


82. TwiFF (Think With Future Frames): A Large-Scale Dataset for Dynamic Visual Reasoning


83. The Neurosymbolic Frontier of Nonuniform Ellipticity: Formalizing Sharp Schauder Theory via Topos-Theoretic Reasoning Models


84. A Vision-Language Foundation Model for Zero-shot Clinical Collaboration and Automated Concept Discovery in Dermatology


85. Mitigating Reward Hacking in RLHF via Bayesian Non-negative Reward Modeling


86. Online Causal Kalman Filtering for Stable and Effective Policy Optimization


87. Hierarchical Zero-Order Optimization for Deep Neural Networks


88. Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters


89. Neural Additive Experts: Context-Gated Experts for Controllable Model Additivity


90. LLM-Based Scientific Equation Discovery via Physics-Informed Token-Regularized Policy Optimization


91. MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning


92. When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning


93. LAP: Language-Action Pre-Training Enables Zero-shot Cross-Embodiment Transfer


94. Contrastive Learning for Multi Label ECG Classification with Jaccard Score Based Sigmoid Loss


95. C^2ROPE: Causal Continuous Rotary Positional Encoding for 3D Large Multimodal-Models Reasoning


96. Enhancing Weakly Supervised Multimodal Video Anomaly Detection through Text Guidance


97. RealHD: A High-Quality Dataset for Robust Detection of State-of-the-Art AI-Generated Images


98. $μ$pscaling small models: Principled warm starts and hyperparameter transfer


99. A Swap-Adversarial Framework for Improving Domain Generalization in Electroencephalography-Based Parkinson’s Disease Prediction


100. AI-PACE: A Framework for Integrating AI into Medical Education


101. LHAW: Controllable Underspecification for Long-Horizon Tasks


102. Co-jump: Cooperative Jumping with Quadrupedal Robots via Multi-Agent Reinforcement Learning


103. 1%>100%: High-Efficiency Visual Adapter with Complex Linear Projection Optimization


104. Learning Structure-Semantic Evolution Trajectories for Graph Domain Adaptation


105. Low-Dimensional Execution Manifolds in Transformer Learning Dynamics: Evidence from Modular Arithmetic Tasks


106. Learning Adaptive Distribution Alignment with Neural Characteristic Function for Graph Domain Adaptation


107. Protecting Context and Prompts: Deterministic Security for Non-Deterministic AI


108. Driving Reaction Trajectories via Latent Flow Matching


109. Why Human Guidance Matters in Collaborative Vibe Coding


110. Authenticated Workflows: A Systems Approach to Protecting Agentic AI


111. Constructing Industrial-Scale Optimization Modeling Benchmark


112. A Unified Theory of Random Projection for Influence Functions


113. LakeMLB: Data Lake Machine Learning Benchmark


114. AudioRouter: Data Efficient Audio Understanding via RL based Dual Reasoning


115. Control Reinforcement Learning: Token-Level Mechanistic Analysis via Learned SAE Feature Steering


116. A Dual-Stream Physics-Augmented Unsupervised Architecture for Runtime Embedded Vehicle Health Monitoring


117. Breaking the Curse of Repulsion: Optimistic Distributionally Robust Policy Optimization for Off-Policy Generative Recommendation


118. AIvilization v0: Toward Large-Scale Artificial Social Simulation with a Unified Agent Architecture and Adaptive Agent Profiles


119. Equivariant Evidential Deep Learning for Interatomic Potentials


120. AI-rithmetic


121. Modular Multi-Task Learning for Chemical Reaction Prediction


122. Affordances Enable Partial World Modeling with LLMs


123. Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs


124. Making Databases Faster with LLM Evolutionary Sampling


125. Time-to-Event Transformer to Capture Timing Attention of Events in EHR Time Series


126. The Alignment Bottleneck in Decomposition-Based Claim Verification


127. ENIGMA: EEG-to-Image in 15 Minutes Using Less Than 1% of the Parameters


128. Beyond Calibration: Confounding Pathology Limits Foundation Model Specificity in Abdominal Trauma CT


129. Learning Self-Interpretation from Interpretability Artifacts: Training Lightweight Adapters on Vector-Label Pairs


130. Are More Tokens Rational? Inference-Time Scaling in Language Models as Adaptive Resource Rationality


131. Confounding Robust Continuous Control via Automatic Reward Shaping


132. ECHO: An Open Research Platform for Evaluation of Chat, Human Behavior, and Outcomes


133. ERGO: Excess-Risk-Guided Optimization for High-Fidelity Monocular 3D Gaussian Splatting


134. From Classical to Topological Neural Networks Under Uncertainty


135. The Complexity of Bayesian Network Learning: Revisiting the Superstructure


136. KORAL: Knowledge Graph Guided LLM Reasoning for SSD Operational Analysis


137. Transforming Policy-Car Swerving for Mitigating Stop-and-Go Traffic Waves: A Practice-Oriented Jam-Absorption Driving Strategy


138. ImprovEvolve: Ask AlphaEvolve to Improve the Input Solution and Then Improvise


139. Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards


140. Self-Evolving Recommendation System: End-To-End Autonomous Model Optimization With LLM Agents


141. Quantum Integrated Sensing and Computation with Indefinite Causal Order


142. Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models


143. Versor: A Geometric Sequence Architecture


144. When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models


145. Towards Autonomous Mathematics Research


146. Cosmo3DFlow: Wavelet Flow Matching for Spatial-to-Spectral Compression in Reconstructing the Early Universe


147. EvoCodeBench: A Human-Performance Benchmark for Self-Evolving LLM-Driven Coding Systems


148. EVA: Towards a universal model of the immune system


149. Anatomy-Preserving Latent Diffusion for Generation of Brain Segmentation Masks with Ischemic Infarct


150. Beyond SMILES: Evaluating Agentic Systems for Drug Discovery


151. Omni-Safety under Cross-Modality Conflict: Vulnerabilities, Dynamics Mechanisms and Efficient Alignment


152. AD$^2$: Analysis and Detection of Adversarial Threats in Visual Perception for End-to-End Autonomous Driving Systems


153. NMRTrans: Structure Elucidation from Experimental NMR Spectra via Set Transformers


154. MalMoE: Mixture-of-Experts Enhanced Encrypted Malicious Traffic Detection Under Graph Drift


155. PRISM-XR: Empowering Privacy-Aware XR Collaboration with Multimodal Large Language Models


156. PEST: Physics-Enhanced Swin Transformer for 3D Turbulence Simulation


157. Exploring Semantic Labeling Strategies for Third-Party Cybersecurity Risk Assessment Questionnaires


158. Red-teaming the Multimodal Reasoning: Jailbreaking Vision-Language Models via Cross-modal Entanglement Attacks


159. On the Use of a Large Language Model to Support the Conduction of a Systematic Mapping Study: A Brief Report from a Practitioner’s View


160. Silence Routing: When Not Speaking Improves Collective Judgment


161. When LLMs get significantly worse: A statistical approach to detect model degradations


162. Can Large Language Models Implement Agent-Based Models? An ODD-based Replication Study


163. Anonymization-Enhanced Privacy Protection for Mobile GUI Agents: Available but Invisible


164. Multimodal Information Fusion for Chart Understanding: A Survey of MLLMs – Evolution, Limitations, and Cognitive Enhancement


165. Multi-encoder ConvNeXt Network with Smooth Attentional Feature Fusion for Multispectral Semantic Segmentation


166. Reverse-Engineering Model Editing on Language Models


167. AgentTrace: A Structured Logging Framework for Agent System Observability


168. TokaMark: A Comprehensive Benchmark for MAST Tokamak Plasma Models


169. The Anatomy of the Moltbook Social Graph


170. “Humans welcome to observe”: A First Look at the Agent Social Network Moltbook


171. A Practical Guide to Agentic AI Transition in Organizations


172. Large Language Models Predict Functional Outcomes after Acute Ischemic Stroke