All AI Papers - 2026-02-10

1. Agentic Uncertainty Reveals Agentic Overconfidence


2. AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents


3. From Features to Actions: Explainability in Traditional and Agentic AI Systems


4. An Adaptive Differentially Private Federated Learning Framework with Bi-level Optimization


5. LLM Active Alignment: A Nash Equilibrium Perspective


6. POP: Online Structural Pruning Enables Efficient Inference of Large Foundation Models


7. ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training


8. Wild Guesses and Mild Guesses in Active Concept Learning


9. Towards Understanding What State Space Models Learn About Code


10. Semantically Labelled Automata for Multi-Task Reinforcement Learning with LTL Instructions


11. Autoregressive Models for Knowledge Graph Generation


12. Same Answer, Different Representations: Hidden instability in VLMs


13. SeeUPO: Sequence-Level Agentic-RL with Convergence Guarantees


14. AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research


15. LogicSkills: A Structured Benchmark for Formal Reasoning in Large Language Models


16. HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction


17. Progress Constraints for Reinforcement Learning in Behavior Trees


18. JADE: Expert-Grounded Dynamic Evaluation for Open-Ended Professional Tasks


19. AgentCPM-Explore: Realizing Long-Horizon Deep Exploration for Edge-Scale Agents


20. Intrinsic Stability Limits of Autoregressive Reasoning: Structural Consequences for Long-Horizon Execution


21. Unlocking Noisy Real-World Corpora for Foundation Model Pre-Training via Quality-Aware Tokenization


22. Difficulty-Estimated Policy Optimization


23. Trifuse: Enhancing Attention-Based GUI Grounding via Multimodal Fusion


24. Exposing Weaknesses of Large Reasoning Models through Graph Algorithm Problems


25. Do LLMs Act Like Rational Agents? Measuring Belief Coherence in Probabilistic Decision Making


26. Do It for HER: First-Order Temporal Logic Reward Specification in Reinforcement Learning (Extended Version)


27. Large Language Model Reasoning Failures


28. Jackpot: Optimal Budgeted Rejection Sampling for Extreme Actor-Policy Mismatch Reinforcement Learning


29. Learning a Generative Meta-Model of LLM Activations


30. InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning


31. DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos


32. Optimal Turkish Subword Strategies at Scale: Systematic Evaluation of Data, Vocabulary, Morphology Interplay


33. Endogenous Resistance to Activation Steering in Language Models


34. Cochain Perspectives on Temporal-Difference Signals for Learning Beyond Markov Dynamics


35. Implementing Grassroots Logic Programs with Multiagent Transition Systems and AI


36. From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers


37. Halluverse-M^3: A multitask multilingual benchmark for hallucination in LLMs


38. PANC: Prior-Aware Normalized Cut for Object Segmentation


39. TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering


40. Supercharging Simulation-Based Inference for Bayesian Optimal Experimental Design


41. NanoFLUX: Distillation-Driven Compression of Large Text-to-Image Generation Models for Mobile Devices


42. TraceCoder: A Trace-Driven Multi-Agent Framework for Automated Debugging of LLM-Generated Code


43. Zero-shot Generalizable Graph Anomaly Detection with Mixture of Riemannian Experts


44. The Quantum Sieve Tracer: A Hybrid Framework for Layer-Wise Activation Tracing in Large Language Models


45. Rethinking Multi-Condition DiTs: Eliminating Redundant Attention via Position-Alignment and Keyword-Scoping


46. The Representational Geometry of Number


47. AEGPO: Adaptive Entropy-Guided Policy Optimization for Diffusion Models


48. AI-Generated Music Detection in Broadcast Monitoring


49. Bridging 6G IoT and AI: LLM-Based Efficient Approach for Physical Layer’s Optimization Tasks


50. SuReNav: Superpixel Graph-based Constraint Relaxation for Navigation in Over-constrained Environments


51. On the Identifiability of Steering Vectors in Large Language Models


52. Generating Data-Driven Reasoning Rubrics for Domain-Adaptive Reward Modeling


53. Next-generation cyberattack detection with large language models: anomaly analysis across heterogeneous logs


54. AEGIS: Adversarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models


55. A Unified Framework for LLM Watermarks


56. Gold Exploration using Representations from a Multispectral Autoencoder


57. Optimal Abstractions for Verifying Properties of Kolmogorov-Arnold Networks (KANs)


58. Pairwise is Not Enough: Hypergraph Neural Networks for Multi-Agent Pathfinding


59. GhostCite: A Large-Scale Analysis of Citation Validity in the Age of Large Language Models


60. F-GRPO: Don’t Let Your Policy Learn the Obvious and Forget the Rare


61. SaDiT: Efficient Protein Backbone Design via Latent Structural Tokenization and Diffusion Transformers


62. compar:IA: The French Government’s LLM arena to collect French-language human prompts and preference data


63. Not All Layers Need Tuning: Selective Layer Restoration Recovers Diversity


64. Multimodal Generative Retrieval Model with Staged Pretraining for Food Delivery on Meituan


65. RAPID: Reconfigurable, Adaptive Platform for Iterative Design


66. Humanoid Manipulation Interface: Humanoid Whole-Body Manipulation from Robot-Free Demonstrations


67. Temperature Scaling Attack Disrupting Model Confidence in Federated Learning


68. Trust Regions Sell, But Who’s Buying? Overlap Geometry as an Alternative Trust Region for Policy Optimization


69. DAVE: Distribution-aware Attribution via ViT Gradient Decomposition


70. The challenge of generating and evolving real-life like synthetic test data without accessing real-world raw data – a Systematic Review


71. Scaling Speech Tokenizers with Diffusion Autoencoders


72. Sample-Efficient Policy Space Response Oracles with Joint Experience Best Response


73. Personality as Relational Infrastructure: User Perceptions of Personality-Trait-Infused LLM Messaging


74. AgentStepper: Interactive Debugging of Software Development Agents


75. ProtoQuant: Quantization of Prototypical Parts For General and Fine-Grained Image Classification


76. Target noise: A pre-training based neural network initialization for efficient high resolution learning


77. Exploring Sparsity and Smoothness of Arbitrary $\ell_p$ Norms in Adversarial Attacks


78. Perturbing the Phase: Analyzing Adversarial Robustness of Complex-Valued Neural Networks


79. Transformer-based Parameter Fitting of Models derived from Bloch-McConnell Equations for CEST MRI Analysis


80. SPARC: Separating Perception And Reasoning Circuits for Test-time Scaling of VLMs


81. Which Graph Shift Operator? A Spectral Answer to an Empirical Question


82. LIBERO-X: Robustness Litmus for Vision-Language-Action Models


83. Dynamics-Aligned Shared Hypernetworks for Zero-Shot Actuator Inversion


84. Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study


85. MTQE.en-he: Machine Translation Quality Estimation for English-Hebrew


86. Completing Missing Annotation: Multi-Agent Debate for Accurate and Scalable Relevant Assessment for IR Benchmarks


87. Efficient-LVSM: Faster, Cheaper, and Better Large View Synthesis Model via Decoupled Co-Refinement Attention


88. Prism: Spectral Parameter Sharing for Multi-Agent Reinforcement Learning


89. Revisiting the Shape Convention of Transformer Language Models


90. Improve Large Language Model Systems with User Logs


91. Principle-Evolvable Scientific Discovery via Uncertainty Minimization


92. CORE: Comprehensive Ontological Relation Evaluation for Large Language Models


93. TrajAD: Trajectory Anomaly Detection for Trustworthy LLM Agents


94. TrailBlazer: History-Guided Reinforcement Learning for Black-Box LLM Jailbreaking


95. A methodology for analyzing financial needs hierarchy from social discussions using LLM


96. Investigating the structure of emotions by analyzing similarity and association of emotion words


97. TFusionOcc: Student’s t-Distribution Based Object-Centric Multi-Sensor Fusion Framework for 3D Occupancy Prediction


98. ARIS-RSMA Enhanced ISAC System: Joint Rate Splitting and Beamforming Design


99. Empirical Analysis of Adversarial Robustness and Explainability Drift in Cybersecurity Classifiers


100. Generating High-quality Privacy-preserving Synthetic Data


101. Revisiting Salient Object Detection from an Observer-Centric Perspective


102. Training Data Selection with Gradient Orthogonality for Efficient Domain Adaptation


103. SHINE: A Scalable In-Context Hypernetwork for Mapping Context to LoRA in a Single Pass


104. Di3PO – Diptych Diffusion DPO for Targeted Improvements in Image


105. Zero-Trust Runtime Verification for Agentic Payment Protocols: Mitigating Replay and Context-Binding Failures in AP2


106. Action Hallucination in Generative Visual-Language-Action Models


107. Can Post-Training Transform LLMs into Causal Reasoners?


108. The Condensate Theorem: Transformers are $O(n)$, Not $O(n^2)$


109. Accelerating Vision Transformers on Brain Processing Unit


110. Toward generative machine learning for boosting ensembles of climate simulations


111. Can One-sided Arguments Lead to Response Change in Large Language Models?


112. GRP-Obliteration: Unaligning LLMs With a Single Unlabeled Prompt


113. Steering Safely or Off a Cliff? Rethinking Specificity and Robustness in Inference-Time Interventions


114. ASMa: Asymmetric Spatio-temporal Masking for Skeleton Action Representation Learning


115. REBEL: Hidden Knowledge Recovery via Evolutionary-Based Evaluation Loop


116. ATEX-CF: Attack-Informed Counterfactual Explanations for Graph Neural Networks


117. RuleSmith: Multi-Agent LLMs for Automated Game Balancing


118. SR4-Fit: An Interpretable and Informative Classification Algorithm Applied to Prediction of U.S. House of Representatives Elections


119. Coupled Local and Global World Models for Efficient First Order RL


120. Addressing the Waypoint-Action Gap in End-to-End Autonomous Driving via Vehicle Motion Models


121. Emergent Low-Rank Training Dynamics in MLPs with Smooth Activations


122. Multi-Way Representation Alignment


123. Learning Rate Scaling across LoRA Ranks and Transfer to Full Finetuning


124. AnyThermal: Towards Learning Universal Representations for Thermal Perception


125. Personagram: Bridging Personas and Product Design for Creative Ideation with Multimodal LLMs


126. Generics in science communication: Misaligned interpretations across laypeople, scientists, and large language models


127. Optimal rates for density and mode estimation with expand-and-sparsify representations


128. Stop the Flip-Flop: Context-Preserving Verification for Fast Revocable Diffusion Decoding


129. Protean Compiler: An Agile Framework to Drive Fine-grain Phase Ordering


130. Hear You in Silence: Designing for Active Listening in Human Interaction with Conversational Agents Using Context-Aware Pacing


131. Self-Improving World Modelling with Latent Actions


132. Urban Spatio-Temporal Foundation Models for Climate-Resilient Housing: Scaling Diffusion Transformers for Disaster Risk Prediction


133. Coding Agents with Environment Interaction: A Theoretical Perspective


134. NanoNet: Parameter-Efficient Learning with Label-Scarce Supervision for Lightweight Text Mining Model


135. SVRepair: Structured Visual Reasoning for Automated Program Repair


136. Transformer-Based Reinforcement Learning for Autonomous Orbital Collision Avoidance in Partially Observable Environments


137. Communication Enhances LLMs’ Stability in Strategic Thinking


138. Allocate Marginal Reviews to Borderline Papers Using LLM Comparative Ranking


139. HQP: Sensitivity-Aware Hybrid Quantization and Pruning for Ultra-Low-Latency Edge AI Inference


140. iScheduler: Reinforcement Learning-Driven Continual Optimization for Large-Scale Resource Investment Problems


141. Analyzing Diffusion and Autoregressive Vision Language Models in Multimodal Embedding Space


142. Rethinking Memory Mechanisms of Foundation Agents in the Second Half


143. Recontextualizing Famous Quotes for Brand Slogan Generation


144. Git for Sketches: An Intelligent Tracking System for Capturing Design Evolution


145. EUGens: Efficient, Unified, and General Dense Layers