All AI Papers - 2026-01-29

1. SokoBench: Evaluating Long-Horizon Planning and Reasoning in Large Language Models


2. Deep Researcher with Sequential Plan Reflection and Candidates Crossover (Deep Researcher Reflect Evolve)


3. MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents


4. REASON: Accelerating Probabilistic Logical Reasoning for Scalable Neuro-Symbolic Intelligence


5. Implementing Metric Temporal Answer Set Programming


6. Enterprise Resource Planning Using Multi-type Transformers in Ferro-Titanium Industry


7. Investigating the Development of Task-Oriented Communication in Vision-Language Models


8. Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation


9. Dialogical Reasoning Across AI Architectures: A Multi-Model Framework for Testing AI Alignment Strategies


10. Online Risk-Averse Planning in POMDPs Using Iterated CVaR Value Function


11. PathWise: Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs


12. Normative Equivalence in human-AI Cooperation: Behaviour, Not Identity, Drives Cooperation in Mixed-Agent Groups


13. CtrlCoT: Dual-Granularity Chain-of-Thought Compression for Controllable Reasoning


14. OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution


15. Policy of Thoughts: Scaling LLM Reasoning via Test-time Policy Evolution


16. AMA: Adaptive Memory via Multi-Agent Collaboration


17. ECG-Agent: On-Device Tool-Calling Agent for ECG Multi-Turn Dialogue


18. Endogenous Reprompting: Self-Evolving Cognitive Alignment for Unified Multimodal Models


19. Scaling Medical Reasoning Verification via Tool-Integrated Reinforcement Learning


20. Towards Intelligent Urban Park Development Monitoring: LLM Agents for Multi-Modal Information Fusion and Analysis


21. Should I Have Expressed a Different Intent? Counterfactual Generation for LLM-Based Autonomous Control


22. Insight Agents: An LLM-Based Multi-Agent System for Data Insights


23. Fuzzy Categorical Planning: Autonomous Goal Satisfaction with Graded Semantic Constraints


24. Teaching LLMs to Ask: Self-Querying Category-Theoretic Planning for Under-Specified Reasoning


25. NeuroAI and Beyond


26. Evolutionary Strategies lead to Catastrophic Forgetting in LLMs


27. Exploring Transformer Placement in Variational Autoencoders for Tabular Data Generation


28. Post-Training Fairness Control: A Single-Train Framework for Dynamic Fairness in Recommendation


29. A New Dataset and Framework for Robust Road Surface Classification via Camera-IMU Fusion


30. $\mathbb{R}^{2k}$ is Theoretically Large Enough for Embedding-based Top-$k$ Retrieval


31. Reward Models Inherit Value Biases from Pretraining


32. Open-Vocabulary Functional 3D Human-Scene Interaction Generation


33. Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning


34. GNN Explanations that do not Explain and How to find Them


35. Reinforcement Learning via Self-Distillation


36. Conditional PED-ANOVA: Hyperparameter Importance in Hierarchical & Dynamic Search Spaces


37. FAIRT2V: Training-Free Debiasing for Text-to-Video Diffusion Models


38. Independence of Approximate Clones


39. HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs


40. QueerGen: How LLMs Reflect Societal Norms on Gender and Sexuality in Sentence Completion Tasks


41. Li-ViP3D++: Query-Gated Deformable Camera-LiDAR Fusion for End-to-End Perception and Trajectory Prediction


42. Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions


43. Beyond GEMM-Centric NPUs: Enabling Efficient Diffusion LLM Sampling


44. LEMON: How Well Do MLLMs Perform Temporal Multimodal Understanding on Instructional Videos?


45. Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework


46. Harnessing Large Language Models for Precision Querying and Retrieval-Augmented Knowledge Extraction in Clinical Data Science


47. Learning Contextual Runtime Monitors for Safe AI-Based Autonomy


48. Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability


49. GDCNet: Generative Discrepancy Comparison Network for Multimodal Sarcasm Detection


50. Agent Benchmarks Fail Public Sector Requirements


51. WFR-MFM: One-Step Inference for Dynamic Unbalanced Optimal Transport


52. CLEAR-Mamba: Towards Accurate, Adaptive and Trustworthy Multi-Sequence Ophthalmic Angiography Classification


53. Regularized Gradient Temporal-Difference Learning


54. Person Re-ID in 2025: Supervised, Self-Supervised, and Language-Aligned. What Works?


55. Ranking-aware Reinforcement Learning for Ordinal Ranking


56. Inequality in Congestion Games with Learning Agents


57. Robust Distributed Learning under Resource Constraints: Decentralized Quantile Estimation via (Asynchronous) ADMM


58. Unsupervised Ensemble Learning Through Deep Energy-based Models


59. IoT Device Identification with Machine Learning: Common Pitfalls and Best Practices


60. Interpreting Emergent Extreme Events in Multi-Agent Systems


61. CCMamba: Selective State-Space Models for Higher-Order Graph Learning on Combinatorial Complexes


62. Audio Deepfake Detection in the Age of Advanced Text-to-Speech models


63. Comparative evaluation of training strategies using partially labelled datasets for segmentation of white matter hyperintensities and stroke lesions in FLAIR MRI


64. Fair Recourse for All: Ensuring Individual and Group Fairness in Counterfactual Explanations


65. Assembling the Mind’s Mosaic: Towards EEG Semantic Intent Decoding


66. Self Voice Conversion as an Attack against Neural Audio Watermarking


67. Guiding the Recommender: Information-Aware Auto-Bidding for Content Promotion


68. Let’s Roll a BiFTA: Bi-refinement for Fine-grained Text-visual Alignment in Vision-Language Models


69. Meeting SLOs, Slashing Hours: Automated Enterprise LLM Optimization with OptiKIT


70. On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents


71. GuideAI: A Real-time Personalized Learning Solution with Adaptive Interventions


72. FedRD: Reducing Divergences for Generalized Federated Learning via Heterogeneity-aware Parameter Guidance


73. LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning


74. Can Continuous-Time Diffusion Models Generate and Solve Globally Constrained Discrete Problems? A Study on Sudoku


75. Switchcodec: Adaptive residual-expert sparse quantization for high-fidelity neural audio coding


76. CURVE: Learning Causality-Inspired Invariant Representations for Robust Scene Understanding via Uncertainty-Guided Regularization


77. Multimodal Multi-Agent Ransomware Analysis Using AutoGen


78. MobileBench-OL: A Comprehensive Chinese Benchmark for Evaluating Mobile GUI Agents in Real-World Environment


79. Demonstration-Free Robotic Control via LLM Agents


80. Beyond Speedup – Utilizing KV Cache for Sampling and Reasoning



82. SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips


83. Structure-constrained Language-informed Diffusion Model for Unpaired Low-dose Computed Tomography Angiography Reconstruction


84. Physically Guided Visual Mass Estimation from a Single RGB Image


85. Towards Compact and Robust DNNs via Compression-aware Sharpness Minimization


86. Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction


87. Cheap2Rich: A Multi-Fidelity Framework for Data Assimilation and System Identification of Multiscale Physics – Rotating Detonation Engines


88. The Forecast After the Forecast: A Post-Processing Shift in Time Series


89. Beyond the Needle’s Illusion: Decoupled Evaluation of Evidence Access and Use under Semantic Interference at 326M-Token Scale


90. Eliciting Least-to-Most Reasoning for Phishing URL Detection


91. Robust SDE Parameter Estimation Under Missing Time Information Setting


92. Automated Benchmark Generation from Domain Guidelines Informed by Bloom’s Taxonomy


93. Order-Optimal Sample Complexity of Rectified Flows


94. How AI Impacts Skill Formation


95. MALLOC: Benchmarking the Memory-aware Long Sequence Compression for Large Sequential Recommendation


96. Certificate-Guided Pruning for Stochastic Lipschitz Optimization


97. ProFlow: Zero-Shot Physics-Consistent Sampling via Proximal Flow Guidance


98. Meta-Cognitive Reinforcement Learning with Self-Doubt and Recovery


99. Causal-Driven Feature Evaluation for Cross-Domain Image Classification


100. NeuraLSP: An Efficient and Rigorous Neural Left Singular Subspace Preconditioner for Conjugate Gradient Methods


101. What’s the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering


102. Large language models accurately predict public perceptions of support for climate action worldwide


103. Taxonomy of the Retrieval System Framework: Pitfalls and Paradigms


104. BengaliSent140: A Large-Scale Bengali Binary Sentiment Dataset for Hate and Non-Hate Speech Classification


105. Rewarding Intellectual Humility: Learning When Not To Answer In Large Language Models


106. Membership Inference Attacks Against Fine-tuned Diffusion Language Models


107. How Much Progress Has There Been in NVIDIA Datacenter GPUs?


108. Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis


109. Taming Toxic Talk: Using chatbots to intervene with users posting toxic comments


110. Dynamics of Human-AI Collective Knowledge on the Web: A Scalable Model and Insights for Sustainable Growth


111. LLaTTE: Scaling Laws for Multi-Stage Sequence Modeling in Large-Scale Ads Recommendation


112. Semi-Supervised Masked Autoencoders: Unlocking Vision Transformer Potential with Limited Data


113. VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning


114. Size Matters: Reconstructing Real-Scale 3D Models from Monocular Images for Food Portion Estimation


115. CiMRAG: Cim-Aware Domain-Adaptive and Noise-Resilient Retrieval-Augmented Generation for Edge-Based LLMs


116. Structural Compositional Function Networks: Interpretable Functional Compositions for Tabular Discovery


117. LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?


118. On the Effectiveness of LLM-Specific Fine-Tuning for Detecting AI-Generated Text


119. Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers


120. Cross-Session Decoding of Neural Spiking Data via Task-Conditioned Latent Alignment


121. MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference


122. Do we really need Self-Attention for Streaming Automatic Speech Recognition?


123. VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models


124. Probabilistic Sensing: Intelligence in Data Sampling


125. LTS-VoiceAgent: A Listen-Think-Speak Framework for Efficient Streaming Voice Interaction via Semantic Triggering and Incremental Reasoning


126. NCSAM: Noise-Compensated Sharpness-Aware Minimization for Noisy Label Learning


127. Benchmarking von ASR-Modellen im deutschen medizinischen Kontext: Eine Leistungsanalyse anhand von Anamnesegesprächen


128. Bench4HLS: End-to-End Evaluation of LLMs in High-Level Synthesis Code Generation


129. Continuous-Flow Data-Rate-Aware CNN Inference on FPGA


130. DecHW: Heterogeneous Decentralized Federated Learning Exploiting Second-Order Information


131. Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data


132. Mem2ActBench: A Benchmark for Evaluating Long-Term Memory Utilization in Task-Oriented Autonomous Agents


133. Quantifying non deterministic drift in large language models


134. Text-to-State Mapping for Non-Resolution Reasoning: The Contradiction-Preservation Principle


135. SDUs DAISY: A Benchmark for Danish Culture


136. Stingy Context: 18:1 Hierarchical Code Compression for LLM Auto-Coding


137. The Grammar of Transformers: A Systematic Review of Interpretability Research on Syntactic Knowledge in Language Models


138. Evaluating Large Language Models for Abstract Evaluation Tasks: An Empirical Study


139. OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling


140. Table-BiEval: A Self-Supervised, Dual-Track Framework for Decoupling Structure and Content in LLM Evaluation


141. HEART: A Unified Benchmark for Assessing Humans and LLMs in Emotional Support Dialogue


142. Demystifying Multi-Agent Debate: The Role of Confidence and Diversity


143. FastWhisper: Adaptive Self-knowledge Distillation for Real-time Automatic Speech Recognition


144. Modeling Next-Token Prediction as Left-Nested Intuitionistic Implication


145. Simulating Complex Multi-Turn Tool Calling Interactions in Stateless Execution Environments


146. From Intuition to Expertise: Rubric-Based Cognitive Calibration for Human Detection of LLM-Generated Korean Text


147. Analysis of LLM Vulnerability to GPU Soft Errors: An Instruction-Level Fault Injection Study


148. GTAC: A Generative Transformer for Approximate Circuits


149. DABench-LLM: Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators for LLMs


150. STELLAR: Structure-guided LLM Assertion Retrieval and Generation for Formal Verification