All AI Papers - 2026-01-02

1. Context-aware LLM-based AI Agents for Human-centered Energy Management Systems in Smart Buildings


2. AMAP Agentic Planning Technical Report


3. Iterative Deployment Improves Planning Skills in LLMs


4. Semi-Automated Data Annotation in Multisensor Datasets for Autonomous Vehicle Testing


5. Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem


6. A study on constraint extraction and exception exclusion in care worker scheduling


7. GenZ: Foundational models as latent variable generators within traditional statistical models


8. Explaining Why Things Go Where They Go: Interpretable Constructs of Human Organizational Preferences


9. BatteryAgent: Synergizing Physics-Informed Interpretation with LLM Reasoning for Intelligent Battery Fault Diagnosis


10. Multi-modal cross-domain mixed fusion model with dual disentanglement for fault diagnosis under unseen working conditions


11. Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization


12. Group Deliberation Oriented Multi-Agent Conversational Model for Complex Reasoning


13. Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization


14. Recursive Language Models


15. MCPAgentBench: A Real-world Task Benchmark for Evaluating LLM Agent MCP Tool Use


16. From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning


17. Evaluating the Reasoning Abilities of LLMs on Underrepresented Mathematics Competition Problems


18. Thinking on Maps: How Foundation Model Agents Explore, Remember, and Reason Map Environments


19. What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?


20. Align While Search: Belief-Guided Exploratory Inference for World-Grounded Embodied Agents


21. Constrained Language Model Policy Optimization via Risk-aware Stepwise Alignment


22. Deep Reinforcement Learning for Solving the Fleet Size and Mix Vehicle Routing Problem


23. SCP: Accelerating Discovery with a Global Web of Autonomous Scientific Agents


24. Graph-Based Exploration for ARC-AGI-3 Interactive Reasoning Tasks


25. CogRec: A Cognitive Recommender Agent Fusing Large Language Models and Soar for Explainable Recommendation


26. LoongFlow: Directed Evolutionary Search via a Cognitive Plan-Execute-Summarize Paradigm


27. ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment


28. SPARK: Search Personalization via Agent-Driven Retrieval and Knowledge-sharing


29. A Proof-of-Concept for Explainable Disease Diagnosis Using Large Language Models and Answer Set Programming


30. CASCADE: Cumulative Agentic Skill Creation through Autonomous Development and Evolution


31. The Drill-Down and Fabricate Test (DDFT): A Protocol for Measuring Epistemic Robustness in Language Models


32. SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time


33. Coordinated Humanoid Manipulation with Choice Policies



35. AdaGReS: Adaptive Greedy Context Selection via Redundancy-Aware Scoring for Token-Budgeted RAG


36. Generative Classifiers Avoid Shortcut Solutions


37. Modeling Language as a Sequence of Thoughts



39. DarkEQA: Benchmarking Vision-Language Models for Embodied Question Answering in Low-Light Indoor Environments


40. A Modal Logic for Possibilistic Reasoning with Fuzzy Formal Contexts


41. SymSeqBench: a unified framework for the generation and analysis of rule-based symbolic sequences and datasets


42. Evaluating the Impact of Compression Techniques on the Robustness of CNNs under Natural Corruptions


43. The Impact of LLMs on Online News Consumption and Production


44. ShowUI-$π$: Flow-based Generative Models as GUI Dexterous Hands


45. Semi-overlapping Multi-bandit Best Arm Identification for Sequential Support Network Learning


46. MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control


47. HaineiFRDM: Explore Diffusion to Restore Defects in Fast-Movement Films


48. RAIR: A Rule-Aware Benchmark Uniting Challenging Long-Tail and Visual Salience Subset for E-commerce Relevance Assessment


49. AI-Driven Cloud Resource Optimization for Multi-Cluster Environments


50. mHC: Manifold-Constrained Hyper-Connections


51. Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements


52. Big AI is accelerating the metacrisis: What can we do?


53. PrivacyBench: A Conversational Benchmark for Evaluating Privacy in Personalized AI


54. Video and Language Alignment in 2D Systems for 3D Multi-object Scenes with Multi-Information Derivative-Free Control


55. Practising responsibility: Ethics in NLP as a hands-on course


56. LeanCat: A Benchmark Suite for Formal Category Theory in Lean (Part I: 1-Categories)


57. HiGR: Efficient Generative Slate Recommendation via Hierarchical Planning and Multi-Objective Preference Alignment


58. Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow


59. AstroReview: An LLM-driven Multi-Agent Framework for Telescope Proposal Peer Review and Refinement


60. LSRE: Latent Semantic Rule Encoding for Real-Time Semantic Risk Detection in Autonomous Driving


61. BandiK: Efficient Multi-Task Decomposition Using a Multi-Bandit Framework


62. Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting


63. Nested Learning: The Illusion of Deep Learning Architectures


64. R-Debater: Retrieval-Augmented Debate Generation through Argumentative Memory


65. An Adaptive, Disentangled Representation for Multidimensional MRI Reconstruction


66. VLA-RAIL: A Real-Time Asynchronous Inference Linker for VLA Models and Robots



68. Do Large Language Models Know What They Are Capable Of?


69. Hybrid Motion Planning with Deep Reinforcement Learning for Mobile Robot Navigation


70. DynaFix: Iterative Automated Program Repair Driven by Execution-Level Dynamic Information


71. AI-Driven Acoustic Voice Biomarker-Based Hierarchical Classification of Benign Laryngeal Voice Disorders from Sustained Vowels


72. AutoFed: Manual-Free Federated Traffic Prediction via Personalized Prompt


73. Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space


74. Chat-Driven Optimal Management for Virtual Network Services


75. Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time


76. SynRAG: A Large Language Model Framework for Executable Query Generation in Heterogeneous SIEM System


77. Localized Calibrated Uncertainty in Code Language Models


78. More Than Bits: Multi-Envelope Double Binary Factorization for Extreme Quantization


79. Generative AI-enhanced Sector-based Investment Portfolio Construction


80. Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice


81. Automated Classification of First-Trimester Fetal Heart Views Using Ultrasound-Specific Self-Supervised Learning


82. HOLOGRAPH: Active Causal Discovery via Sheaf-Theoretic Alignment of Large Language Model Priors


83. F2IDiff: Real-world Image Super-resolution using Feature to Image Diffusion Foundation Model


84. Foundation models on the bridge: Semantic hazard detection and safety maneuvers for maritime autonomy with vision-language models


85. Privacy-Preserving Semantic Communications via Multi-Task Learning and Adversarial Perturbations


86. PackKV: Reducing KV Cache Memory Footprint through LLM-Aware Lossy Compression


87. Comparing Approaches to Automatic Summarization in Less-Resourced Languages


88. Fast and Realistic Automated Scenario Simulations and Reporting for an Autonomous Racing Stack


89. FAST-IDS: A Fast Two-Stage Intrusion Detection System with Hybrid Compression for Real-Time Threat Detection in Connected and Autonomous Vehicles


90. Tubular Riemannian Laplace Approximations for Bayesian Neural Networks


91. Skim-Aware Contrastive Learning for Efficient Document Representation


92. FedSecureFormer: A Fast, Federated and Secure Transformer Framework for Lightweight Intrusion Detection in Connected and Autonomous Vehicles


93. DermaVQA-DAS: Dermatology Assessment Schema (DAS) & Datasets for Closed-Ended Question Answering & Segmentation in Patient-Generated Dermatology Images


94. Empower Low-Altitude Economy: A Reliability-Aware Dynamic Weighting Allocation for Multi-modal UAV Beam Prediction


95. Generative Video Compression: Towards 0.01% Compression Rate for Video Transmission


96. Virtual-Eyes: Quantitative Validation of a Lung CT Quality-Control Pipeline for Foundation-Model Cancer Risk Prediction


97. DRL-TH: Jointly Utilizing Temporal Graph Attention and Hierarchical Fusion for UGV Navigation in Crowded Environments


98. One-shot synthesis of rare gastrointestinal lesions improves diagnostic accuracy and clinical training


99. Taming Hallucinations: Boosting MLLMs’ Video Understanding via Counterfactual Video Generation


100. PointRAFT: 3D deep learning for high-throughput prediction of potato tuber weight from partial point clouds


101. Developing controlled natural language for formal specification patterns using AI assistants


102. GARDO: Reinforcing Diffusion Models without Reward Hacking


103. Unified Embodied VLM Reasoning with Robotic Action via Autoregressive Discretized Pre-training


104. OptRot: Mitigating Weight Outliers via Data-Free Rotations for Post-Training Quantization


105. Enhancing LLM-Based Neural Network Generation: Few-Shot Prompting and Efficient Validation for Automated Architecture Design


106. Multilevel Fair Allocation


107. Enhancing LLM Planning Capabilities through Intrinsic Self-Critique


108. Factorized Learning for Temporally Grounded Video-Language Models


109. FedLiTeCAN: A Federated Lightweight Transformer for Fast and Robust CAN Bus Intrusion Detection


110. Random Multiplexing


111. Pathology Context Recalibration Network for Ocular Disease Recognition


112. Beyond Hallucinations: A Composite Score for Measuring Reliability in Open-Source Large Language Models


113. AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives


114. Jailbreaking Attacks vs. Content Safety Filters: How Far Are We in the LLM Safety Arms Race?


115. Kidney Exchange: Faster Parameterized Algorithms and Tighter Lower Bounds


116. PipeFlow: Pipelined Processing and Motion-Aware Frame Selection for Long-Form Video Editing


117. RSAgent: Learning to Reason and Act for Text-Guided Segmentation via Multi-Turn Tool Invocations


118. FUSE-RSVLM: Feature Fusion Vision-Language Model for Remote Sensing


119. iCLP: Large Language Model Reasoning with Implicit Cognition Latent Planning


120. TESO Tabu Enhanced Simulation Optimization for Noisy Black Box Problems


121. Tracing the Heart’s Pathways: ECG Representation Learning from a Cardiac Conduction Perspective


122. PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation


123. Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process


124. MeLeMaD: Adaptive Malware Detection via Chunk-wise Feature Selection and Meta-Learning


125. Coding With AI: From a Reflection on Industrial Practices to Future Computer Science and Software Engineering Education


126. Causify DataFlow: A Framework For High-performance Machine Learning Stream Computing


127. A Community-Aware Framework for Influence Maximization with Explicit Accounting for Inter-Community Influence


128. Efficient Context Scaling with LongCat ZigZag Attention


129. Physics-informed Graph Neural Networks for Operational Flood Modeling


130. An Comparative Analysis about KYC on a Recommendation System Toward Agentic Recommendation System


131. Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling


132. Interactive Machine Learning: From Theory to Scale


133. A multimodal Transformer for InSAR-based ground deformation forecasting with cross-site generalization across Europe


134. Efficient Deep Learning for Short-Term Solar Irradiance Time Series Forecasting: A Benchmark Study in Ho Chi Minh City


135. How Large Language Models Systematically Misrepresent American Climate Opinions


136. Autoregressive long-horizon prediction of plasma edge dynamics


137. Breaking Audio Large Language Models by Attacking Only the Encoder: A Universal Targeted Latent-Space Audio Attack


138. Probing the Limits of Compressive Memory: A Study of Infini-Attention in Small-Scale Pretraining


139. Lifelong Domain Adaptive 3D Human Pose Estimation


140. Seeking Late Night Life Lines: Experiences of Conversational AI Use in Mental Health Crisis


141. Security Without Detection: Economic Denial as a Primitive for Edge and IoT Defense


142. From Correctness to Collaboration: Toward a Human-Centered Framework for Evaluating AI Agent Behavior in Software Engineering


143. Adversarial Lens: Exploiting Attention Layers to Generate Adversarial Examples for Evaluation


144. Retrieval Augmented Question Answering: When Should LLMs Admit Ignorance?


145. Explaining News Bias Detection: A Comparative SHAP Analysis of Transformer Model Decision Mechanisms


146. Artificial Intelligence for All? Brazilian Teachers on Ethics, Equity, and the Everyday Challenges of AI in Education


147. Video-Based Performance Evaluation for ECR Drills in Synthetic Training Environments


148. Quantum Error Mitigation with Attention Graph Transformers for Burgers Equation Solvers on NISQ Hardware


149. Improved Bounds for Private and Robust Alignment


150. StressRoBERTa: Cross-Condition Transfer Learning from Depression, Anxiety, and PTSD to Stress Detection


151. Zero-Trust Agentic Federated Learning for Secure IIoT Defense Systems


152. Prompt-Induced Over-Generation as Denial-of-Service: A Black-Box Attack-Side Benchmark


153. A Survey on Graph Neural Networks for Fraud Detection in Ride Hailing Platforms


154. FineFT: Efficient and Risk-Aware Ensemble Reinforcement Learning for Futures Trading


155. Safety-Biased Policy Optimisation: Towards Hard-Constrained Reinforcement Learning via Trust Regions


156. Uncovering Discrimination Clusters: Quantifying and Explaining Systematic Fairness Violations


157. Enabling Physical AI at the Edge: Hardware-Accelerated Recovery of System Dynamics


158. Entropy-Aware Speculative Decoding Toward Improved LLM Reasoning


159. Drift-Based Dataset Stability Benchmark


160. Audited Skill-Graph Self-Improvement for Agentic LLMs via Verifiable Rewards, Experience Synthesis, and Continual Memory


161. Leveraging Machine Learning for Early Detection of Lung Diseases


162. HINTS: Extraction of Human Insights from Time-Series Without External Sources


163. Generalized Regularized Evidential Deep Learning Models: Theory and Comprehensive Evaluation


164. Geometric Scaling of Bayesian Inference in LLMs


165. Coordinate Matrix Machine: A Human-level Concept Learning to Classify Very Similar Documents


166. State-of-the-art Small Language Coder Model: Mify-Coder


167. Hybrid-Code: A Privacy-Preserving, Redundant Multi-Agent Framework for Reliable Local Clinical Coding


168. AgenticTCAD: A LLM-based Multi-Agent Framework for Automated TCAD Code Generation and Device Optimization


169. Towards representation agnostic probabilistic programming


170. Break Out the Silverware – Semantic Understanding of Stored Household Items


171. Enforcing Temporal Constraints for LLM Agents


172. When in Doubt, Deliberate: Confidence-Based Routing to Expert Debate for Sexism Detection


173. q3-MuPa: Quick, Quiet, Quantitative Multi-Parametric MRI using Physics-Informed Diffusion Models


174. A Survey of AI Methods for Geometry Preparation and Mesh Generation in Engineering Simulation


175. HarmTransform: Transforming Explicit Harmful Queries into Stealthy via Multi-Agent Debate


176. PyBangla at BLP-2025 Task 2: Enhancing Bangla-to-Python Code Generation with Iterative Self-Correction and Multilingual Agents


177. STED and Consistency Scoring: A Framework for Evaluating LLM Structured Output Reliability


178. Enriching Historical Records: An OCR and AI-Driven Approach for Database Integration