All AI Papers - 2026-01-09

1. Agent Drift: Quantifying Behavioral Degradation in Multi-Agent LLM Systems Over Extended Interactions


2. ComfySearch: Autonomous Exploration and Reasoning for ComfyUI Workflows


3. MobileDreamer: Generative Sketch World Model for GUI Agent


4. Anti-Length Shift: Dynamic Outlier Truncation for Training Efficient Reasoning Models


5. Trade-R1: Bridging Verifiable Rewards to Stochastic Environments via Process-Level Reasoning Verification


6. Current Agents Fail to Leverage World Model as Tool for Foresight


7. Investigating the Grounding Bottleneck for a Large-Scale Configuration Problem: Existing Tools and Constraint-Aware Guessing


8. xDNN(ASP): Explanation Generation System for Deep Neural Networks powered by Answer Set Programming


9. Formally Explaining Decision Tree Models with Answer Set Programming



11. Defeasible Conditionals using Answer Set Programming


12. ROI-Reasoning: Rational Optimization for Inference via Pre-Computation Meta-Cognition


13. EntroCoT: Enhancing Chain-of-Thought via Adaptive Entropy-Guided Segmentation


14. Personalized Medication Planning via Direct Domain Modeling and LLM-Generated Heuristics


15. Sandwich Reasoning: An Answer-Reasoning-Answer Approach for Low-Latency Query Correction


16. How Does the Thinking Step Influence Model Safety? An Entropy-based Safety Reminder for LRMs


17. Architecting Agentic Communities using Design Patterns


18. Interleaved Tool-Call Reasoning for Protein Function Understanding


19. Controllable LLM Reasoning via Sparse Autoencoder-Based Steering


20. SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models


21. ReEfBench: Quantifying the Reasoning Efficiency of LLMs


22. STAR-S: Improving Safety Alignment through Self-Taught Reasoning on Safety Rules


23. Variance Computation for Weighted Model Counting with Knowledge Compilation Approach


24. Evolving Programmatic Skill Networks


25. Personalization of Large Foundation Models for Health Interventions


26. CPGPrompt: Translating Clinical Guidelines into LLM-Executable Decision Support


27. Toward Maturity-Based Certification of Embodied AI: Quantifying Trustworthiness Through Measurement Mechanisms


28. Exploration Through Introspection: A Self-Aware Reward Model


29. Enhancing LLM Instruction Following: An Evaluation-Driven Multi-Agentic Workflow for Prompt Instructions Optimization


30. Digital Red Queen: Adversarial Program Evolution in Core War with LLMs


31. Mastering the Game of Go with Self-play Experience Replay


32. Embedding Autonomous Agents in Resource-Constrained Robotic Platforms


33. Clinical Data Goes MEDS? Let’s OWL make sense of it


34. Klear: Unified Multi-Task Audio-Video Joint Generation


35. Wow, wo, val! A Comprehensive Embodied World Model Evaluation Turing Test


36. ContextFocus: Activation Steering for Contextual Faithfulness in Large Language Models


37. Pixel-Wise Multimodal Contrastive Learning for Remote Sensing Images


38. InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training


39. Quantifying the Impact of Modules and Their Interactions in the PSO-X Framework


40. Layer-wise Positional Bias in Short-Context Language Modeling


41. CSSG: Measuring Code Similarity with Semantic Graphs


42. Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts


43. Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models


44. HoneyTrap: Deceiving Large Language Model Attackers to Honeypot Traps with Resilient Multi-Agent Defense


45. A Scheduling Framework for Efficient MoE Inference on Edge GPU-NDP Systems


46. Large-Scale Aspect-Based Sentiment Analysis with Reasoning-Infused LLMs


47. FOREVER: Forgetting Curve-Inspired Memory Replay for Language Model Continual Learning


48. Bayes-PD: Exploring a Sequence to Binding Bayesian Neural Network model trained on Phage Display data


49. FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection


50. A Gap Between Decision Trees and Neural Networks


51. An Algebraic Representation Theorem for Linear GENEOs in Geometric Machine Learning


52. Adaptive-Boundary-Clipping GRPO: Ensuring Bounded Ratios for Stable and Generalizable Training


53. Spectral Manifold Regularization for Stable and Modular Routing in Deep MoE Architectures


54. IndexTTS 2.5 Technical Report


55. FLNet: Flood-Induced Agriculture Damage Assessment using Super Resolution of Satellite Images


56. Women Worry, Men Adopt: How Gendered Perceptions Shape the Use of Generative AI


57. What Matters For Safety Alignment?


58. Implementing the First-Order Logic of Here and There


59. When Numbers Start Talking: Implicit Numerical Coordination Among LLM-Based Agents


60. On the Trap Space Semantics of Normal Logic Programs


61. Logic Tensor Network-Enhanced Generative Adversarial Network


62. IDESplat: Iterative Depth Probability Estimation for Generalizable 3D Gaussian Splatting


63. AI Generated Text Detection


64. Where meaning lives: Layer-wise accessibility of psycholinguistic features in encoder and decoder language models


65. An Algorithmic Framework for Systematic Literature Reviews: A Case Study for Financial Narratives


66. Do LLMs Really Memorize Personally Identifiable Information? Revisiting PII Leakage with a Cue-Controlled Memorization Framework


67. NeoAMT: Neologism-Aware Agentic Machine Translation with Reinforcement Learning


68. Criminal Liability of Generative Artificial Intelligence Providers for User-Generated Child Sexual Abuse Material


69. Membox: Weaving Topic Continuity into Long-Range Memory for LLM Agents


70. PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation


71. Scalable Machine Learning Force Fields for Macromolecular Systems Through Long-Range Aware Message Passing


72. Learning Shrinks the Hard Tail: Training-Dependent Inference Scaling in a Solvable Linear Model


73. Evaluation of Multilingual LLMs Personalized Text Generation Capabilities Targeting Groups and Social-Media Platforms


74. Bridging OLAP and RAG: A Multidimensional Approach to the Design of Corpus Partitioning


75. O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL


76. RadDiff: Describing Differences in Radiology Image Sets with Natural Language


77. From Laboratory to Real-World Applications: Benchmarking Agentic Code Reasoning at the Repository Level


78. CSMCIR: CoT-Enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval


79. R$^3$L: Reflect-then-Retry Reinforcement Learning with Language-Guided Exploration, Pivotal Credit, and Positive Amplification


80. The Power of 10: New Rules for the Digital World


81. MHRC-Bench: A Multilingual Hardware Repository-Level Code Completion benchmark


82. Investigating Knowledge Distillation Through Neural Networks for Protein Binding Affinity Prediction


83. TreeAdv: Tree-Structured Advantage Redistribution for Group-Based RL


84. Inference Attacks Against Graph Generative Diffusion Models


85. ADEPT: Adaptive Dynamic Early-Exit Process for Transformers


86. Can AI Chatbots Provide Coaching in Engineering? Beyond Information Processing Toward Mastery


87. A Pre-trained Reaction Embedding Descriptor Capturing Bond Transformation Patterns


88. From Implicit to Explicit: Token-Efficient Logical Supervision for Mathematical Reasoning in LLMs


89. Towards Compositional Generalization of LLMs via Skill Taxonomy Guided Data Synthesis


90. Disentangling Aleatoric and Epistemic Uncertainty in Physics-Informed Neural Networks. Application to Insulation Material Degradation Prognostics


91. Discontinuous Galerkin finite element operator network for solving non-smooth PDEs


92. e5-omni: Explicit Cross-modal Alignment for Omni-modal Embeddings


93. AMIR-GRPO: Inducing Implicit Preference Signals into GRPO


94. Group and Exclusive Sparse Regularization-based Continual Learning of CNNs


95. In Search of Grandmother Cells: Tracing Interpretable Neurons in Tabular Representations


96. ReLA: Representation Learning and Aggregation for Job Scheduling with Reinforcement Learning


97. MFC-RFNet: A Multi-scale Guided Rectified Flow Network for Radar Sequence Prediction


98. ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis


99. Evaluating the Pre-Consultation Ability of LLMs using Diagnostic Guidelines


100. Investigation into respiratory sound classification for an imbalanced data set using hybrid LSTM-KAN architectures


101. Policy-Guided Search on Tree-of-Thoughts for Efficient Problem Solving with Bounded Language Model Queries


102. ALERT: Zero-shot LLM Jailbreak Detection via Internal Discrepancy Amplification


103. From Chains to Graphs: Self-Structured Reasoning for General-Domain LLMs


104. Can LLMs See Without Pixels? Benchmarking Spatial Intelligence from Textual Descriptions


105. Deontic Knowledge Graphs for Privacy Compliance in Multimodal Disaster Data Sharing


106. A Proposed Paradigm for Imputing Missing Multi-Sensor Data in the Healthcare Domain


107. Evaluating LLMs for Police Decision-Making: A Framework Based on Police Action Scenarios


108. Value-Action Alignment in Large Language Models under Privacy-Prosocial Conflict


109. Layer-Order Inversion: Rethinking Latent Multi-Hop Reasoning in Large Language Models


110. VeRPO: Verifiable Dense Reward Policy Optimization for Code Generation


111. A Reinforcement Learning-Based Model for Mapping and Goal-Directed Navigation Using Multiscale Place Fields



113. Deploy-Master: Automating the Deployment of 50,000+ Agent-Ready Scientific Tools in One Day


114. Bootstrapping Code Translation with Weighted Multilanguage Exploration


115. IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation


116. Reasoning Pattern Alignment Merging for Adaptive Reasoning


117. Beyond Perplexity: A Lightweight Benchmark for Knowledge Retention in Supervised Fine-Tuning


118. SDCD: Structure-Disrupted Contrastive Decoding for Mitigating Hallucinations in Large Vision-Language Models


119. Cyberattack Detection in Virtualized Microgrids Using LightGBM and Knowledge-Distilled Classifiers


120. Submodular Evaluation Subset Selection in Automatic Prompt Optimization


121. CroBIM-U: Uncertainty-Driven Referring Remote Sensing Image Segmentation


122. Efficient Sequential Recommendation for Long Term User Interest Via Personalization


123. Online Decision-Making Under Uncertainty for Vehicle-to-Building Systems


124. SegNSP: Revisiting Next Sentence Prediction for Linear Text Segmentation


125. EpiQAL: Benchmarking Large Language Models in Epidemiological Question Answering for Enhanced Alignment and Reasoning


126. Content vs. Form: What Drives the Writing Score Gap Across Socioeconomic Backgrounds? A Generated Panel Approach


127. FROST-Drive: Scalable and Efficient End-to-End Driving with a Frozen Vision Encoder


128. An Expectation-Maximization Algorithm for Domain Adaptation in Gaussian Causal Models


129. Automated Feedback Generation for Undergraduate Mathematics: Development and Evaluation of an AI Teaching Assistant


130. Microeconomic Foundations of Multi-Agent Learning


131. Soft Contextualized Encoder For User Defined Text Classification


132. Grading Scale Impact on LLM-as-a-Judge: Human-LLM Alignment Is Highest on 0-5 Grading Scale


133. Discriminating real and synthetic super-resolved audio samples using embedding-based classifiers


134. MARVEL: A Multi Agent-based Research Validator and Enabler using Large Language Models


135. The Illusion of Specialization: Unveiling the Domain-Invariant “Standing Committee” in Mixture-of-Experts Models


136. Spectral Archaeology: The Causal Topology of Model Evolution


137. Training-Free Adaptation of New-Generation LLMs using Legacy Clinical Models


138. Jailbreaking LLMs Without Gradients or Priors: Effective and Transferable Attacks


139. Tigrinya Number Verbalization: Rules, Algorithm, and Implementation


140. Eye-Q: A Multilingual Benchmark for Visual Word Puzzle Solving and Image-to-Phrase Reasoning


141. Metaphors are a Source of Cross-Domain Misalignment of Large Reasoning Models


142. MMErroR: A Benchmark for Erroneous Reasoning in Vision-Language Models


143. Attention mechanisms in neural networks


144. Extreme-value forest fire prediction: A study of the Loss Function in an Ordinality Scheme


145. Bare-Metal Tensor Virtualization: Overcoming the Memory Wall in Edge-AI Inference on ARM64


146. HEEGNet: Hyperbolic Embeddings for EEG


147. Aligning Findings with Diagnosis: A Self-Consistent Reinforcement Learning Framework for Trustworthy Radiology Reporting


148. Ratio-Variance Regularized Policy Optimization for Efficient LLM Fine-tuning


149. CaricatureGS: Exaggerating 3D Gaussian Splatting Faces With Gaussian Curvature


150. Deep Learning-Based Image Recognition for Soft-Shell Shrimp Classification


151. Why LLMs Aren’t Scientists Yet: Lessons from Four Autonomous Research Attempts


152. VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models


153. Mass Concept Erasure in Diffusion Models with Concept Hierarchy


154. AI-Driven Cybersecurity Threats: A Survey of Emerging Risks and Defensive Strategies


155. CageDroneRF: A Large-Scale RF Benchmark and Toolkit for Drone Perception


156. PC2P: Multi-Agent Path Finding via Personalized-Enhanced Communication and Crowd Perception


157. 130k Lines of Formal Topology in Two Weeks: Simple and Cheap Autoformalization for Everyone?


158. AgentMark: Utility-Preserving Behavioral Watermarking for Agents


159. Lightweight Transformer Architectures for Edge Devices in Real-Time Applications


160. Automated Post-Incident Policy Gap Analysis via Threat-Informed Evidence Mapping using Large Language Models


161. HyperCLOVA X 32B Think


162. Feedback Indices to Evaluate LLM Responses to Rebuttals for Multiple Choice Type Questions


163. AI-Guided Discovery of Novel Ionic Liquid Solvents for Industrial CO2 Capture


164. $α^3$-Bench: A Unified Benchmark of Safety, Robustness, and Efficiency for LLM-Based UAV Agents over 6G Networks


165. A Quantum Model for Constrained Markowitz Modern Portfolio Using Slack Variables to Process Mixed-Binary Optimization under QAOA


166. MixRx: Predicting Drug Combination Interactions with LLMs


167. Topic Segmentation Using Generative Language Models


168. LLM_annotate: A Python package for annotating and analyzing fiction characters


169. GuardEval: A Multi-Perspective Benchmark for Evaluating Safety, Fairness, and Robustness in LLM Moderators


170. Less is more: Not all samples are effective for evaluation


171. Advances and Challenges in Semantic Textual Similarity: A Comprehensive Survey


172. The Instruction Gap: LLMs get lost in Following Instruction


173. OpenAI GPT-5 System Card


174. Benchmarking and Adapting On-Device Large Language Models for Clinical Decision Support


175. Internal Reasoning vs. External Control: A Thermodynamic Analysis of Sycophancy in Large Language Models


176. DeepResearch-Slice: Bridging the Retrieval-Utilization Gap via Explicit Text Slicing