All AI Papers - 2025-10-10

1. How to Teach Large Multimodal Models New Skills


2. Agent Learning via Early Experience


3. FlowSearch: Advancing deep research with dynamic structured knowledge flow


4. CaRT: Teaching LLM Agents to Know When They Know Enough


5. AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents


6. Looking to Learn: Token-wise Dynamic Gating for Low-Resource Vision-Language Modelling


7. Revisiting Hallucination Detection with Effective Rank-based Uncertainty


8. QAgent: A modular Search Agent with Interactive Query Understanding


9. LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings


10. Beyond Pass@k: Breadth-Depth Metrics for Reasoning Boundaries


11. First Try Matters: Revisiting the Role of Reflection in Reasoning Models


12. Symmetry-Aware Fully-Amortized Optimization with Scale Equivariant Graph Metanetworks


13. Co-TAP: Three-Layer Agent Interaction Protocol Technical Report


14. Chain-of-Trigger: An Agentic Backdoor that Paradoxically Enhances Agentic Robustness


15. Selection, Reflection and Self-Refinement: Revisit Reasoning Tasks via a Causal Lens


16. DODO: Causal Structure Learning with Budgeted Interventions


17. The Tournament Tree Method for preference elicitation in Multi-criteria decision-making


18. Measuring What Matters: The AI Pluralism Index


19. R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?


20. Prepared mind, fast response: A temporal decoupling framework for adaptive knowledge orchestration in open-domain dialogue


21. Can Risk-taking AI-Assistants suitably represent entities


22. From Ethical Declarations to Provable Independence: An Ontology-Driven Optimal-Transport Framework for Certifiably Fair AI Systems


23. AutoQual: An LLM Agent for Automated Discovery of Interpretable Features for Review Quality Assessment


24. Multi-Condition Conformal Selection


25. LinguaSim: Interactive Multi-Vehicle Testing Scenario Generation via Natural Language Instruction Based on Large Language Models


26. AILoRA: Function-Aware Asymmetric Initialization for Low-Rank Adaptation of Large Language Models


27. PEAR: Phase Entropy Aware Reward for Efficient Reasoning


28. Language Models Do Not Embed Numbers Continuously


29. ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation


30. VoiceAgentBench: Are Voice Assistants ready for agentic tasks?


31. TaoSR-SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance


32. Agent-Based Genetic Algorithm for Crypto Trading Strategy Optimization


33. Enabling Personalized Long-term Interactions in LLM-based Agents through Persistent Memory and User Profiles


34. Profit Mirage: Revisiting Information Leakage in LLM-based Financial Agents


35. Towards Meaningful Transparency in Civic AI Systems


36. Understanding DeepResearch via Reports


37. Augur: Modeling Covariate Causal Associations in Time Series via Large Language Models


38. FinMR: A Knowledge-Intensive Multimodal Benchmark for Advanced Financial Reasoning


39. An LLM-Powered Cooperative Framework for Large-Scale Multi-Vehicle Navigation


40. Strategic Communication under Threat: Learning Information Trade-offs in Pursuit-Evasion Games


41. GCPO: When Contrast Fails, Go Gold


42. An approach for systematic decomposition of complex llm tasks


43. From Noisy to Native: LLM-driven Graph Restoration for Test-Time Graph Domain Adaptation


44. Haibu Mathematical-Medical Intelligent Agent:Enhancing Large Language Model Reliability in Medical Tasks via Verifiable Reasoning Chains


45. SurveyG: A Multi-Agent LLM Framework with Hierarchical Citation Graph for Automated Survey Generation


46. oMeBench: Towards Robust Benchmarking of LLMs in Organic Mechanism Elucidation and Reasoning


47. Control Synthesis of Cyber-Physical Systems for Real-Time Specifications through Causation-Guided Reinforcement Learning


48. Multimodal Safety Evaluation in Generative Agent Social Simulations


49. Safely Exploring Novel Actions in Recommender Systems via Deployment-Efficient Policy Learning


50. Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models


51. A Case for Leveraging Generative AI to Expand and Enhance Training in the Provision of Mental Health Services


52. Traceability and Accountability in Role-Specialized Multi-Agent LLM Pipelines


53. AgentAsk: Multi-Agent Systems Need to Ask


54. Benchmarking is Broken - Don’t Let AI be its Own Judge


55. An Evaluation Study of Hybrid Methods for Multilingual PII Detection


56. Measuring and Mitigating Identity Bias in Multi-Agent Debate via Anonymization



58. Optimizing Ethical Risk Reduction for Medical Intelligent Systems with Constraint Programming


59. Evaluation of LLMs for Process Model Analysis and Optimization


60. ExpertAgent: Enhancing Personalized Education through Dynamic Planning and Retrieval-Augmented Long-Chain Reasoning


61. TS-Agent: A Time Series Reasoning Agent with Iterative Statistical Insight Gathering


62. Less is More: Strategic Expert Selection Outperforms Ensemble Complexity in Traffic Forecasting


63. ProSEA: Problem Solving via Exploration Agents


64. Position: AI Will Transform Neuropsychology Through Mental Health Digital Twins for Dynamic Mental Health Care, Especially for ADHD


65. Base Models Know How to Reason, Thinking Models Learn When


66. L2M-AID: Autonomous Cyber-Physical Defense by Fusing Semantic Reasoning of Large Language Models with Multi-Agent Reinforcement Learning (Preprint)


67. Truth-Aware Decoding: A Program-Logic Approach to Factual Language Generation


68. BLAZER: Bootstrapping LLM-based Manipulation Agents with Zero-Shot Data Generation


69. ArenaBencher: Automatic Benchmark Evolution via Multi-Model Competitive Evaluation


70. NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos


71. MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning


72. SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models


73. Dream to Recall: Imagination-Guided Experience Retrieval for Memory-Persistent Vision-and-Language Navigation


74. VideoNorms: Benchmarking Cultural Awareness of Video Language Models


75. On the optimization dynamics of RLVR: Gradient gap and step size thresholds


76. Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing


77. SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models


78. CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards


79. To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models


80. AI-Driven Radiology Report Generation for Traumatic Brain Injuries


81. DeepPrune: Parallel Scaling without Inter-trace Redundancy


82. Platform-Agnostic Modular Architecture for Quantum Benchmarking


83. Integral Signatures of Activation Functions: A 9-Dimensional Taxonomy and Stability Theory for Deep Learning


84. gLSTM: Mitigating Over-Squashing by Increasing Storage Capacity


85. Synthetic Series-Symbol Data Generation for Time Series Foundation Models


86. Gaze on the Prize: Shaping Visual Attention with Return-Guided Contrastive Learning


87. xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning


88. ClauseLens: Clause-Grounded, CVaR-Constrained Reinforcement Learning for Trustworthy Reinsurance Pricing


89. Prompts Generalize with Low Data: Non-vacuous Generalization Bounds for Optimizing Prompts with More Informative Priors


90. Single layer tiny Co$^4$ outpaces GPT-2 and GPT-BERT


91. FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts


92. Detecting Legend Items on Historical Maps Using GPT-4o with In-Context Learning


93. Airy: Reading Robot Intent through Height and Sky


94. Evaluating Small Vision-Language Models on Distance-Dependent Traffic Perception


95. DeepEN: Personalized Enteral Nutrition for Critically Ill Patients using Deep Reinforcement Learning


96. Learning What’s Missing: Attention Dispersion and EMA Stabilization in Length Generalization


97. Iterated Agent for Symbolic Regression


98. Counterfactual Identifiability via Dynamic Optimal Transport


99. Learning Neural Exposure Fields for View Synthesis


100. A Distributed Emulation Environment for In-Memory Computing Systems


101. Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization


102. Opponent Shaping in LLM Agents


103. Contrastive Decoding for Synthetic Data Generation in Low-Resource Language Modeling


104. The Hidden Bias: A Study on Explicit and Implicit Political Stereotypes in Large Language Models


105. Expressive Value Learning for Scalable Offline Reinforcement Learning


106. FuelCast: Benchmarking Tabular and Temporal Models for Ship Fuel Consumption


107. LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions


108. Memory Retrieval and Consolidation in Large Language Models through Function Tokens


109. Sentiment Matters: An Analysis of 200 Human-SAV Interactions


110. Robust Canonicalization through Bootstrapped Data Re-Alignment


111. Leveraging Whisper Embeddings for Audio-based Lyrics Matching


112. NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions


113. Quantum Agents for Algorithmic Discovery


114. DACIP-RC: Domain Adaptive Continual Instruction Pre-Training via Reading Comprehension on Business Conversations


115. AI Knowledge Assist: An Automated Approach for the Creation of Knowledge Bases for Conversational AI Agents


116. Think Just Enough: Sequence-Level Entropy as a Confidence Signal for LLM Reasoning


117. Improving Temporal Understanding Logic Consistency in Video-Language Models via Attention Enhancement


118. Approximate Domain Unlearning for Vision-Language Models


119. Interpreting LLM-as-a-Judge Policies via Verifiable Global Explanations


120. Bayesian Decision Making around Experts


121. VersionRAG: Version-Aware Retrieval-Augmented Generation for Evolving Documents


122. Development of Mental Models in Human-AI Collaboration: A Conceptual Framework


123. Lossless Vocabulary Reduction for Auto-Regressive Language Models


124. The Price of Thought: A Multilingual Analysis of Reasoning, Performance, and Cost of Negotiation in Large Language Models


125. Everything is Plausible: Investigating the Impact of LLM Rationales on Human Notions of Plausibility


126. A Novel Ensemble Learning Approach for Enhanced IoT Attack Detection: Redefining Security Paradigms in Connected Systems


127. An Adaptive Multi Agent Bitcoin Trading System


128. Attribution-by-design: Ensuring Inference-Time Provenance in Generative Music Systems


129. FedDTRE: Federated Dialogue Generation Models Powered by Trustworthiness Evaluation


130. A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models


131. TaoSR-AGRL: Adaptive Guided Reinforcement Learning Framework for E-commerce Search Relevance


132. Verifying Graph Neural Networks with Readout is Intractable


133. Towards Reliable LLM-based Robot Planning via Combined Uncertainty Estimation


134. MRI-derived quantification of hepatic vessel-to-volume ratios in chronic liver disease using a deep learning approach


135. FastUMI-100K: Advancing Data-driven Robotic Manipulation with a Large-scale UMI-style Dataset


136. Backdoor Vectors: a Task Arithmetic View on Backdoor Attacks and Defenses


137. Past, Present, and Future of Bug Tracking in the Generative AI Era


138. Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks


139. Leveraging Author-Specific Context for Scientific Figure Caption Generation: 3rd SciCap Challenge


140. Fewer Weights, More Problems: A Practical Attack on LLM Pruning


141. Is Architectural Complexity Always the Answer? A Case Study on SwinIR vs. an Efficient CNN


142. ZeroCard: Cardinality Estimation with Zero Dependence on Target Databases – No Data, No Query, No Retraining


143. Unveiling the Power of Multiple Gossip Steps: A Stability-Based Generalization Analysis in Decentralized Training



145. Active Confusion Expression in Large Language Models: Leveraging World Models toward Better Social Reasoning


146. LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?


147. A Systematic Evaluation of Self-Supervised Learning for Label-Efficient Sleep Staging with Wearable EEG


148. DISCO: Diversifying Sample Condensation for Efficient Model Evaluation


149. A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning


150. A Large-scale Dataset for Robust Complex Anime Scene Text Detection


151. TTOM: Test-Time Optimization and Memorization for Compositional Video Generation


152. STEPER: Step-wise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language Models


153. Towards Human-Like Grading: A Unified LLM-Enhanced Framework for Subjective Question Evaluation


154. MMM: Quantum-Chemical Molecular Representation Learning for Combinatorial Drug Recommendation


155. Contrastive Weak-to-strong Generalization


156. Team Xiaomi EV-AD VLA: Learning to Navigate Socially Through Proactive Risk Perception - Technical Report for IROS 2025 RoboSense Challenge Social Navigation Track


157. DM1: MeanFlow with Dispersive Regularization for 1-Step Robotic Manipulation


158. Self-Supervised Learning Strategies for a Platform to Test the Toxicity of New Chemicals and Materials


159. Meta-Learning Based Few-Shot Graph-Level Anomaly Detection


160. AdaSwitch: Adaptive Switching Generation for Knowledge Distillation


161. Self-Improving LLM Agents at Test-Time


162. MetaDefense: Defending Finetuning-based Jailbreak Attack Before and During Generation


163. The Rise of the Knowledge Sculptor: A New Archetype for Knowledge Work in the Age of Generative AI


164. SIMU: Selective Influence Machine Unlearning


165. Effective and Stealthy One-Shot Jailbreaks on Deployed Mobile Vision-Language Agents


166. Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models


167. HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation


168. LLM4Cell: A Survey of Large Language and Agentic Models for Single-Cell Biology


169. IntentionVLA: Generalizable and Efficient Embodied Intention Reasoning for Human-Robot Interaction


170. Drift No More? Context Equilibria in Multi-Turn LLM Interactions


171. Trajectory Conditioned Cross-embodiment Skill Transfer


172. ToolLibGen: Scalable Automatic Tool Creation and Aggregation for LLM Reasoning


173. A Unified Multi-Task Learning Framework for Generative Auto-Bidding with Validation-Aligned Optimization


174. Parallel Test-Time Scaling for Latent Reasoning Models


175. UltraLED: Learning to See Everything in Ultra-High Dynamic Range Scenes


176. AppForge: From Assistant to Independent Developer - Are GPTs Ready for Software Development?


177. MeSH: Memory-as-State-Highways for Recursive Transformers


178. DEAS: DEtached value learning with Action Sequence for Scalable Offline RL


179. Causality Guided Representation Learning for Cross-Style Hate Speech Detection


180. Rethinking Reasoning: A Survey on Reasoning-based Backdoors in LLMs


181. Stress-Testing Model Specs Reveals Character Differences among Language Models


182. Curriculum Learning with Synthetic Data for Enhanced Pulmonary Nodule Detection in Chest Radiographs


183. Controllable Video Synthesis via Variational Inference


184. TCIP: Threshold-Controlled Iterative Pyramid Network for Deformable Medical Image Registration


185. IKNet: Interpretable Stock Price Prediction via Keyword-Guided Integration of News and Technical Indicators


186. OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference


187. Value Flows


188. Banking Done Right: Redefining Retail Banking with Language-Centric AI


189. Retentive Relevance: Capturing Long-Term User Value in Recommendation Systems


190. DGTEN: A Robust Deep Gaussian based Graph Neural Network for Dynamic Trust Evaluation with Uncertainty-Quantification Support


191. Vocabulary embeddings organize linguistic structure early in language model training


192. TGM: a Modular and Efficient Library for Machine Learning on Temporal Graphs



194. Accuracy, Memory Efficiency and Generalization: A Comparative Study on Liquid Neural Networks and Recurrent Neural Networks


195. Multi-Task Pre-Finetuning of Lightweight Transformer Encoders for Text Classification and NER


196. Investigating Thematic Patterns and User Preferences in LLM Interactions using BERTopic


197. Label Semantics for Robust Hyperspectral Image Classification


198. TRAVL: A Recipe for Making Video-Language Models Better Judges of Physics Implausibility


199. OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs


200. EEG Sleep Stage Classification with Continuous Wavelet Transform and Deep Learning


201. MLLM4TS: Leveraging Vision and Multimodal Language Models for General Time-Series Analysis


202. When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs


203. Can Speech LLMs Think while Listening?


204. A Denoising Framework for Real-World Ultra-Low Dose Lung CT Images Based on an Image Purification Strategy


205. Can Lessons From Human Teams Be Applied to Multi-Agent Systems? The Role of Structure, Diversity, and Interaction Dynamics


206. HEMERA: A Human-Explainable Transformer Model for Estimating Lung Cancer Risk using GWAS Data


207. MoGU: Mixture-of-Gaussians with Uncertainty-based Gating for Time Series Forecasting


208. Minimizing the Value-at-Risk of Loan Portfolio via Deep Neural Networks


209. LASER: An LLM-based ASR Scoring and Evaluation Rubric


210. Haystack Engineering: Context Engineering for Heterogeneous and Agentic Long-Context Evaluation


211. Quantum Grid Path Planning Using Parallel QAOA Circuits Based on Minimum Energy Principle


212. Attention to Order: Transformers Discover Phase Transitions via Learnability


213. Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts


214. Mitigating Surgical Data Imbalance with Dual-Prediction Video Diffusion Model


215. Local MAP Sampling for Diffusion Models


216. MultiFair: Multimodal Balanced Fairness-Aware Medical Classification with Dual-Level Gradient Modulation


217. Deep Learning Based Approach to Enhanced Recognition of Emotions and Behavioral Patterns of Autistic Children


218. DUA-D2C: Dynamic Uncertainty Aware Method for Overfitting Remediation in Deep Learning