All AI Papers - 2025-10-17

1. Agentic Design of Compositional Machines


2. GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning


3. Stable but Miscalibrated: A Kantian View on Overconfidence from Filters to Large Language Models


4. TRI-DEP: A Trimodal Comparative Study for Depression Detection Using Speech, Text, and EEG


5. Budget-aware Test-time Scaling via Discriminative Verification


6. Mapping Smarter, Not Harder: A Test-Time Reinforcement Learning Agent That Improves Without Labels or Model Updates


7. The Gatekeeper Knows Enough


8. LabOS: The AI-XR Co-Scientist That Sees and Works With Humans


9. Where to Search: Measure the Prior-Structured Search Space of LLM Agents


10. Boosting Instruction Following at Scale


11. RoboGPT-R1: Enhancing Robot Planning with Reinforcement Learning


12. Agentic NL2SQL to Reduce Computational Costs


13. SimKO: Simple Pass@K Policy Optimization


14. ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling


15. Cognitive-Aligned Spatio-Temporal Large Language Models For Next Point-of-Interest Prediction


16. Purifying Task Vectors in Knowledge-Aware Subspace for Model Merging


17. Practical, Utilitarian Algorithm Configuration


18. NAEL: Non-Anthropocentric Ethical Logic


19. TITAN: Graph-Executable Reasoning for Cyber Threat Intelligence


20. Machine Learning and Public Health: Identifying and Mitigating Algorithmic Bias through a Systematic Review


21. Beyond Hallucinations: The Illusion of Understanding in Large Language Models


22. ColorBench: Benchmarking Mobile Agents with Graph-Structured Framework for Complex Long-Horizon Tasks


23. LLM Agents Beyond Utility: An Open-Ended Perspective


24. Symbol Grounding in Neuro-Symbolic AI: A Gentle Introduction to Reasoning Shortcuts


25. JSPLIT: A Taxonomy-based Solution for Prompt Bloating in Model Context Protocol


26. Helmsman: Autonomous Synthesis of Federated Learning Systems via Multi-Agent Collaboration


27. Eliminating Negative Occurrences of Derived Predicates from PDDL Axioms


28. IMAGINE: Integrating Multi-Agent System into One Model for Complex Reasoning and Planning


29. Hi-Agent: Hierarchical Vision-Language Agents for Mobile Device Control


30. Can MLLMs Absorb Math Reasoning Abilities from LLMs as Free Lunch?


31. AI for Service: Proactive Assistance with AI Glasses


32. Metacognitive Self-Correction for Multi-Agent System via Prototype-Guided Next-Execution Reconstruction


33. Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies


34. A Guardrail for Safety Preservation: When Safety-Sensitive Subspace Meets Harmful-Resistant Null-Space


35. MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning


36. Towards Agentic Self-Learning LLMs in Search Environment


37. LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild


38. Echoes of Human Malice in Agents: Benchmarking LLMs for Multi-Turn Online Harassment Attacks


39. Implementation of AI in Precision Medicine


40. ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning


41. JEDA: Query-Free Clinical Order Search from Ambient Dialogues


42. Combining Reinforcement Learning and Behavior Trees for NPCs in Video Games with AMD Schola


43. CodeEvolve: An open source evolutionary coding agent for algorithm discovery and optimization


44. A Multimodal Approach to Heritage Preservation in the Context of Climate Change


45. Formalizing the Safety, Security, and Functional Properties of Agentic AI Systems


46. STEMS: Spatial-Temporal Enhanced Safe Multi-Agent Coordination for Building Energy Management


47. Generating Fair Consensus Statements with Social Choice on Token-Level MDPs


48. Position: Require Frontier AI Labs To Release Small “Analog” Models


49. GammaZero: Learning To Guide POMDP Belief Space Search With Graph Representations


50. Do Large Language Models Show Biases in Causal Learning? Insights from Contingency Judgment


51. Do Slides Help? Multi-modal Context for Automatic Transcription of Conference Talks


52. Decision Oriented Technique (DOTechnique): Finding Model Validity Through Decision-Maker Context


53. Coupled Diffusion Sampling for Training-Free Multi-View Image Editing


54. From Pixels to Words – Towards Native Vision-Language Primitives at Scale


55. Terra: Explorable Native 3D World Model with Point Latents


56. WithAnyone: Towards Controllable and ID Consistent Image Generation


57. pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation


58. Attention Is All You Need for KV Cache in Diffusion LLMs


59. TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar


60. LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training


61. RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks


62. Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents


63. C4D: 4D Made from 3D through Dual Correspondences


64. CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions


65. RealDPO: Real or Not Real, that is the Preference


66. Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion


67. MetaBench: A Multi-task Benchmark for Assessing LLMs in Metabolomics


68. LaSeR: Reinforcement Learning with Last-Token Self-Rewarding


69. Circuit Insights: Towards Interpretability Beyond Activations


70. Predicting Task Performance with Context-aware Scaling Laws


71. MaskCaptioner: Learning to Jointly Segment and Caption Object Trajectories in Videos


72. Reasoning with Sampling: Your Base Model is Smarter Than You Think


73. Detecting Early and Implicit Suicidal Ideation via Longitudinal and Information Environment Signals on Social Media


74. Learning When Not to Learn: Risk-Sensitive Abstention in Bandits with Unbounded Rewards


75. Predicting kernel regression learning curves from only raw data statistics


76. Benchmarking Multimodal Large Language Models for Face Recognition


77. RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning


78. Scaling Artificial Intelligence for Multi-Tumor Early Detection with More Reports, Fewer Masks


79. Morphology-Aware Prognostic model for Five-Year Survival Prediction in Colorectal Cancer from H&E Whole Slide Images


80. Cross-Scenario Unified Modeling of User Interests at Billion Scale


81. Finding Answers in Thought Matters: Revisiting Evaluation on Large Language Models with Reasoning


82. Inpainting the Red Planet: Diffusion Models for the Reconstruction of Martian Environments in Virtual Reality


83. COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes


84. Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries


85. DEXTER: Diffusion-Guided EXplanations with TExtual Reasoning for Vision Models


86. Seesaw: Accelerating Training by Balancing Learning Rate and Batch Size Scheduling


87. Camera Movement Classification in Historical Footage: A Comparative Study of Deep Video Models


88. Where are the Whales: A Human-in-the-loop Detection Method for Identifying Whales in High-resolution Satellite Imagery


89. FedPPA: Progressive Parameter Alignment for Personalized Federated Learning


90. xLLM Technical Report


91. When Planners Meet Reality: How Learned, Reactive Traffic Agents Shift nuPlan Benchmarks


92. An Efficient Rubric-based Generative Verifier for Search-Augmented LLMs


93. Galaxy Morphology Classification with Counterfactual Explanation


94. In-Context Learning with Unpaired Clips for Instruction-based Video Editing


95. The Bidding Games: Reinforcement Learning for MEV Extraction on Polygon Blockchain


96. Causality Enhancement for Cross-Domain Recommendation


97. RLAIF-SPA: Optimizing LLM-based Emotional Speech Synthesis via RLAIF


98. GemiRec: Interest Quantization and Generation for Multi-Interest Recommendation


99. LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching


100. Code-driven Number Sequence Calculation: Enhancing the inductive Reasoning Abilities of Large Language Models


101. Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures


102. An Active Inference Model of Mouse Point-and-Click Behaviour


103. Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering


104. Just-In-Time Objectives: A General Approach for Specialized AI Interactions


105. STANCE: Motion Coherent Video Generation Via Sparse-to-Dense Anchored Encoding


106. Local Causal Discovery for Statistically Efficient Causal Inference


107. Selective Labeling with False Discovery Rate Control


108. Agentic Entropy-Balanced Policy Optimization


109. Real-Time Surgical Instrument Defect Detection via Non-Destructive Testing


110. State Your Intention to Steer Your Attention: An AI Assistant for Intentional Digital Living


111. E2Edev: Benchmarking Large Language Models in End-to-End Software Development Task


112. From Guess2Graph: When and How Can Unreliable Experts Safely Boost Causal Discovery in Finite Samples?


113. Semantic representations emerge in biologically inspired ensembles of cross-supervising neural networks


114. Stealthy Dual-Trigger Backdoors: Attacking Prompt Tuning in LM-Empowered Graph Foundation Models


115. LiRA: Linguistic Robust Anchoring for Cross-lingual Large Language Models


116. Holdout-Loss-Based Data Selection for LLM Finetuning via In-Context Learning


117. Towards Adaptable Humanoid Control via Adaptive Motion Tracking


118. Feature Selection and Regularization in Multi-Class Classification: An Empirical Study of One-vs-Rest Logistic Regression with Gradient Descent Optimization and L1 Sparsity Constraints


119. A Free Lunch in LLM Compression: Revisiting Retraining after Pruning


120. Big Data Approaches to Bovine Bioacoustics: A FAIR-Compliant Dataset and Scalable ML Framework for Precision Livestock Welfare


121. Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following


122. The Role of Social Learning and Collective Norm Formation in Fostering Cooperation in LLM Multi-Agent Systems


123. MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering


124. FairBatching: Fairness-Aware Batch Formation for LLM Inference


125. Beat Detection as Object Detection


126. Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers


127. From Binary to Bilingual: How the National Weather Service is Using Artificial Intelligence to Develop a Comprehensive Translation Program


128. SUM-AgriVLN: Spatial Understanding Memory for Agricultural Vision-and-Language Navigation


129. CURE: Confidence-driven Unified Reasoning Ensemble Framework for Medical Question Answering


130. Beyond One World: Benchmarking Super Heros in Role-Playing Across Multiversal Contexts


131. BinCtx: Multi-Modal Representation Learning for Robust Android App Behavior Detection


132. A Density-Informed Multimodal Artificial Intelligence Framework for Improving Breast Cancer Detection Across All Breast Densities


133. Stop-RAG: Value-Based Retrieval Control for Iterative RAG


134. A Robust Classification Method using Hybrid Word Embedding for Early Diagnosis of Alzheimer’s Disease


135. Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL


136. Column Generation Using Domain-Independent Dynamic Programming


137. MERLIN: A Testbed for Multilingual Multimodal Entity Recognition and Linking


138. Watermarking for Factuality: Guiding Vision-Language Models Toward Truth via Tri-layer Contrastive Decoding


139. Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning


140. TED++: Submanifold-Aware Backdoor Detection via Layerwise Tubular-Neighbourhood Screening


141. Learning Human-Humanoid Coordination for Collaborative Object Carrying


142. Beyond a Single Perspective: Towards a Realistic Evaluation of Website Fingerprinting Attacks


143. PRISM: Agentic Retrieval with LLMs for Multi-Hop Question Answering


144. Less is More: Denoising Knowledge Graphs For Retrieval Augmented Generation


145. CAST: Compositional Analysis via Spectral Tracking for Understanding Transformer Layer Functions


146. Do Joint Language-Audio Embeddings Encode Perceptual Timbre Semantics?


147. Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation


148. Reinforcement Learning for Unsupervised Domain Adaptation in Spatio-Temporal Echocardiography Segmentation


149. Spatial Computing Communications for Multi-User Virtual Reality in Distributed Mobile Edge Computing Network


150. Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models


151. Large Scale Retrieval for the LinkedIn Feed using Causal Language Models


152. LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning


153. DPRF: A Generalizable Dynamic Persona Refinement Framework for Optimizing Behavior Alignment Between Personalized LLM Role-Playing Agents and Humans


154. MAFA: A Multi-Agent Framework for Enterprise-Scale Annotation with Configurable Task Adaptation


155. Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures


156. Towards Reversible Model Merging For Low-rank Weights


157. FinAI Data Assistant: LLM-based Financial Database Query Processing with the OpenAI Function Calling API


158. Inferred global dense residue transition graphs from primary structure sequences enable protein interaction prediction via directed graph convolutional neural networks


159. Toward Cybersecurity-Expert Small Language Models


160. Extracting latent representations from X-ray spectra. Classification, regression, and accretion signatures of Chandra sources


161. Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning


162. Every Language Model Has a Forgery-Resistant Signature


163. DiffOPF: Diffusion Solver for Optimal Power Flow


164. Exploratory Causal Inference in SAEnce


165. On the expressivity of sparse maxout networks


166. Optical Computation-in-Communication enables low-latency, high-fidelity perception in telesurgery


167. Cyber-Resilient System Identification for Power Grid through Bayesian Integration


168. One Bug, Hundreds Behind: LLMs for Large-Scale Bug Discovery


169. Think Globally, Group Locally: Evaluating LLMs Using Multi-Lingual Word Grouping Games


170. Context-Selective State Space Models: Feedback is All You Need


171. Conditional Clifford-Steerable CNNs with Complete Kernel Basis for PDE Modeling


172. REAP the Experts: Why Pruning Prevails for One-Shot MoE compression


173. Finding Holes: Pathologist Level Performance Using AI for Cribriform Morphology Detection in Prostate Cancer


174. Efficient Few-Shot Learning in Remote Sensing: Fusing Vision and Vision-Language Models


175. Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulations


176. Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention


177. Big Reasoning with Small Models: Instruction Retrieval at Inference Time


178. LLMs Can Get “Brain Rot”!


179. Readability $\ne$ Learnability: Rethinking the Role of Simplicity in Training Small Language Models


180. Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms


181. AI Debaters are More Persuasive when Arguing in Alignment with Their Own Beliefs


182. Knowledge Reasoning Language Model: Unifying Knowledge and Language for Inductive Knowledge Graph Reasoning


183. Schema for In-Context Learning


184. Benefits and Limitations of Communication in Multi-Agent Reasoning


185. Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences


186. Dual-attention ResNet outperforms transformers in HER2 prediction on DCE-MRI


187. GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents


188. Bayes or Heisenberg: Who(se) Rules?


189. Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection


190. K-frames: Scene-Driven Any-k Keyframe Selection for long video understanding


191. A Survey on Collaborating Small and Large Language Models for Performance, Cost-effectiveness, Cloud-edge Privacy, and Trustworthiness


192. Reliable Fine-Grained Evaluation of Natural Language Math Proofs


193. Incomplete Multi-view Clustering via Hierarchical Semantic Alignment and Cooperative Completion


194. Physics-Informed autoencoder for DSC-MRI Perfusion post-processing: application to glioma grading


195. Order from Chaos: Comparative Study of Ten Leading LLMs on Unstructured Data Categorization


196. PAGE: Prompt Augmentation for text Generation Enhancement


197. Catch Your Breath: Adaptive Computation for Self-Paced Sequence Production


198. What Layers When: Learning to Skip Compute in LLMs with Residual Gates


199. FRACCO: A gold-standard annotated corpus of oncological entities with ICD-O-3.1 normalisation


200. Joint Discriminative-Generative Modeling via Dual Adversarial Training


201. Unlocking the Potential of Diffusion Language Models through Template Infilling


202. CoLoR-GAN: Continual Few-Shot Learning with Low-Rank Adaptation in Generative Adversarial Networks


203. FFT-Accelerated Auxiliary Variable MCMC for Fermionic Lattice Models: A Determinant-Free Approach with $O(N\log N)$ Complexity


204. Deep Edge Filter: Return of the Human-Crafted Layer in Deep Learning


205. Self-Training with Dynamic Weighting for Robust Gradual Domain Adaptation


206. Ensembling Large Language Models to Characterize Affective Dynamics in Student-AI Tutor Dialogues


207. ShishuLM: Lightweight Language Model with Hybrid Decoder-MLP Architecture and Paired Weight Sharing


208. Benchmarking Correctness and Security in Multi-Turn Code Generation


209. From Craft to Constitution: A Governance-First Paradigm for Principled Agent Engineering


210. Multimodal Retrieval-Augmented Generation with Large Language Models for Medical VQA


211. Harnessing Consistency for Robust Test-Time LLM Ensemble


212. BenchPress: A Human-in-the-Loop Annotation System for Rapid Text-to-SQL Benchmark Curation


213. ConsistencyAI: A Benchmark to Assess LLMs’ Factual Consistency When Responding to Different Demographic Groups


214. Revisiting the UID Hypothesis in LLM Reasoning Traces


215. On-device System of Compositional Multi-tasking in Large Language Models


216. DynaSpec: Context-aware Dynamic Speculative Sampling for Large-Vocabulary Language Models


217. Information flow in multilayer perceptrons: an in-depth analysis


218. Serialized EHR make for good text representations


219. ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking


220. Meronymic Ontology Extraction via Large Language Models


221. Seeing Hate Differently: Hate Subspace Modeling for Culture-Aware Hate Speech Detection


222. SIMBA UQ: Similarity-Based Aggregation for Uncertainty Quantification in Large Language Models


223. ConDABench: Interactive Evaluation of Language Models for Data Analysis


224. Entropy Meets Importance: A Unified Head Importance-Entropy Score for Stable and Efficient Transformer Pruning


225. Informed Routing in LLMs: Smarter Token-Level Computation for Faster Inference


226. Users as Annotators: LLM Preference Learning from Comparison Mode


227. A Linguistics-Aware LLM Watermarking via Syntactic Predictability


228. Bridging the Semantic Gap: Contrastive Rewards for Multilingual Text-to-SQL


229. Towards Neurocognitive-Inspired Intelligence: From AI’s Structural Mimicry to Human-Like Functional Cognition


230. A2AS: Agentic AI Runtime Security and Self-Defense


231. Leveraging Wireless Sensor Networks for Real-Time Monitoring and Control of Industrial Environments


232. GQVis: A Dataset of Genomics Data Questions and Visualizations for Generative AI


233. Reversing the Lens: Using Explainable AI to Understand Human Expertise


234. Generative AI in Heritage Practice: Improving the Accessibility of Heritage Guidance