All AI Papers - 2025-10-23

1. Benchmarking World-Model Learning


2. Beyond Reactivity: Measuring Proactive Problem Solving in LLM Agents


3. Misalignment Bounty: Crowdsourcing AI Agent Misbehavior


4. Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning


5. RLIE: Rule Generation with Logistic Regression, Iterative Refinement, and Evaluation for Large Language Models


6. Explainable e-sports win prediction through Machine Learning classification in streaming


7. A Graph Engine for Guitar Chord-Tone Soloing Education


8. AgentSense: LLMs Empower Generalizable and Explainable Web-Based Participatory Urban Sensing


9. HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application


10. DAIL: Beyond Task Ambiguity for Language-Conditioned Reinforcement Learning


11. NeSyPr: Neurosymbolic Proceduralization For Efficient Embodied Reasoning


12. MSC-Bench: A Rigorous Benchmark for Multi-Server Tool Orchestration


13. Continual Knowledge Adaptation for Reinforcement Learning


14. Learning to Make Friends: Coaching LLM Agents toward Emergent Social Ties


15. An Argumentative Explanation Framework for Generalized Reason Model with Inconsistent Precedents


16. ChatGPT Unveils Its Limits: Principles of Law Deliver Checkmate


17. WebGraphEval: Multi-Turn Trajectory Evaluation for Web Agents using Graph Representation


18. The Zero-Step Thinking: An Empirical Study of Mode Selection as Harder Early Exit in Reasoning Models


19. A Multi-faceted Analysis of Cognitive Abilities: Evaluating Prompt Methods with Large Language Models on the CONSORT Checklist


20. The MUSE Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMS


21. Rectifying Shortcut Behaviors in Preference-based Reward Learning


22. Timely Clinical Diagnosis through Active Test Selection


23. Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality


24. Semantic World Models


25. Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning


26. Integrating Transparent Models, LLMs, and Practitioner-in-the-Loop: A Case of Nonprofit Program Evaluation


27. On Controlled Change: Generative AI’s Impact on Professional Authority in Journalism


28. AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders


29. SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration


30. A Survey on Cache Methods in Diffusion Models: Toward Efficient Multi-Modal Generation


31. Learning Affordances at Inference-Time for Vision-Language-Action Models


32. Enabling Granular Subgroup Level Model Evaluations by Generating Synthetic Medical Time Series


33. Do Prompts Reshape Representations? An Empirical Study of Prompting Effects on Embeddings


34. Toward Agentic Software Engineering Beyond Code: Framing Vision, Values, and Vocabulary


35. Serverless GPU Architecture for Enterprise HR Analytics: A Production-Scale BDaaS Implementation


36. Are Large Language Models Sensitive to the Motives Behind Communication?


37. Directive, Metacognitive or a Blend of Both? A Comparison of AI-Generated Feedback Types on Student Engagement, Confidence, and Outcomes


38. I Spy With My Model’s Eye: Visual Search as a Behavioural Test for MLLMs


39. Study of Training Dynamics for Memory-Constrained Fine-Tuning


40. Unraveling Emotions with Pre-Trained Models


41. From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction


42. Style Attack Disguise: When Fonts Become a Camouflage for Adversarial Intent


43. Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1


44. XBench: A Comprehensive Benchmark for Visual-Language Explanations in Chest Radiography


45. A Goal-Driven Survey on Root Cause Analysis


46. Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark


47. Multi-modal Co-learning for Earth Observation: Enhancing single-modality models via modality collaboration


48. A Matter of Time: Revealing the Structure of Time in Vision-Language Models


49. Demonstrating Real Advantage of Machine-Learning-Enhanced Monte Carlo for Combinatorial Optimization


50. Insights into the Unknown: Federated Data Diversity Analysis on Molecular Data


51. Optimizing the Unknown: Black Box Bayesian Optimization with Energy-Based Model and Reinforcement Learning


52. From Prototypes to Sparse ECG Explanations: SHAP-Driven Counterfactuals for Multivariate Time-Series Multi-class Classification


53. Modeling realistic human behavior using generative agents in a multimodal transport system: Software architecture and Application to Toulouse


54. CARES: Context-Aware Resolution Selector for VLMs


55. Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning


56. VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos


57. KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge


58. Graph Unlearning Meets Influence-aware Negative Preference Optimization


59. A Concrete Roadmap towards Safety Cases based on Chain-of-Thought Monitoring


60. HybridEP: Scaling Expert Parallelism to Cross-Datacenter Scenario via Hybrid Expert/Data Transmission


61. Universal Quantitative Abstraction: Categorical Duality and Logical Completeness for Probabilistic Systems


62. Neural Variational Dropout Processes


63. FairNet: Dynamic Fairness Correction without Performance Loss via Contrastive Conditional LoRA


64. Monitoring LLM-based Multi-Agent Systems Against Corruptions via Node Evaluation


65. EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection


66. ToMMeR – Efficient Entity Mention Detection from Large Language Models


67. ColorAgent: Building A Robust, Personalized, and Interactive OS Agent



69. AgenticMath: Enhancing LLM Reasoning via Agentic-based Math Data Generation


70. M3-SLU: Evaluating Speaker-Attributed Reasoning in Multimodal Large Language Models


71. Learning To Defer To A Population With Limited Demonstrations


72. A New Type of Adversarial Examples


73. Foundation Model Forecasts: Form and Function


74. To Use or to Refuse? Re-Centering Student Agency with Generative AI in Engineering Design Education


75. Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning


76. Metadata Extraction Leveraging Large Language Models


77. Seabed-Net: A multi-task network for joint bathymetry estimation and seabed classification from remote sensing imagery in shallow waters


78. SORA-ATMAS: Adaptive Trust Management and Multi-LLM Aligned Governance for Future Smart Cities


79. Balancing Rewards in Text Summarization: Multi-Objective Reinforcement Learning via HyperVolume Optimization


80. Enabling Reconfiguration-Communication Overlap for Collective Communication in Optical Networks


81. Online Handwritten Signature Verification Based on Temporal-Spatial Graph Attention Transformer


82. Collaborative penetration testing suite for emerging generative AI algorithms


83. Knowledge and Common Knowledge of Strategies


84. Enhancing Early Alzheimer Disease Detection through Big Data and Ensemble Few-Shot Learning


85. Social World Model-Augmented Mechanism Design Policy Learning


86. LAPRAD: LLM-Assisted PRotocol Attack Discovery


87. FnRGNN: Distribution-aware Fairness in Graph Neural Network


88. See, Think, Act: Online Shopper Behavior Simulation with VLM Agents


89. SPOT: Scalable Policy Optimization with Trees for Markov Decision Processes


90. No Intelligence Without Statistics: The Invisible Backbone of Artificial Intelligence


91. An Active Diffusion Neural Network for Graphs


92. Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks


93. PruneHal: Reducing Hallucinations in Multi-modal Large Language Models through Adaptive KV Cache Pruning


94. Interpretable Question Answering with Knowledge Graphs


95. Imbalanced Gradients in RL Post-Training of Multi-Task LLMs


96. News-Aware Direct Reinforcement Trading for Financial Markets


97. When Facts Change: Probing LLMs on Evolving Knowledge with evolveQA


98. X-Ego: Acquiring Team-Level Tactical Situational Awareness via Cross-Egocentric Contrastive Video Representation Learning


99. InvarGC: Invariant Granger Causality for Heterogeneous Interventional Time Series under Latent Confounding


100. A Cross-Environment and Cross-Embodiment Path Planning Framework via a Conditional Diffusion Model


101. Steering Autoregressive Music Generation with Recursive Feature Machines


102. A Novel Approach to Breast Cancer Segmentation using U-Net Model with Attention Mechanisms and FedProx


103. That’s Deprecated! Understanding, Detecting, and Steering Knowledge Conflicts in Language Models for Code Generation


104. What Makes a Good Curriculum? Disentangling the Effects of Data Ordering on LLM Mathematical Reasoning


105. Local Guidance for Configuration-Based Multi-Agent Pathfinding


106. PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions


107. REPAIR Approach for Social-based City Reconstruction Planning in case of natural disasters


108. “Over-the-Hood” AI Inclusivity Bugs and How 3 AI Product Teams Found and Fixed Them


109. CLiVR: Conversational Learning System in Virtual Reality with AI-Powered Patients


110. FlexiDataGen: An Adaptive LLM Framework for Dynamic Semantic Dataset Generation in Sensitive Domains


111. Prior-informed optimization of treatment recommendation via bandit algorithms trained on large language model-processed historical records


112. Plural Voices, Single Agent: Towards Inclusive AI in Multi-User Domestic Spaces


113. $Δ$t-Mamba3D: A Time-Aware Spatio-Temporal State-Space Model for Breast Cancer Risk Prediction


114. Robust Driving QA through Metadata-Grounded Context and Task-Specific Prompts


115. $\nabla$-SDF: Learning Euclidean Signed Distance Functions Online with Gradient-Augmented Octree Interpolation and Neural Residual


116. ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge


117. NeuroAda: Activating Each Neuron’s Potential for Parameter-Efficient Fine-Tuning


118. StutterZero and StutterFormer: End-to-End Speech Conversion for Stuttering Transcription and Correction


119. A Justice Lens on Fairness and Ethics Courses in Computing Education: LLM-Assisted Multi-Perspective and Thematic Evaluation


120. BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping


121. Application of Reduced-Order Models for Temporal Multiscale Representations in the Prediction of Dynamical Systems


122. Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients


123. Benchmarking On-Device Machine Learning on Apple Silicon with MLX


124. Misinformation Detection using Large Language Models with Explainability


125. MMAO-Bench: MultiModal All in One Benchmark Reveals Compositional Law between Uni-modal and Omni-modal in OmniModels


126. Context-aware Fairness Evaluation and Mitigation in LLMs


127. ADPO: Anchored Direct Preference Optimization


128. Prospects for Using Artificial Intelligence to Understand Intrinsic Kinetics of Heterogeneous Catalytic Reactions


129. Large Connectome Model: An fMRI Foundation Model of Brain Connectomes Empowered by Brain-Environment Interaction in Multitask Learning Landscape


130. Learning from the Best, Differently: A Diversity-Driven Rethinking on Data Selection



132. 3D Optimization for AI Inference Scaling: Balancing Accuracy, Cost, and Latency


133. DuoLens: A Framework for Robust Detection of Machine-Generated Multilingual Text and Code


134. Evaluating LLMs for Career Guidance: Comparative Analysis of Computing Competency Recommendations Across Ten African Countries


135. AI for Distributed Systems Design: Scalable Cloud Optimization Through Repeated LLMs Sampling And Simulators


136. CosmoCore Affective Dream-Replay Reinforcement Learning for Code Generation


137. CodeCRDT: Observation-Driven Coordination for Multi-Agent LLM Code Generation


138. Small Language Models Offer Significant Potential for Science Community


139. Contextual Augmentation for Entity Linking using Large Language Models


140. LLM Bazaar: A Service Design for Supporting Collaborative Learning with an LLM-Powered Multi-Party Collaboration Infrastructure


141. What is Implementation Science; and Why It Matters for Bridging the Artificial Intelligence Innovation-to-Application Gap in Medical Imaging


142. A Unified Formal Theory on the Logical Limits of Symbol Grounding