All AI Papers - 2025-10-31

1. LLMs Process Lists With General Filter Heads


2. The Oversight Game: Learning to Cooperatively Balance an AI Agent’s Safety and Autonomy


3. Cross-Platform Evaluation of Reasoning Capabilities in Foundation Models


4. Unveiling Intrinsic Text Bias in Multimodal Large Language Models through Attention Key-Space Analysis


5. Delegated Authorization for Agents Constrained to Semantic Task-to-Scope Matching


6. The Era of Agentic Organization: Learning to Organize with Language Models


7. Normative Reasoning in Large Language Models: A Comparative Benchmark from Logical and Modal Perspectives


8. Agentic AI Home Energy Management System: A Large Language Model Framework for Residential Load Scheduling


9. EdgeRunner 20B: Military Task Parity with GPT-5 while Running on the Edge


10. Human-AI Complementarity: A Goal for Amplified Oversight


11. Context Engineering 2.0: The Context of Context Engineering



13. Who Has The Final Say? Conformity Dynamics in ChatGPT’s Selections


14. Chain-of-Thought Hijacking


15. MedSAE: Dissecting MedCLIP Representations with Sparse Autoencoders


16. Autograder+: A Multi-Faceted AI Framework for Rich Pedagogical Feedback in Programming Education


17. A Pragmatic View of AI Personhood


18. Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings


19. AI Mathematician as a Partner in Advancing Mathematical Discovery - A Case Study in Homogenization Theory


20. BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning


21. Discovering State Equivalences in UCT Search Trees By Action Pruning


22. GraphCompliance: Aligning Policy and Context Graphs for LLM-Based Regulatory Compliance


23. Graph-Enhanced Policy Optimization in LLM Agent Training


24. Retrieval Augmented Generation-Enhanced Distributed LLM Agents for Generalizable Traffic Signal Control with Emergency Vehicles


25. Questionnaire meets LLM: A Benchmark and Empirical Study of Structural Skills for Understanding Questions and Responses


26. One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning


27. The FM Agent


28. Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math


29. Beyond Benchmarks: The Economics of AI Inference


30. GUI Knowledge Bench: Revealing the Knowledge Gap Behind VLM Failures in GUI Tasks


31. Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4


32. Can AI be Accountable?


33. Large Language Model-assisted Autonomous Vehicle Recovery from Immobilization


34. AutoSurvey2: Empowering Researchers with Next Level Automated Literature Surveys


35. From Queries to Insights: Agentic LLM Pipelines for Spatio-Temporal Text-to-SQL


36. Estimating cognitive biases with attention-aware inverse planning


37. Humains-Junior: A 3.8B Language Model Achieving GPT-4o-Level Factual Accuracy by Directed Exoskeleton Reasoning


38. FinOps Agent – A Use-Case for IT Infrastructure and Cost Optimization


39. SciTrust 2.0: A Comprehensive Framework for Evaluating Trustworthiness of Large Language Models in Scientific Applications


40. Approximating Human Preferences Using a Multi-Judge Learned System


41. The Information-Theoretic Imperative: Compression and the Epistemic Foundations of Intelligence


42. Through the Judge’s Eyes: Inferred Thinking Traces Improve Reliability of LLM Raters


43. Symbolically Scaffolded Play: Designing Role-Sensitive Prompts for Generative NPC Dialogue


44. An Agentic Framework for Rapid Deployment of Edge AI Solutions in Industry 5.0


45. Towards Piece-by-Piece Explanations for Chess Positions with SHAP


46. Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark


47. Gistify! Codebase-Level Understanding via Runtime Execution


48. Defeating the Training-Inference Mismatch via FP16


49. Remote Labor Index: Measuring AI Automation of Remote Work


50. Clone Deterministic 3D Worlds with Geometrically-Regularized World Models


51. Faithful and Fast Influence Function via Advanced Sampling


52. STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization


53. AMO-Bench: Large Language Models Still Struggle in High School Math Competitions


54. Deep sequence models tend to memorize geometrically; it is unclear why


55. A General Incentives-Based Framework for Fairness in Multi-agent Resource Allocation


56. ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference


57. Non-Convex Over-the-Air Heterogeneous Federated Learning: A Bias-Variance Trade-off


58. On the limitation of evaluating machine unlearning using only a single training seed


59. The End of Manual Decoding: Towards Truly End-to-End Language Models


60. Process Integrated Computer Vision for Real-Time Failure Prediction in Steel Rolling Mill


61. Evontree: Ontology Rule-Guided Self-Evolution of Large Language Models


62. Hybrid DQN-TD3 Reinforcement Learning for Autonomous Navigation in Dynamic Environments


63. Aeolus: A Multi-structural Flight Delay Dataset


64. ResMatching: Noise-Resilient Computational Super-Resolution via Guided Conditional Flow Matching


65. Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems


66. InfoFlow: Reinforcing Search Agent Via Reward Density Optimization


67. Multiclass Local Calibration With the Jensen-Shannon Distance


68. Adaptive Inverse Kinematics Framework for Learning Variable-Length Tool Manipulation in Robotics


69. The Structure of Relation Decoding Linear Operators in Large Language Models


70. Inside CORE-KG: Evaluating Structured Prompting and Coreference Resolution for Knowledge Graphs


71. Simulating and Experimenting with Social Media Mobilization Using LLM Agents


72. Bayesian Network Fusion of Large Language Models for Sentiment Analysis


73. Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing


74. SecureReviewer: Enhancing Large Language Models for Secure Code Review through Secure-aware Fine-tuning


75. Robust Graph Condensation via Classification Complexity Mitigation


76. Personalized Treatment Outcome Prediction from Scarce Data via Dual-Channel Knowledge Distillation and Adaptive Fusion


77. SSCL-BW: Sample-Specific Clean-Label Backdoor Watermarking for Dataset Ownership Verification


78. LoCoT2V-Bench: A Benchmark for Long-Form and Complex Text-to-Video Generation


79. Human-in-the-loop Online Rejection Sampling for Robotic Manipulation


80. SPG-CDENet: Spatial Prior-Guided Cross Dual Encoder Network for Multi-Organ Segmentation


81. The Geometry of Dialogue: Graphing Language Models to Reveal Synergistic Teams for Multi-Agent Collaboration


82. Reinforcement Learning for Pollution Detection in a Randomized, Sparse and Nonstationary Environment with an Autonomous Underwater Vehicle


83. MisSynth: Improving MISSCI Logical Fallacies Classification with Synthetic Data


84. Linear Causal Discovery with Interventional Constraints


85. GLYPH-SR: Can We Achieve Both High-Quality Image Super-Resolution and High-Fidelity Text Recovery via VLM-guided Latent Diffusion Model?


86. From Amateur to Master: Infusing Knowledge into LLMs via Automated Curriculum Learning


87. Posterior Sampling by Combining Diffusion Models with Annealed Langevin Dynamics


88. Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime


89. Understanding Hardness of Vision-Language Compositionality from A Token-level Causal Lens


90. Can Agent Conquer Web? Exploring the Frontiers of ChatGPT Atlas Agent in Web Games


91. Unravelling the Mechanisms of Manipulating Numbers in Language Models


92. Distributional Multi-objective Black-box Optimization for Diffusion-model Inference-time Multi-Target Generation


93. A Research Roadmap for Augmenting Software Engineering Processes and Software Products with Generative AI


94. Angular Steering: Behavior Control via Rotation in Activation Space


95. MPRU: Modular Projection-Redistribution Unlearning as Output Filter for Classification Pipelines


96. Test-Time Alignment of LLMs via Sampling-Based Optimal Control in pre-logit space


97. Hybrid LLM and Higher-Order Quantum Approximate Optimization for CSA Collateral Management


98. Towards Global Retrieval Augmented Generation: A Benchmark for Corpus-Level Reasoning


99. What’s In My Human Feedback? Learning Interpretable Descriptions of Preference Data


100. Don’t Let It Fade: Preserving Edits in Diffusion Language Models via Token Timestep Allocation


101. Predicting All-Cause Hospital Readmissions from Medical Claims Data of Hospitalised Patients


102. ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts


103. Accumulative SGD Influence Estimation for Data Attribution


104. Linking Heterogeneous Data with Coordinated Agent Flows for Social Media Analysis


105. Learning to Manage Investment Portfolios beyond Simple Utility Functions


106. Segmentation over Complexity: Evaluating Ensemble and Hybrid Approaches for Anomaly Detection in Industrial Time Series


107. Bridging the Gap Between Molecule and Textual Descriptions via Substructure-aware Alignment


108. MV-MLM: Bridging Multi-View Mammography and Language for Breast Cancer Diagnosis and Risk Prediction


109. Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation


110. WOD-E2E: Waymo Open Dataset for End-to-End Driving in Challenging Long-tail Scenarios


111. EgoExo-Con: Exploring View-Invariant Video Temporal Understanding


112. Security Risk of Misalignment between Text and Image in Multi-modal Model


113. SAFE: A Novel Approach to AI Weather Evaluation through Stratified Assessments of Forecasts over Earth


114. Network-Constrained Policy Optimization for Adaptive Multi-agent Vehicle Routing


115. Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism


116. Learning Geometry: A Framework for Building Adaptive Manifold Models through Metric Optimization


117. Data-driven Projection Generation for Efficiently Solving Heterogeneous Quadratic Programming Problems


118. Dynamic VLM-Guided Negative Prompting for Diffusion Models


119. Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods


120. SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning


121. Artificial Intelligence-Enabled Analysis of Radiology Reports: Epidemiology and Consequences of Incidental Thyroid Findings


122. Rethinking Cross-lingual Alignment: Balancing Transfer and Cultural Erasure in Multilingual LLMs


123. PORTool: Tool-Use LLM Training with Rewarded Tree


124. RADRON: Cooperative Localization of Ionizing Radiation Sources by MAVs with Compton Cameras


125. Climate Adaptation-Aware Flood Prediction for Coastal Cities Using Deep Learning


126. Dual Mixture-of-Experts Framework for Discrete-Time Survival Analysis


127. The Quest for Reliable Metrics of Responsible AI


128. DARTS: A Drone-Based AI-Powered Real-Time Traffic Incident Detection System


129. Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning


130. Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer


131. WaveVerif: Acoustic Side-Channel based Verification of Robotic Workflows


132. Application and Validation of Geospatial Foundation Model Data for the Prediction of Health Facility Programmatic Outputs – A Case Study in Malawi


133. Revisiting Multilingual Data Mixtures in Language Model Pretraining


134. A Process Mining-Based System For The Analysis and Prediction of Software Development Workflows


135. Multi-Agent Reinforcement Learning for Market Making: Competition without Collusion


136. Transferring Causal Effects using Proxies


137. Evaluating the Impact of LLM-Assisted Annotation in a Perspectivized Setting: the Case of FrameNet Annotation


138. PRISM: Proof-Carrying Artifact Generation through LLM x MDE Synergy and Stratified Constraints


139. AAGATE: A NIST AI RMF-Aligned Governance Platform for Agentic AI


140. Identity Management for Agentic AI: The new frontier of authorization, authentication, and security for an AI agent world


141. ScaleDiff: Higher-Resolution Image Synthesis via Efficient and Model-Agnostic Diffusion


142. Metis-SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start


143. MemEIC: A Step Toward Continual and Compositional Knowledge Editing


144. Non-myopic Matching and Rebalancing in Large-Scale On-Demand Ride-Pooling Systems Using Simulation-Informed Reinforcement Learning


145. The Kinetics of Reasoning: How Chain-of-Thought Shapes Learning in Transformers?


146. Unsupervised local learning based on voltage-dependent synaptic plasticity for resistive and ferroelectric synapses


147. BlackboxNLP-2025 MIB Shared Task: Improving Circuit Faithfulness via Better Edge Selection


148. HiMAE: Hierarchical Masked Autoencoders Discover Resolution-Specific Structure in Wearable Time Series


149. zFLoRA: Zero-Latency Fused Low-Rank Adapters


150. LASTIST: LArge-Scale Target-Independent STance dataset


151. A Practitioner’s Guide to Kolmogorov-Arnold Networks


152. Magentic Marketplace: An Open-Source Environment for Studying Agentic Markets


153. DINO-YOLO: Self-Supervised Pre-training for Data-Efficient Object Detection in Civil Engineering Applications