All AI Papers - 2025-10-16

1. Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math


2. From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails


3. Training LLM Agents to Empower Humans


4. A Modal Logic for Temporal and Jurisdictional Classifier Models


5. Tandem Training for Language Models


6. A Methodology for Assessing the Risk of Metric Failure in LLMs Within the Financial Domain


7. Confidence as a Reward: Transforming LLMs into Reward Models


8. Mobile Coverage Analysis using Crowdsourced Data


9. Assessing LLM Reasoning Through Implicit Causal Chain Discovery in Climate Discourse


10. Learnable Game-theoretic Policy Optimization for Data-centric Self-explanation Rationalization


11. SAJA: A State-Action Joint Attack Framework on Multi-Agent Deep Reinforcement Learning


12. An Analytical Framework to Enhance Autonomous Vehicle Perception for Smart Cities


13. EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems


14. Personalized Learning Path Planning with Goal-Driven Learner State Modeling


15. Adaptive Reasoning Executor: A Collaborative Agent System for Efficient Reasoning


16. Emotional Cognitive Modeling Framework with Desire-Driven Objective Optimization for LLM-empowered Agent in Social Simulation


17. Repairing Reward Functions with Human Feedback to Mitigate Reward Hacking


18. Toward Reasoning-Centric Time-Series Analysis


19. From Narratives to Probabilistic Reasoning: Predicting and Interpreting Drivers’ Hazardous Actions in Crashes Using Large Language Model


20. SENTINEL: A Multi-Level Formal Framework for Safety Evaluation of LLM-based Embodied Agents


21. DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping


22. From Literal to Liberal: A Meta-Prompting Framework for Eliciting Human-Aligned Exception Handling in Large Language Models


23. Generative Universal Verifier as Multimodal Meta-Reasoner


24. Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs


25. Provably Invincible Adversarial Attacks on Reinforcement Learning Systems: A Rate-Distortion Information-Theoretic Approach


26. The Art of Scaling Reinforcement Learning Compute for LLMs


27. InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy


28. Scaling Vision Transformers for Functional MRI with Flat Maps


29. RECODE: Reasoning Through Code Generation for Visual Question Answering


30. Multi-Scale High-Resolution Logarithmic Grapher Module for Efficient Vision GNNs


31. FIRST: Federated Inference Resource Scheduling Toolkit for Scientific AI Model Access


32. NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching


33. Dedelayed: Deleting remote inference delay via on-device correction


34. Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents


35. MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion


36. CanvasMAR: Improving Masked Autoregressive Video Generation With Canvas


37. Axial Neural Networks for Dimension-Free Foundation Models


38. Time Series Foundation Models: Benchmarking Challenges and Requirements


39. Closing the Gap Between Text and Speech Understanding in LLMs


40. Unlocking Public Catalogues: Instruction-Tuning LLMs for ICD Coding of German Tumor Diagnoses


41. The Role of Computing Resources in Publishing Foundation Model Research


42. Message Passing on the Edge: Towards Scalable and Expressive GNNs


43. NOSA: Native and Offloadable Sparse Attention


44. Subject Roles in the EU AI Act: Mapping and Regulatory Implications


45. Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs


46. OpenDerisk: An Industrial Framework for AI-Driven SRE, with Design, Implementation, and Case Studies


47. Modeling Cultural Bias in Facial Expression Recognition with Adaptive Agents


48. In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers


49. K-Merge: Online Continual Merging of Adapters for On-device Large Language Models


50. Narrow Operator Models of Stellarator Equilibria in Fourier Zernike Basis


51. UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning


52. Offline and Online KL-Regularized RLHF under Differential Privacy


53. MedREK: Retrieval-Based Editing for Medical LLMs with Key-Aware Prompts


54. ConsintBench: Evaluating Language Models on Real-World Consumer Intent Understanding


55. DistilCLIP-EEG: Enhancing Epileptic Seizure Detection Through Multi-modal Learning and Knowledge Distillation


56. LiteraryQA: Towards Effective Evaluation of Long-document Narrative QA


57. Neural Sum-of-Squares: Certifying the Nonnegativity of Polynomials with Transformers


58. Rectify and Align GPS Points to Parking Spots via Rank-1 Constraint


59. Semantic Communication Enabled Holographic Video Processing and Transmission


60. From Minimal Existence to Human Definition: The CES-IMU-HSG Theoretical Framework


61. MADREC: A Multi-Aspect Driven LLM Agent for Explainable and Adaptive Recommendation


62. A New Perspective on Transformers in Online Reinforcement Learning for Continuous Control


63. Document Intelligence in the Era of Large Language Models: A Survey


64. Language as a Label: Zero-Shot Multimodal Classification of Everyday Postures under Data Scarcity


65. Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training


66. Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control


67. Personal Attribute Leakage in Federated Speech Models


68. Protect: Towards Robust Guardrailing Stack for Trustworthy Enterprise LLM Systems


69. AOAD-MAT: Transformer-based multi-agent deep reinforcement learning model considering agents’ order of action decisions


70. Thompson Sampling via Fine-Tuning of LLMs


71. Injection, Attack and Erasure: Revocable Backdoor Attacks via Machine Unlearning


72. Self-Augmented Visual Contrastive Decoding


73. LLM one-shot style transfer for Authorship Attribution and Verification


74. Higher Satisfaction, Lower Cost: A Technical Report on How LLMs Revolutionize Meituan’s Intelligent Interaction Systems


75. To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models


76. A Ratio-Based Shapley Value for Collaborative Machine Learning - Extended Version


77. Real-Time Crowd Counting for Embedded Systems with Lightweight Architecture


78. MotionBeat: Motion-Aligned Music Representation via Embodied Contrastive Learning and Bar-Equivariant Contact-Aware Encoding


79. What “Not” to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging


80. MimicParts: Part-aware Style Injection for Speech-Driven 3D Motion Generation


81. CleverCatch: A Knowledge-Guided Weak Supervision Model for Fraud Detection


82. LLM-Guided Synthetic Augmentation (LGSA) for Mitigating Bias in AI Systems


83. Paper Copilot: Tracking the Evolution of Peer Review in AI Conferences


84. StressTransfer: Stress-Aware Speech-to-Speech Translation with Emphasis Preservation


85. Behavioral Embeddings of Programs: A Quasi-Dynamic Approach for Optimization Prediction


86. Program of Thoughts for Financial Reasoning: Leveraging Dynamic In-Context Examples and Generative Retrieval


87. Stable LLM Ensemble: Interaction between Example Representativeness and Diversity


88. On the Reasoning Abilities of Masked Diffusion Language Models


89. Multi-Label Clinical Text Eligibility Classification and Summarization System


90. DriveCritic: Towards Context-Aware, Human-Aligned Evaluation for Autonomous Driving with Vision-Language Models


91. TRUSTVIS: A Multi-Dimensional Trustworthiness Evaluation Framework for Large Language Models


92. ESI: Epistemic Uncertainty Quantification via Semantic-preserving Intervention for Large Language Models


93. A Multi-dimensional Semantic Surprise Framework Based on Low-Entropy Semantic Manifolds for Fine-Grained Out-of-Distribution Detection


94. Agentic Discovery: Closing the Loop with Cooperative Agents


95. Transformer-based Scalable Beamforming Optimization via Deep Residual Learning


96. NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models


97. True Self-Supervised Novel View Synthesis is Transferable


98. Towards Human-Centric Intelligent Treatment Planning for Radiation Therapy


99. VLA-0: Building State-of-the-Art VLAs with Zero Modification


100. Time-Varying Optimization for Streaming Data Via Temporal Weighting


101. SceneAdapt: Scene-aware Adaptation of Human Motion Diffusion


102. SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models


103. Randomness and Interpolation Improve Gradient Descent


104. Deliberate Lab: A Platform for Real-Time Human-AI Social Experiments


105. Developing and Validating the Arabic Version of the Attitudes Toward Large Language Models Scale


106. CurLL: A Developmental Framework to Evaluate Continual Learning in Language Models


107. Max It or Miss It: Benchmarking LLM On Solving Extremal Problems


108. A Multimodal XAI Framework for Trustworthy CNNs and Bias Detection in Deep Representation Learning


109. Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation


110. SpareCodeSearch: Searching for Code Context When You Have No Spare GPU


111. HyWA: Hypernetwork Weight Adapting Personalized Voice Activity Detection


112. InferA: A Smart Assistant for Cosmological Ensemble Data


113. KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems


114. Three Lenses on the AI Revolution: Risk, Transformation, Continuity


115. A Critical Review of the Need for Knowledge-Centric Evaluation of Quranic Recitation


116. Adaptive Generation of Bias-Eliciting Questions for LLMs


117. Efficient Adaptive Transformer: An Empirical Study and Reproducible Framework


118. Ethic-BERT: An Enhanced Deep Learning Model for Ethical and Non-Ethical Content Classification


119. VLURes: Benchmarking VLM Visual and Linguistic Understanding in Low-Resource Languages


120. FaStFACT: Faster, Stronger Long-Form Factuality Evaluations in LLMs


121. A²FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning


122. Semantic knowledge guides innovation and drives cultural evolution


123. Repurposing Annotation Guidelines to Instruct LLM Annotators: A Case Study


124. Gelina: Unified Speech and Gesture Synthesis via Interleaved Token Prediction


125. Coherent Load Profile Synthesis with Conditional Diffusion for LV Distribution Network Scenario Generation


126. MTSQL-R1: Towards Long-Horizon Multi-Turn Text-to-SQL via Agentic Training


127. Gobernanza y trazabilidad “a prueba de AI Act” para casos de uso legales: un marco técnico-jurídico, métricas forenses y evidencias auditables


128. Mathematics with large language models as provers and verifiers


129. Automatic Speech Recognition in the Modern Era: Architectures, Training, and Evaluation


130. Scheming Ability in LLM-to-LLM Strategic Interactions


131. Classifier-Augmented Generation for Structured Workflow Prediction


132. Evidence Without Injustice: A New Counterfactual Test for Fair Algorithms


133. Beyond Discrete Categories: Multi-Task Valence-Arousal Modeling for Pet Vocalization Analysis


134. MEDEQUALQA: Evaluating Biases in LLMs with Counterfactual Reasoning


135. From Noise to Signal to Selbstzweck: Reframing Human Label Variation in the Era of Post-training in NLP


136. Cancer Diagnosis Categorization in Electronic Health Records Using Large Language Models and BioBERT: Model Performance Evaluation Study


137. Benchmarking Open-Source Large Language Models for Persian in Zero-Shot and Few-Shot Learning


138. AutoCode: LLMs as Problem Setters for Competitive Programming