All AI Papers - 2025-10-03

1. BioX-Bridge: Model Bridging for Unsupervised Cross-Modal Knowledge Transfer across Biosignals


2. RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems


3. The Unreasonable Effectiveness of Scaling Agents for Computer Use


4. The Reasoning Boundary Paradox: How Reinforcement Learning Constrains Language Models


5. UpSafe$^\circ$C: Upcycling for Controllable Safety in Large Language Models


6. A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports


7. FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models


8. Do AI Models Perform Human-like Abstract Reasoning Across Modalities?


9. Demystifying the Roles of LLM Layers in Retrieval, Knowledge, and Reasoning


10. ReTabAD: A Benchmark for Restoring Semantic Context in Tabular Anomaly Detection


11. Zero-shot reasoning for simulating scholarly peer-review


12. To Mask or to Mirror: Human-AI Alignment in Collective Reasoning


13. Constrained Adaptive Rejection Sampling


14. Learning a Dense Reasoning Reward Model from Expert Demonstration via Inverse Reinforcement Learning


15. Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning


16. Human-AI Teaming Co-Learning in Military Operations


17. REBot: From RAG to CatRAG with Semantic Enrichment and Graph Routing


18. A cybersecurity AI agent selection and decision support framework


19. MetaboT: AI-based agent for natural language-based interaction with metabolomics knowledge graphs


20. VaPR – Vision-language Preference alignment for Reasoning


21. Improving AGI Evaluation: A Data Science Perspective


22. A Locally Executable AI System for Improving Preoperative Patient Communication: A Multi-Domain Clinical Evaluation


23. Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness


24. GuruAgents: Emulating Wise Investors with Prompt-Guided LLM Agents


25. Understanding the Geospatial Reasoning Capabilities of LLMs: A Trajectory Recovery Perspective


26. Learning to Decide with Just Enough: Information-Theoretic Context Summarization for CDMPs


27. PychoBench: Evaluating the Psychology Intelligence of Large Language Models


28. AgentRec: Next-Generation LLM-Powered Multi-Agent Collaborative Recommendation with Adaptive Intelligence


29. AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning


30. InvThink: Towards AI Safety via Inverse Reasoning


31. Step-Aware Policy Optimization for Reasoning in Diffusion Large Language Models


32. Information Seeking for Robust Decision Making under Partial Observability


33. LOGicalThought: Logic-Based Ontological Grounding of LLMs for High-Assurance Reasoning


34. Towards Interpretable and Inference-Optimal COT Reasoning with Sparse Autoencoder-Guided Generation


35. Lateral Tree-of-Thoughts Surpasses ToT by Incorporating Logically-Consistent, Low-Utility Candidates


36. AIReg-Bench: Benchmarking Language Models That Assess AI Regulation Compliance


37. VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning


38. On the Role of Domain Experts in Creating Effective Tutoring Systems


39. A Tale of LLMs and Induced Small Proxies: Scalable Agents for Knowledge Mining


40. OntoLogX: Ontology-Guided Knowledge Graph Extraction from Cybersecurity Logs with Large Language Models


41. Automating Data-Driven Modeling and Analysis for Engineering Applications using Large Language Model Agents


42. Fine-tuning with RAG for Improving LLM Learning of New Skills


43. Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort


44. Retrieval-Augmented Framework for LLM-Based Clinical Decision Support


45. MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments


46. Aristotle: IMO-level Automated Theorem Proving


47. Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models


48. The Social Laboratory: A Psychometric Framework for Multi-Agent LLM Evaluation


49. Cyber Academia-Chemical Engineering (CA-ChemE): A Living Digital Town for Self-Directed Research Evolution and Emergent Scientific Discovery


50. Modeling Others’ Minds as Code


51. OR-Toolformer: Modeling and Solving Operations Research Problems with Tool Augmented Large Language Models


52. NoiseShift: Resolution-Aware Noise Recalibration for Better Low-Resolution Image Generation


53. Diffusion Models and the Manifold Hypothesis: Log-Domain Smoothing is Geometry Adaptive


54. Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models


55. Interactive Training: Feedback-Driven Neural Network Optimization


56. VideoNSA: Native Sparse Attention Scales Video Understanding


57. F2LLM Technical Report: Matching SOTA Embedding Performance with 6 Million Open-Source Data


58. Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks


59. Learning to Generate Object Interactions with Physics-Guided Video Diffusion


60. Self-Forcing++: Towards Minute-Scale High-Quality Video Generation


61. Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation


62. Parallel Scaling Law: Unveiling Reasoning Generalization through A Cross-Linguistic Perspective


63. InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents


64. microCLIP: Unsupervised CLIP Adaptation via Coarse-Fine Token Fusion for Fine-Grained Image Classification


65. How to Combat Reactive and Dynamic Jamming Attacks with Reinforcement Learning


66. Paving the Way Towards Kinematic Assessment Using Monocular Video: A Preclinical Benchmark of State-of-the-Art Deep-Learning-Based 3D Human Pose Estimators Against Inertial Sensors in Daily Living Activities


67. DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing


68. Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation


69. ExGRPO: Learning to Reason from Experience


70. RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning


71. More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration


72. TempoControl: Temporal Attention Guidance for Text-to-Video Models


73. DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning


74. Detection of Chagas Disease from the ECG: The George B. Moody PhysioNet Challenge 2025


75. ARUQULA – An LLM based Text2SPARQL Approach using ReAct and Knowledge Graph Exploration Utilities


76. EvolveCaptions: Empowering DHH Users Through Real-Time Collaborative Captioning


77. GRACE: A Language Model Framework for Explainable Inverse Reinforcement Learning


78. Learning to Reason for Hallucination Span Detection


79. Go witheFlow: Real-time Emotion Driven Audio Effects Modulation


80. SIEVE: Towards Verifiable Certification for Code-datasets


81. Comparing Contrastive and Triplet Loss in Audio-Visual Embedding: Intra-Class Variance and Greediness Analysis


82. Unlocking Vision-Language Models for Video Anomaly Detection via Fine-Grained Prompting


83. Human-Robo-advisor collaboration in decision-making: Evidence from a multiphase mixed methods experimental study


84. How to Find Fantastic Papers: Self-Rankings as a Powerful Predictor of Scientific Impact Beyond Peer Review


85. BioinfoMCP: A Unified Platform Enabling MCP Interfaces in Agentic Bioinformatics


86. The Disparate Impacts of Speculative Decoding


87. VarCoNet: A variability-aware self-supervised framework for functional connectome extraction from resting-state fMRI


88. SpurBreast: A Curated Dataset for Investigating Spurious Correlations in Real-world Breast MRI Classification


89. Unlocking Symbol-Level Precoding Efficiency Through Tensor Equivariant Neural Network


90. When Tracking Fails: Analyzing Failure Modes of SAM2 for Point-Based Tracking in Surgical Videos


91. KAIROS: Unified Training for Universal Non-Autoregressive Time Series Forecasting


92. The Current State of AI Bias Bounties: An Overview of Existing Programmes and Research


93. LiLa-Net: Lightweight Latent LiDAR Autoencoder for 3D Point Cloud Reconstruction


94. Generating Findings for Jaw Cysts in Dental Panoramic Radiographs Using GPT-4o: Building a Two-Stage Self-Correction Loop with Structured Output (SLSO) Framework


95. Clarifying Semantics of In-Context Examples for Unit Test Generation


96. ZK-WAGON: Imperceptible Watermark for Image Generation Models using ZK-SNARKs


97. Exploring Resolution-Wise Shared Attention in Hybrid Mamba-U-Nets for Improved Cross-Corpus Speech Enhancement


98. Foundation Visual Encoders Are Secretly Few-Shot Anomaly Detectors


99. Automated Defect Detection for Mass-Produced Electronic Components Based on YOLO Object Detection Models


100. Are LLMs Better GNN Helpers? Rethinking Robust Graph Learning under Deficiencies with Iterative Refinement


101. Multimodal Foundation Models for Early Disease Detection


102. HRTFformer: A Spatially-Aware Transformer for Personalized HRTF Upsampling in Immersive Audio Rendering


103. Small is Sufficient: Reducing the World AI Energy Consumption Through Model Selection


104. FINCH: Financial Intelligence using Natural language for Contextualized SQL Handling


105. REPAIR: Robust Editing via Progressive Adaptive Intervention and Reintegration


106. TACOS: Task Agnostic COordinator of a multi-drone System


107. A Modular Theory of Subjective Consciousness for Natural and Artificial Minds


108. NGGAN: Noise Generation GAN Based on the Practical Measurement Dataset for Narrowband Powerline Communications


109. Pre-Hoc Predictions in AutoML: Leveraging LLMs to Enhance Model Selection and Benchmarking for Tabular datasets


110. SingMOS-Pro: An Comprehensive Benchmark for Singing Quality Assessment


111. Rethinking the shape convention of an MLP


112. Nav-EE: Navigation-Guided Early Exiting for Efficient Vision-Language Models in Autonomous Driving


113. Comparison of Unsupervised Metrics for Evaluating Judicial Decision Extraction


114. Pack and Force Your Memory: Long-form and Consistent Video Generation


115. Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks


116. Secure Multi-Modal Data Fusion in Federated Digital Health Systems via MCP


117. Unsupervised Dynamic Feature Selection for Robust Latent Spaces in Vision Tasks


118. Machine-interpretable Engineering Design Standards for Valve Specification


119. Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement


120. Latency-aware Multimodal Federated Learning over UAV Networks


121. PyramidStyler: Transformer-Based Neural Style Transfer with Pyramidal Positional Encoding and Reinforcement Learning


122. PolySim: Bridging the Sim-to-Real Gap for Humanoid Control via Multi-Simulator Dynamics Randomization


123. Representational Alignment Across Model Layers and Brain Regions with Hierarchical Optimal Transport


124. Holistic Order Prediction in Natural Scenes


125. Format Inertia: A Failure Mechanism of LLMs in Medical Pre-Consultation


126. How Do Language Models Compose Functions?


127. Look Less, Reason More: Rollout-Guided Adaptive Pixel-Space Reasoning


128. FOR-Prompting: From Objection to Revision via an Asymmetric Prompting Protocol


129. Shift-Invariant Attribute Scoring for Kolmogorov-Arnold Networks via Shapley Value


130. MDSEval: A Meta-Evaluation Benchmark for Multimodal Dialogue Summarization


131. Learning Time-Series Representations by Hierarchical Uniformity-Tolerance Latent Balancing


132. Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning


133. SoK: Measuring What Matters for Closed-Loop Security Agents


134. The Unseen Frontier: Pushing the Limits of LLM Sparsity with Surrogate-Free ADMM


135. Source-Free Cross-Domain Continual Learning


136. Position: Privacy Is Not Just Memorization!


137. NLP Methods for Detecting Novel LLM Jailbreaks and Keyword Analysis with BERT


138. Towards Human-Centered RegTech: Unpacking Professionals’ Strategies and Needs for Using LLMs Safely


139. BioBlobs: Differentiable Graph Partitioning for Protein Representation Learning


140. Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls


141. Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead


142. LLM4Rec: Large Language Models for Multimodal Generative Recommendation with Causal Debiasing


143. RAG-BioQA Retrieval-Augmented Generation for Long-Form Biomedical Question Answering


144. Bridging Collaborative Filtering and Large Language Models with Dynamic Alignment, Multimodal Fusion and Evidence-grounded Explanations


145. A Comparison of Independent and Joint Fine-tuning Strategies for Retrieval-Augmented Generation


146. Enhancing Noise Robustness of Parkinson’s Disease Telemonitoring via Contrastive Feature Augmentation


147. Think Right: Learning to Mitigate Under-Over Thinking via Adaptive, Attentive Compression


148. Guiding Multimodal Large Language Models with Blind and Low Vision People Visual Questions for Proactive Visual Interpretations


149. Synthetic Prefixes to Mitigate Bias in Real-Time Neural Query Autocomplete


150. From Supervision to Exploration: What Does Protein Language Model Learn During Reinforcement Learning?


151. Rethinking KL Regularization in RLHF: From Value Estimation to Gradient Optimization


152. POLAR: Automating Cyber Threat Prioritization through LLM-Powered Assessment


153. Predictive Preference Learning from Human Interventions


154. WALT: Web Agents that Learn Tools


155. Predictive Modeling and Explainable AI for Veterinary Safety Profiles, Residue Assessment, and Health Outcomes Using Real-World Data and Physicochemical Properties


156. From Videos to Indexed Knowledge Graphs – Framework to Marry Methods for Multimodal Content Analysis and Understanding


157. Beyond Majority Voting: LLM Aggregation by Leveraging Higher-Order Information


158. AortaDiff: A Unified Multitask Diffusion Framework For Contrast-Free AAA Imaging


159. Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed


160. VL-KnG: Visual Scene Understanding for Navigation Goal Identification using Spatiotemporal Knowledge Graphs


161. Pharmacophore-Guided Generative Design of Novel Drug-Like Molecules


162. Purrception: Variational Flow Matching for Vector-Quantized Image Generation


163. From keywords to semantics: Perceptions of large language models in data discovery


164. RealClass: A Framework for Classroom Speech Simulation with Public Datasets and Game Engines


165. The Three Regimes of Offline-to-Online Reinforcement Learning


166. The Command Line GUIde: Graphical Interfaces from Man Pages via AI


167. Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression


168. GeoSURGE: Geo-localization using Semantic Fusion with Hierarchy of Geographic Embeddings


169. AFFORD2ACT: Affordance-Guided Automatic Keypoint Selection for Generalizable and Lightweight Robotic Manipulation


170. BioVERSE: Representation Alignment of Biomedical Modalities to LLMs for Multi-Modal Reasoning


171. Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting


172. Neural Network Surrogates for Free Energy Computation of Complex Chemical Systems


173. Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence


174. INSIGHT: INference-time Sequence Introspection for Generating Help Triggers in Vision-Language-Action Models


175. DeMuon: A Decentralized Muon for Matrix Optimization over Graphs


176. SPUS: A Lightweight and Parameter-Efficient Foundation Model for PDEs


177. Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks


178. WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents


179. HiSpec: Hierarchical Speculative Decoding for LLMs


180. Low Rank Gradients and Where to Find Them


181. Enhancing the development of Cherenkov Telescope Array control software with Large Language Models


182. From 2D to 3D, Deep Learning-based Shape Reconstruction in Magnetic Resonance Imaging: A Review


183. Microsaccade-Inspired Probing: Positional Encoding Perturbations Reveal LLM Misbehaviours


184. Evaluating New AI Cell Foundation Models on Challenging Kidney Pathology Cases Unaddressed by Previous Foundation Models


185. Emergent evaluation hubs in a decentralizing large language model ecosystem


186. LLM-based Multi-Agent Blackboard System for Information Discovery in Data Science


187. An Analysis of the New EU AI Act and A Proposed Standardization Framework for Machine Learning Fairness


188. TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture


189. Noisy-Pair Robust Representation Alignment for Positive-Unlabeled Learning


190. LLM Based Sentiment Classification From Bangladesh E-Commerce Reviews


191. Identifying Information-Transfer Nodes in a Recurrent Neural Network Reveals Dynamic Representations


192. Think Twice, Generate Once: Safeguarding by Progressive Self-Reflection


193. AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees


194. OpenAI’s GPT-OSS-20B Model and Safety Alignment Issues in a Low-Resource Language


195. RLP: Reinforcement as a Pretraining Objective


196. Budgeted Broadcast: An Activity-Dependent Pruning Rule for Neural Network Efficiency


197. RSTGCN: Railway-centric Spatio-Temporal Graph Convolutional Network for Train Delay Prediction


198. IoT-MCP: Bridging LLMs and IoT Systems Through Model Context Protocol


199. In AI Sweet Harmony: Sociopragmatic Guardrail Bypasses and Evaluation-Awareness in OpenAI gpt-oss-20b


200. Measuring Algorithmic Partisanship via Zero-Shot Classification and Its Implications on Political Discourse


201. RJE: A Retrieval-Judgment-Exploration Framework for Efficient Knowledge Graph Question Answering with LLMs


202. Kant: An Efficient Unified Scheduling System for Large-Scale AI Clusters


203. Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs


204. GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models


205. Let’s Play Across Cultures: A Large Multilingual, Multicultural Benchmark for Assessing Language Models’ Understanding of Sports


206. Redundancy-as-Masking: Formalizing the Artificial Age Score (AAS) to Model Memory Aging in Generative AI


207. Confidence-Aware Routing for Large Language Model Reliability Enhancement: A Multi-Signal Approach to Pre-Generation Hallucination Mitigation


208. Automated Extraction of Material Properties using LLM-based AI Agents


209. Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks


210. Trustworthy Summarization via Uncertainty Quantification and Risk Awareness in Large Language Models


211. Enhancing Transformer-Based Rerankers with Synthetic Data and LLM-Based Supervision


212. ClaimCheck: Real-Time Fact-Checking with Small Language Models


213. Utilizing Modern Large Language Models (LLM) for Financial Trend Analysis and Digest Creation


214. Context Matters: Comparison of commercial large language tools in veterinary medicine


215. Discourse vs emissions: Analysis of corporate narratives, symbolic practices, and mimicry through LLMs


216. Towards Open-Ended Discovery for Low-Resource NLP


217. Uncovering Implicit Bias in Large Language Models with Concept Learning Dataset


218. Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs


219. Mamba Outpaces Reformer in Stock Prediction with Sentiments from Top Ten LLMs


220. LegiScout: A Visual Tool for Understanding Complex Legislation


221. An Anthropologist LLM to Elicit Users’ Moral Preferences through Role-Play


222. Quantum-Assisted Correlation Clustering