All AI Papers - 2025-12-10

1. Auditing Games for Sandbagging


2. Large Causal Models from Large Language Models


3. ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning


4. RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models


5. Each Prompt Matters: Scaling Reinforcement Learning Without Wasting Rollouts on Hundred-Billion-Scale MoE


6. The Agent Capability Problem: Predicting Solvability Through Information-Theoretic Bounds


7. Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement


8. How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations


9. LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services


10. A Geometric Unification of Concept Learning with Concept Cones


11. M-STAR: Multi-Scale Spatiotemporal Autoregression for Human Mobility Modeling


12. Cross-platform Product Matching Based on Entity Alignment of Knowledge Graph with RAEA model


13. Sample from What You See: Visuomotor Policy Learning via Diffusion Bridge with Observation-Embedded Stochastic Differential Equation


14. PICKT: Practical Interlinked Concept Knowledge Tracing for Personalized Learning using Knowledge Map Concept Relations


15. ContextualSHAP: Enhancing SHAP Explanations Through Contextual Language Generation


16. A Neural Affinity Framework for Abstract Reasoning: Diagnosing the Compositional Gap in Transformer Architectures via Procedural Task Taxonomy


17. VIGIL: A Reflective Runtime for Self-Healing Agents


18. ClinNoteAgents: An LLM Multi-Agent System for Predicting and Interpreting Heart Failure 30-Day Readmission from Clinical Notes


19. Utilizing Multi-Agent Reinforcement Learning with Encoder-Decoder Architecture Agents to Identify Optimal Resection Location in Glioblastoma Multiforme Patients


20. On Memory: A comparison of memory mechanisms in world models


21. Do Persona-Infused LLMs Affect Performance in a Strategic Reasoning Game?


22. JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models


23. Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning


24. DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems


25. ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems


26. Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents


27. Stochasticity in Agentic Evaluations: Quantifying Inconsistency with Intraclass Correlation


28. Academic journals’ AI policies fail to curb the surge in AI-assisted academic writing


29. LightSearcher: Efficient DeepSearch via Experiential Memory


30. FlatFormer: A Flat Transformer Knowledge Tracing Model Based on Cognitive Bias Injection


31. The Effect of Belief Boxes and Open-mindedness on Persuasion


32. Smart Spatial Planning in Egypt: An Algorithm-Driven Approach to Public Service Evaluation in Qena City


33. UncertaintyZoo: A Unified Toolkit for Quantifying Predictive Uncertainty in Deep Learning Systems


34. GENIUS: An Agentic AI Framework for Autonomous Design and Execution of Simulation Protocols


35. Less Is More for Multi-Step Logical Reasoning of LLM Generalisation Under Rule Removal, Paraphrasing, and Compression


36. DaGRPO: Rectifying Gradient Conflict in Reasoning via Distinctiveness-Aware Group Relative Policy Optimization


37. How Sharp and Bias-Robust is a Model? Dual Evaluation Perspectives on Knowledge Graph Completion


38. AI Application in Anti-Money Laundering for Sustainable and Transparent Financial Systems


39. On measuring grounding and generalizing grounding problems


40. ARCANE: A Multi-Agent Framework for Interpretable and Configurable Alignment


41. Deep learning for autism detection using clinical notes: A comparison of transfer learning for a transparent and black-box approach


42. Going All-In on LLM Accuracy: Fake Prediction Markets, Real Confidence Signals


43. Relational Visual Similarity


44. One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation


45. WorldReel: 4D Video Generation with Consistent Geometry and Motion Modeling


46. Provable Long-Range Benefits of Next-Token Prediction


47. Understanding Privacy Risks in Code Models Through Training Dynamics: A Causal Approach


48. Group Representational Position Encoding


49. Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support


50. SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination


51. Improving action classification with brain-inspired deep networks


52. The Native Spiking Microarchitecture: From Iontronic Primitives to Bit-Exact FP8 Arithmetic


53. Enabling Delayed-Full Charging Through Transformer-Based Real-Time-to-Departure Modeling for EV Battery Longevity


54. In-Context and Few-Shots Learning for Forecasting Time Series Data based on Large Language Models


55. Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment


56. When Large Language Models Do Not Work: Online Incivility Prediction through Graph Neural Networks


57. DIST-CLIP: Arbitrary Metadata and Image Guided MRI Harmonization via Disentangled Anatomy-Contrast Representations


58. An AI-Powered Autonomous Underwater System for Sea Exploration and Scientific Research


59. A Mathematical Theory of Top-$k$ Sparse Attention via Total Variation Distance


60. Incorporating Structure and Chord Constraints in Symbolic Transformer-based Melodic Harmonization


61. Time Series Foundation Models for Process Model Forecasting


62. PCMind-2.1-Kaiyuan-2B Technical Report


63. Metric-Fair Prompting: Treating Similar Samples Similarly


64. Complementary Learning Approach for Text Classification using Large Language Models


65. R2MF-Net: A Recurrent Residual Multi-Path Fusion Network for Robust Multi-directional Spine X-ray Segmentation


66. Weighted Contrastive Learning for Anomaly-Aware Time-Series Forecasting


67. Dual-Stream Cross-Modal Representation Learning via Residual Semantic Decorrelation


68. Toward More Reliable Artificial Intelligence: Reducing Hallucinations in Vision-Language Models


69. MoCoRP: Modeling Consistent Relations between Persona and Response for Persona-based Dialogue


70. Minimum Bayes Risk Decoding for Error Span Detection in Reference-Free Automatic Machine Translation Evaluation


71. VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vulnerability Detection


72. Model-Based Reinforcement Learning Under Confounding


73. LIME: Making LLM Data More Efficient with Linguistic Metadata Embeddings


74. SPAD: Seven-Source Token Probability Attribution with Syntactic Aggregation for Detecting Hallucinations in RAG


75. Exploring possible vector systems for faster training of neural networks with preconfigured latent spaces


76. AutoICE: Automatically Synthesizing Verifiable C Code via LLM-driven Evolution


77. Artificial Intelligence and Nuclear Weapons Proliferation: The Technological Arms Race for (In)visibility


78. From Real-World Traffic Data to Relevant Critical Scenarios


79. Understanding LLM Agent Behaviours via Game Theory: Strategy Recognition, Biases and Multi-Agent Dynamics


80. Persian-Phi: Efficient Cross-Lingual Adaptation of Compact LLMs via Curriculum Learning


81. Social welfare optimisation in well-mixed and structured populations


82. Forget and Explain: Transparent Verification of GNN Unlearning


83. KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models


84. MIDG: Mixture of Invariant Experts with knowledge injection for Domain Generalization in Multimodal Sentiment Analysis


85. When normalization hallucinates: unseen risks in AI-powered whole slide image processing


86. Data-driven Exploration of Mobility Interaction Patterns


87. Do LLMs Trust the Code They Write?


88. Asymptotic analysis of shallow and deep forgetting in replay with Neural Collapse


89. ESPADA: Execution Speedup via Semantics Aware Demonstration Data Downsampling for Imitation Learning


90. Structure-Aware Feature Rectification with Region Adjacency Graphs for Training-Free Open-Vocabulary Semantic Segmentation


91. DeepAgent: A Dual Stream Multi Agent Fusion for Robust Multimodal Deepfake Detection


92. Venus: An Efficient Edge Memory-and-Retrieval System for VLM-based Online Video Understanding


93. Local-Curvature-Aware Knowledge Graph Embedding: An Extended Ricci Flow Approach


94. ContextAnyone: Context-Aware Diffusion for Character-Consistent Text-to-Video Generation


95. DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management


96. Radiance-Field Reinforced Pretraining: Scaling Localization Models with Unlabeled Wireless Signals


97. Exact Synthetic Populations for Scalable Societal and Market Modeling


98. Towards Accurate UAV Image Perception: Guiding Vision-Language Models with Stronger Task Prompts


99. SIT-Graph: State Integrated Tool Graph for Multi-Turn Agents


100. Effective Attention-Guided Multi-Scale Medical Network for Skin Lesion Segmentation


101. SINRL: Socially Integrated Navigation with Reinforcement Learning using Spiking Neural Networks


102. DGGAN: Degradation Guided Generative Adversarial Network for Real-time Endoscopic Video Enhancement


103. IFFair: Influence Function-driven Sample Reweighting for Fair Classification


104. Dropout Prompt Learning: Towards Robust and Adaptive Vision-Language Models


105. Towards Robust Protective Perturbation against DeepFake Face Swapping


106. NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models


107. VFM-VLM: Vision Foundation Model and Vision Language Model based Visual Comparison for 3D Pose Estimation


108. Geometric Prior-Guided Federated Prompt Calibration


109. MASim: Multilingual Agent-Based Simulation for Social Science


110. START: Spatial and Textual Learning for Chart Understanding


111. Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach


112. JEPA as a Neural Tokenizer: Learning Robust Speech Representations with Density Adaptive Attention


113. FlowLPS: Langevin-Proximal Sampling for Flow-based Inverse Problem Solvers



115. A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning


116. TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning


117. DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning


118. RisConFix: LLM-based Automated Repair of Risk-Prone Drone Configurations


119. FOAM: Blocked State Folding for Memory-Efficient LLM Training


120. The Geometry of Persona: Disentangling Personality from Reasoning in Large Language Models


121. Leveraging KV Similarity for Online Structured Pruning in LLMs


122. ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking


123. Procrustean Bed for AI-Driven Retrosynthesis: A Unified Framework for Reproducible Evaluation


124. Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design


125. $\mathrm{D}^{\mathrm{3}}$-Predictor: Noise-Free Deterministic Diffusion for Dense Prediction


126. DAUNet: A Lightweight UNet Variant with Deformable Convolutions and Parameter-Free Attention for Medical Image Segmentation


127. Power of Boundary and Reflection: Semantic Transparent Object Segmentation using Pyramid Vision Transformer with Transparent Cues


128. A Comprehensive Study of Supervised Machine Learning Models for Zero-Day Attack Detection: Analyzing Performance on Imbalanced Data


129. Reformulate, Retrieve, Localize: Agents for Repository-Level Bug Localization


130. Transferring Clinical Knowledge into ECGs Representation


131. Latency-Response Theory Model: Evaluating Large Language Models via Response Accuracy and Chain-of-Thought Length


132. FVA-RAG: Falsification-Verification Alignment for Mitigating Sycophantic Hallucinations


133. Optimizing video analytics inference pipelines: a case study


134. Multi-Accent Mandarin Dry-Vocal Singing Dataset: Benchmark for Singing Accent Recognition


135. Benchmarking Deep Neural Networks for Modern Recommendation Systems


136. Singing Timbre Popularity Assessment Based on Multimodal Large Foundation Model


137. Prompting-in-a-Series: Psychology-Informed Contents and Embeddings for Personality Recognition With Decoder-Only Models


138. Flash Multi-Head Feed-Forward Network


139. Comparing BFGS and OGR for Second-Order Optimization


140. VideoVLA: Video Generators Can Be Generalizable Robot Manipulators


141. Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge


142. A Unifying Human-Centered AI Fairness Framework


143. Hidden Leaks in Time Series Forecasting: How Data Leakage Affects LSTM Evaluation Across Configurations and Validation Strategies


144. Adaptive Normalization Mamba with Multi Scale Trend Decomposition and Patch MoE Encoding


145. Evaluating the Sensitivity of BiLSTM Forecasting Models to Sequence Length and Input Noise


146. Deep Reinforcement Learning for Phishing Detection with Transformer-Based Semantic Features


147. NeuroABench: A Multimodal Evaluation Benchmark for Neurosurgical Anatomy Identification


148. SoK: Trust-Authorization Mismatch in LLM Agent Interactions


149. BabelCoder: Agentic Code Translation with Specification Alignment


150. JoPano: Unified Panorama Generation via Joint Modeling


151. WisPaper: Your AI Scholar Search Engine


152. Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior


153. ArchPower: Dataset for Architecture-Level Power Modeling of Modern CPU Design


154. Formal that “Floats” High: Formal Verification of Floating Point Arithmetic


155. Leveraging LLMs to support co-evolution between definitions and instances of textual DSLs


156. CAuSE: Decoding Multimodal Classifiers using Faithful Natural Language Explanation


157. Partial Inverse Design of High-Performance Concrete Using Cooperative Neural Networks for Constraint-Aware Mix Generation


158. RMAdapter: Reconstruction-based Multi-Modal Adapter for Vision-Language Models


159. Optimal and Diffusion Transports in Machine Learning


160. Angular Regularization for Positive-Unlabeled Learning on the Hypersphere


161. From Description to Score: Can LLMs Quantify Vulnerabilities?


162. From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs


163. RDSplat: Robust Watermarking Against Diffusion Editing for 3D Gaussian Splatting


164. Stitch and Tell: A Structured Multimodal Data Augmentation Method for Spatial Understanding


165. VisChainBench: A Benchmark for Multi-Turn, Multi-Image Visual Reasoning Beyond Language Priors


166. Becoming Experienced Judges: Selective Test-Time Learning for Evaluators


167. PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance


168. Task-Model Alignment: A Simple Path to Generalizable AI-Generated Image Detection


169. Arc Gradient Descent: A Mathematically Derived Reformulation of Gradient Descent with Phase-Aware, User-Controlled Step Dynamics


170. A Patient-Doctor-NLP-System to contest inequality for less privileged


171. “The Dentist is an involved parent, the bartender is not”: Revealing Implicit Biases in QA with Implicit BBQ


172. The Role of Entropy in Visual Grounding: Analysis and Optimization


173. A Novel Deep Neural Network Architecture for Real-Time Water Demand Forecasting


174. A Novel Multimodal RUL Framework for Remaining Useful Life Estimation with Layer-wise Explanations


175. Predictive Modeling of I/O Performance for Machine Learning Training Pipelines: A Data-Driven Approach to Storage Optimization


176. Mechanistic Interpretability of GPT-2: Lexical and Contextual Layers in Sentiment Analysis


177. GradientSpace: Unsupervised Data Clustering for Improved Instruction Tuning


178. Rethinking Robustness: A New Approach to Evaluating Feature Attribution Methods


179. Towards Small Language Models for Security Query Generation in SOC Workflows


180. TextMamba: Scene Text Detector with Mamba


181. GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering


182. Adaptive Test-Time Training for Predicting Need for Invasive Mechanical Ventilation in Multi-Center Cohorts


183. Financial Fraud Identification and Interpretability Study for Listed Companies Based on Convolutional Neural Network


184. Masked Autoencoder Pretraining on Strong-Lensing Images for Joint Dark-Matter Model Classification and Super-Resolution


185. Memory Power Asymmetry in Human-AI Relationships: Preserving Mutual Forgetting in the Digital Age


186. ChargingBoul: A Competitive Negotiating Agent with Novel Opponent Modeling


187. Beyond Satisfaction: From Placebic to Actionable Explanations For Enhanced Understandability


188. Towards Efficient Hypergraph and Multi-LLM Agent Recommender Systems


189. QL-LSTM: A Parameter-Efficient LSTM for Stable Long-Sequence Modeling


190. Deep Manifold Part 2: Neural Network Mathematics


191. SUGAR: A Sweeter Spot for Generative Unlearning of Many Identities


192. Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks


193. BEACON: A Unified Behavioral-Tactical Framework for Explainable Cybercrime Analysis with Large Language Models


194. A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation


195. Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning


196. Novel Deep Learning Architectures for Classification and Segmentation of Brain Tumors from MRI Images


197. ShadowWolf – Automatic Labelling, Evaluation and Model Training Optimised for Camera Trap Wildlife Images


198. AI as “Co-founder”: GenAI for Entrepreneurship


199. Method of UAV Inspection of Photovoltaic Modules Using Thermal and RGB Data Fusion


200. PRIMRose: Insights into the Per-Residue Energy Metrics of Proteins with Double InDel Mutations using Deep Learning


201. Classifying German Language Proficiency Levels Using Large Language Models


202. Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control


203. Instance Dependent Testing of Samplers using Interval Conditioning


204. Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices


205. When Gender is Hard to See: Multi-Attribute Support for Long-Range Recognition


206. Rethinking Training Dynamics in Scale-wise Autoregressive Generation


207. AgenticCyber: A GenAI-Powered Multi-Agent System for Multimodal Threat Detection and Adaptive Response in Cybersecurity


208. RLAX: Large-Scale, Distributed Reinforcement Learning for Large Language Models on TPUs


209. Web Technologies Security in the AI Era: A Survey of CDN-Enhanced Defenses


210. Protecting Bystander Privacy via Selective Hearing in LALMs


211. Proportional integral derivative booster for neural networks-based time-series prediction: Case of water demand prediction


212. Why They Disagree: Decoding Differences in Opinions about AI Risk on the Lex Fridman Podcast


213. When Distance Distracts: Representation Distance Bias in BT-Loss for Reward Models


214. Exploiting Spatiotemporal Properties for Efficient Event-Driven Human Pose Estimation


215. Degrading Voice: A Comprehensive Overview of Robust Voice Conversion Through Input Manipulation


216. Chemistry Integrated Language Model using Hierarchical Molecular Representation for Polymer Informatics


217. Entropic Confinement and Mode Connectivity in Overparameterized Neural Networks


218. Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models


219. RefBench-PRO: Perceptual and Reasoning Oriented Benchmark for Referring Expression Comprehension


220. Networked Restless Multi-Arm Bandits with Reinforcement Learning


221. Who Will Top the Charts? Multimodal Music Popularity Prediction via Adaptive Fusion of Modality Experts and Temporal Engagement Modeling


222. Convergence of Outputs When Two Large Language Models Interact in a Multi-Agentic Setup


223. DUET: Agentic Design Understanding via Experimentation and Testing


224. Auto-exploration for online reinforcement learning


225. Quantifying Memory Use in Reinforcement Learning with Temporal Range


226. Do You Feel Comfortable? Detecting Hidden Conversational Escalation in AI Chatbots


227. Multi-Modal Zero-Shot Prediction of Color Trajectories in Food Drying


228. DEFEND: Poisoned Model Detection and Malicious Client Exclusion Mechanism for Secure Federated Learning-based Road Condition Classification


229. Learning Invariant Graph Representations Through Redundant Information


230. Physics-Informed Neural Koopman Machine for Interpretable Longitudinal Personalized Alzheimer’s Disease Forecasting


231. Toward Patch Robustness Certification and Detection for Deep Learning Systems Beyond Consistent Samples


232. WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving


233. Future You: Designing and Evaluating Multimodal AI-generated Digital Twins for Strengthening Future Self-Continuity


234. Explainable Melanoma Diagnosis with Contrastive Learning and LLM-based Report Generation


235. JaxWildfire: A GPU-Accelerated Wildfire Simulator for Reinforcement Learning


236. Empathy by Design: Aligning Large Language Models for Healthcare Dialogue


237. EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing


238. When Privacy Isn’t Synthetic: Hidden Data Leakage in Generative AI Models


239. Reinforcement Learning Integrated Agentic RAG for Software Test Cases Authoring


240. The Road of Adaptive AI for Precision in Cybersecurity


241. Beyond Prototyping: Autonomous, Enterprise-Grade Frontend Development from Pixel to Production via a Specialized Multi-Agent Framework


242. Auto-SPT: Automating Semantic Preserving Transformations for Code


243. Physics-Guided Deepfake Detection for Voice Authentication Systems


244. The SAM2-to-SAM3 Gap in the Segment Anything Model Family: Why Prompt-Based Expertise Fails in Concept-Driven Image Segmentation


245. PrefGen: Multimodal Preference Learning for Preference-Conditioned Image Generation


246. Uncovering Students’ Inquiry Patterns in GenAI-Supported Clinical Practice: An Integration of Epistemic Network Analysis and Sequential Pattern Mining


247. Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization


248. POrTAL: Plan-Orchestrated Tree Assembly for Lookahead


249. KidSpeak: A General Multi-purpose LLM for Kids’ Speech Recognition and Screening


250. Domain-Specific Foundation Model Improves AI-Based Analysis of Neuropathology


251. VG3T: Visual Geometry Grounded Gaussian Transformer


252. Adaptive Dataset Quantization: A New Direction for Dataset Pruning


253. FlockVote: LLM-Empowered Agent-Based Modeling for Simulating U.S. Presidential Elections


254. Accelerating Materials Discovery: Learning a Universal Representation of Chemical Processes for Cross-Domain Property Prediction


255. A Multi-objective Optimization Approach for Feature Selection in Gentelligent Systems


256. Video Models Start to Solve Chess, Maze, Sudoku, Mental Rotation, and Raven’ Matrices