All AI Papers - 2026-04-08

1. Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents


2. ACE-Bench: Agent Configurable Evaluation with Scalable Horizons and Controllable Difficulty under Lightweight Environments


3. Artificial Intelligence and the Structure of Mathematics


4. How LLMs Follow Instructions: Skillful Coordination, Not a Universal Mechanism


5. Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis


6. Flowr – Scaling Up Retail Supply Chain Operations Through Agentic AI in Large Scale Supermarket Chains


7. Beyond Compromise: Pareto-Lenient Consensus for Efficient Multi-Preference LLM Alignment


8. Towards Trustworthy Report Generation: A Deep Research Agent with Progressive Confidence Estimation and Calibration


9. MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning


10. Context-Value-Action Architecture for Value-Driven Large Language Model Agents


11. HybridKV: Hybrid KV Cache Compression for Efficient Multimodal Large Language Model Inference


12. Joint Knowledge Base Completion and Question Answering by Combining Large Language Models and Small Language Models


13. JTON: A Token-Efficient JSON Superset with Zen Grid Tabular Encoding for Large Language Models


14. When Do We Need LLMs? A Diagnostic for Language-Driven Bandits


15. Deep Researcher Agent: An Autonomous Framework for 24/7 Deep Learning Experimentation with Zero-Cost Monitoring


16. Vision-Guided Iterative Refinement for Frontend Code Generation


17. Reciprocal Trust and Distrust in Artificial Intelligence Systems: The Hard Problem of Regulation


18. Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents


19. Emergent social transmission of model-based representations without inference


20. Can Large Language Models Reinvent Foundational Algorithms?


21. QA-MoE: Towards a Continuous Reliability Spectrum with Quality-Aware Mixture of Experts for Robust Multimodal Sentiment Analysis


22. LUDOBENCH: Evaluating LLM Behavioural Decision-Making Through Spot-Based Board Game Scenarios in Ludo


23. CuraLight: Debate-Guided Data Curation for LLM-Centered Traffic Signal Control


24. PECKER: A Precisely Efficient Critical Knowledge Erasure Recipe For Machine Unlearning in Diffusion Models


25. Beyond Behavior: Why AI Evaluation Needs a Cognitive Revolution


26. Label Effects: Shared Heuristic Reliance in Trust Assessment by Humans and LLM-as-a-Judge


27. ResearchEVO: An End-to-End Framework for Automated Scientific Discovery and Documentation


28. COSMO-Agent: Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration


29. From Large Language Model Predicates to Logic Tensor Networks: Neurosymbolic Offer Validation in Regulated Procurement


30. A canonical generalization of OBDD


31. SignalClaw: LLM-Guided Evolutionary Synthesis of Interpretable Traffic Signal Control Skills


32. Experience Transfer for Multimodal LLM Agents in Minecraft Game


33. Inventory of the 12 007 Low-Dimensional Pseudo-Boolean Landscapes Invariant to Rank, Translation, and Rotation


34. ActivityEditor: Learning to Synthesize Physically Valid Human Mobility


35. Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition


36. UniCreative: Unifying Long-form Logic and Short-form Sparkle via Reference-Free Reinforcement Learning


37. OmniDiagram: Advancing Unified Diagram Code Generation via Visual Interrogation Reward


38. Thinking Diffusion: Penalize and Guide Visual-Grounded Reasoning in Diffusion Multimodal Language Models


39. SCMAPR: Self-Correcting Multi-Agent Prompt Refinement for Complex-Scenario Text-to-Video Generation


40. Auditable Agents


41. Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning


42. OntoTKGE: Ontology-Enhanced Temporal Knowledge Graph Extrapolation


43. Adaptive Serverless Resource Management via Slot-Survival Prediction and Event-Driven Lifecycle Control


44. Automated Auditing of Hospital Discharge Summaries for Care Transitions


45. PRISM-MCTS: Learning from Reasoning Trajectories with Metacognitive Reflection


46. Multi-Agent Pathfinding with Non-Unit Integer Edge Costs via Enhanced Conflict-Based Search and Graph Discretization


47. CODESTRUCT: Code Agents over Structured Action Spaces


48. HYVE: Hybrid Views for LLM Context Engineering over Machine Data


49. Reason Analogically via Cross-domain Prior Knowledge: An Empirical Study of Cross-domain Knowledge Transfer for In-Context Learning


50. Neural Assistive Impulses: Synthesizing Exaggerated Motions for Physics-based Characters


51. Towards Effective In-context Cross-domain Knowledge Transfer via Domain-invariant-neurons-based Retrieval


52. LLM-as-Judge for Semantic Judging of Powerline Segmentation in UAV Inspection


53. TFRBench: A Reasoning Benchmark for Evaluating Forecasting Systems


54. LatentAudit: Real-Time White-Box Faithfulness Monitoring for Retrieval-Augmented Generation with Verifiable Deployment


55. ETR: Entropy Trend Reward for Efficient Chain-of-Thought Reasoning


56. From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs


57. Dynamic Agentic AI Expert Profiler System Architecture for Multidomain Intelligence Modeling


58. TRACE: Capability-Targeted Agentic Training


59. Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills


60. Breakthrough the Suboptimal Stable Point in Value-Factorization-Based Multi-Agent Reinforcement Learning


61. Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition


62. Simulating the Evolution of Alignment and Values in Machine Intelligence


63. EAGLE: Edge-Aware Graph Learning for Proactive Delivery Delay Prediction in Smart Logistics Networks


64. From Governance Norms to Enforceable Controls: A Layered Translation Method for Runtime Guardrails in Agentic AI


65. Attribution Bias in Large Language Models


66. ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces


67. Instruction-Tuned LLMs for Parsing and Mining Unstructured Logs on Leadership HPC Systems


68. Learning to Focus: CSI-Free Hierarchical MARL for Reconfigurable Reflectors


69. Bypassing the CSI Bottleneck: MARL-Driven Spatial Control for Reflector Arrays


70. IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents


71. A mathematical theory of evolution for self-designing AIs


72. Non-monotonic causal discovery with Kolmogorov-Arnold Fuzzy Cognitive Maps


73. Uncertainty-Guided Latent Diagnostic Trajectory Learning for Sequential Clinical Diagnosis


74. MedGemma 1.5 Technical Report


75. MMORF: A Multi-agent Framework for Designing Multi-objective Retrosynthesis Planning Systems


76. Part-Level 3D Gaussian Vehicle Generation with Joint and Hinge Axis Estimation


77. PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing


78. Algebraic Structure Discovery for Real World Combinatorial Optimisation Problems: A General Framework from Abstract Algebra to Quotient Space Learning


79. ReVEL: Multi-Turn Reflective LLM-Guided Heuristic Evolution via Structured Performance Feedback


80. Proximity Measure of Information Object Features for Solving the Problem of Their Identification in Information Systems


81. Operational Noncommutativity in Sequential Metacognitive Judgments


82. Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning through Navya-Nyaya


83. In-Place Test-Time Training


84. DiffHDR: Re-Exposing LDR Videos with Video Diffusion Models


85. MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control


86. Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement


87. Who Governs the Machine? A Machine Identity Governance Taxonomy (MIGT) for AI Systems Operating Across Enterprise and Geopolitical Boundaries


88. Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization


89. Shot-Based Quantum Encoding: A Data-Loading Paradigm for Quantum Neural Networks


90. PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer


91. Gym-Anything: Turn any Software into an Agent Environment


92. Lightweight Multimodal Adaptation of Vision Language Models for Species Recognition and Habitat Context Interpretation in Drone Thermal Imagery


93. LLM4CodeRE: Generative AI for Code Decompilation Analysis and Reverse Engineering


94. Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives


95. LAG-XAI: A Lie-Inspired Affine Geometric Framework for Interpretable Paraphrasing in Transformer Latent Spaces


96. Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning


97. Graph-PiT: Enhancing Structural Coherence in Part-Based Image Synthesis via Graph Priors


98. Stories of Your Life as Others: A Round-Trip Evaluation of LLM-Generated Life Stories Conditioned on Rich Psychometric Profiles


99. A Multi-Stage Validation Framework for Trustworthy Large-scale Clinical Information Extraction using Large Language Models


100. CritBench: A Framework for Evaluating Cybersecurity Capabilities of Large Language Models in IEC 61850 Digital Substation Environments


101. Governance and Regulation of Artificial Intelligence in Developing Countries: A Case Study of Nigeria


102. The Model Agreed, But Didn’t Learn: Diagnosing Surface Compliance in Large Language Models


103. A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms


104. Does Pass Rate Tell the Whole Story? Evaluating Design Constraint Compliance in LLM-based Issue Resolution


105. Polynomial-Time Algorithm for Thiele Voting Rules with Voter Interval Preferences


106. Saliency-Guided Representation with Consistency Policy Learning for Visual Unsupervised Reinforcement Learning


107. “I See What You Did There”: Can Large Vision-Language Models Understand Multimodal Puns?


108. ReLU Networks for Exact Generation of Similar Graphs


109. Selective Aggregation of Attention Maps Improves Diffusion-Based Visual Interpretation


110. Automatic dental superimposition of 3D intraorals and 2D photographs for human identification


111. Swiss-Bench 003: Evaluating LLM Reliability and Adversarial Security for Swiss Regulatory Contexts


112. Neural Network Pruning via QUBO Optimization


113. Evaluating Learner Representations for Differentiation Prior to Instructional Outcomes


114. EEG-MFTNet: An Enhanced EEGNet Architecture with Multi-Scale Temporal Convolutions and Transformer Fusion for Cross-Session Motor Imagery Decoding


115. “OK Aura, Be Fair With Me”: Demographics-Agnostic Training for Bias Mitigation in Wake-up Word Detection


116. What Models Know, How Well They Know It: Knowledge-Weighted Fine-Tuning for Learning When to Say “I Don’t Know”


117. CAKE: Cloud Architecture Knowledge Evaluation of Large Language Models


118. On the Robustness of Diffusion-Based Image Compression to Bit-Flip Errors


119. Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing



121. CRFT: Consistent-Recurrent Feature Flow Transformer for Cross-Modal Image Registration


122. Attention Editing: A Versatile Framework for Cross-Architecture Attention Conversion


123. From Incomplete Architecture to Quantified Risk: Multimodal LLM-Driven Security Assessment for Cyber-Physical Systems


124. Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation


125. SnapFlow: One-Step Action Generation for Flow-Matching VLAs via Progressive Self-Distillation


126. LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals


127. Multiscale Physics-Informed Neural Network for Complex Fluid Flows with Long-Range Dependencies


128. Analogical Reasoning as a Doctor: A Foundation Model for Gastrointestinal Endoscopy Diagnosis


129. Semantic-Topological Graph Reasoning for Language-Guided Pulmonary Screening


130. Evaluation of Randomization through Style Transfer for Enhanced Domain Generalization


131. INTERACT: An AI-Driven Extended Reality Framework for Accessible Communication Featuring Real-Time Sign Language Interpretation and Emotion Recognition


132. AI-Driven Modular Services for Accessible Multilingual Education in Immersive Extended Reality Settings: Integrating Speech Processing, Translation, and Sign Language Rendering


133. Foundations for Agentic AI Investigations from the Forensic Analysis of OpenClaw


134. Context-Agent: Dynamic Discourse Trees for Non-Linear Dialogue


135. FastDiSS: Few-step Match Many-step Diffusion Language Model on Sequence-to-Sequence Generation–Full Version


136. Turbulence-like 5/3 spectral scaling in contextual representations of language as a complex system


137. Controllable Singing Style Conversion with Boundary-Aware Information Bottleneck


138. Learned Elevation Models as a Lightweight Alternative to LiDAR for Radio Environment Map Estimation


139. Unifying VLM-Guided Flow Matching and Spectral Anomaly Detection for Interpretable Veterinary Diagnosis


140. On the Role of Fault Localization Context for LLM-Based Program Repair


141. LLM Evaluation as Tensor Completion: Low Rank Structure and Semiparametric Efficiency


142. MA-IDS: Multi-Agent RAG Framework for IoT Network Intrusion Detection with an Experience Library


143. Learning What Matters: Dynamic Dimension Selection and Aggregation for Interpretable Vision-Language Reward Modeling


144. LanG – A Governance-Aware Agentic AI Platform for Unified Security Operations


145. Human Interaction-Aware 3D Reconstruction from a Single Image


146. Your LLM Agent Can Leak Your Data: Data Exfiltration via Backdoored Tool Use


147. Bridging Natural Language and Microgrid Dynamics: A Context-Aware Simulator and Dataset


148. ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads


149. VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG



151. 3DTurboQuant: Training-Free Near-Optimal Quantization for 3D Reconstruction Models


152. OGA-AID: Clinician-in-the-loop AI Report Drafting Assistant for Multimodal Observational Gait Analysis in Post-Stroke Rehabilitation


153. DQA: Diagnostic Question Answering for IT Support


154. Anchored Cyclic Generation: A Novel Paradigm for Long-Sequence Symbolic Music Generation


155. LLMs Should Express Uncertainty Explicitly


156. Broken by Default: A Formal Verification Study of Security Vulnerabilities in AI-Generated Code


157. Spec Kit Agents: Context-Grounded Agentic Workflows


158. Region-R1: Reinforcing Query-Side Region Cropping for Multi-Modal Re-Ranking


159. Extending Tabular Denoising Diffusion Probabilistic Models for Time-Series Data Generation


160. Exemplar Retrieval Without Overhypothesis Induction: Limits of Distributional Sequence Learning in Early Word Learning


161. XMark: Reliable Multi-Bit Watermarking for LLM-Generated Texts


162. Curvature-Aware Optimization for High-Accuracy Physics-Informed Neural Networks


163. RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains


164. Improving Clinical Trial Recruitment using Clinical Narratives and Large Language Models


165. OrthoFuse: Training-free Riemannian Fusion of Orthogonal Style-Concept Adapters for Diffusion Models


166. LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows


167. Modality-Aware and Anatomical Vector-Quantized Autoencoding for Multimodal Brain MRI


168. From Use to Oversight: How Mental Models Influence User Behavior and Output in AI Writing Assistants


169. Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning


170. What Makes a Good Response? An Empirical Analysis of Quality in Qualitative Interviews


171. Planning to Explore: Curiosity-Driven Planning for LLM Test Generation


172. Compiled AI: Deterministic Code Generation for LLM-Based Workflow Automation


173. EffiPair: Improving the Efficiency of LLM-generated Code with Relative Contrastive Feedback


174. Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning


175. Offline RL for Adaptive Policy Retrieval in Prior Authorization


176. Watch Before You Answer: Learning from Visually Grounded Post-Training


177. $π^2$: Structure-Originated Reasoning Data Improves Long-Context Reasoning Ability of Large Language Models


178. CRAB: Codebook Rebalancing for Bias Mitigation in Generative Recommendation


179. Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner


180. Simultaneous Dual-View Mammogram Synthesis Using Denoising Diffusion Probabilistic Models


181. Edit, But Verify: An Empirical Audit of Instructed Code-Editing Benchmarks


182. Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation


183. Nidus: Externalized Reasoning for AI-Assisted Engineering


184. Feature-Aware Anisotropic Local Differential Privacy for Utility-Preserving Graph Representation Learning in Metal Additive Manufacturing


185. AutoLALA: Automatic Loop Algebraic Locality Analysis for AI and HPC Kernels


186. Dynamic Linear Coregionalization for Realistic Synthetic Multivariate Time Series


187. This Treatment Works, Right? Evaluating LLM Sensitivity to Patient Question Framing in Medical QA


188. PCA-Driven Adaptive Sensor Triage for Edge AI Inference


189. ID-Sim: An Identity-Focused Similarity Metric


190. Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space


191. StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing


192. Scaling Coding Agents via Atomic Skills


193. Comparative Characterization of KV Cache Management Strategies for LLM Inference


194. YMIR: A new Benchmark Dataset and Model for Arabic Yemeni Music Genre Classification Using Convolutional Neural Networks


195. Generalizable Audio-Visual Navigation via Binaural Difference Attention and Action Transition Prediction


196. EduIllustrate: Towards Scalable Automated Generation Of Multimodal Educational Content


197. Learning Stable Predictors from Weak Supervision under Distribution Shift


198. Closed-Loop Autonomous Software Development via Jira-Integrated Backlog Orchestration: A Case Study in Deterministic Control and Safety-Constrained Automation


199. PRIME: Prototype-Driven Multimodal Pretraining for Cancer Prognosis with Missing Modalities


200. Evaluation of Embedding-Based and Generative Methods for LLM-Driven Document Classification: Opportunities and Challenges


201. FreakOut-LLM: The Effect of Emotional Stimuli on Safety Alignment


202. Architecture Without Architects: How AI Coding Agents Shape Software Architecture


203. Prune-Quantize-Distill: An Ordered Pipeline for Efficient Neural Network Compression


204. Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling


205. CURE: Circuit-Aware Unlearning for LLM-based Recommendation


206. Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents


207. Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code’s Auto Mode


208. MG$^2$-RAG: Multi-Granularity Graph for Multimodal Retrieval-Augmented Generation


209. Self-Supervised Foundation Model for Calcium-imaging Population Dynamics


210. The Planetary Cost of AI Acceleration, Part II: The 10th Planetary Boundary and the 6.5-Year Countdown


211. Generative AI for Video Trailer Synthesis: From Extractive Heuristics to Autoregressive Creativity


212. Synthetic Trust Attacks: Modeling How Generative AI Manipulates Human Decisions in Social Engineering Fraud


213. Learning to Retrieve from Agent Trajectories


214. From PDF to RAG-Ready: Evaluating Document Conversion Frameworks for Domain-Specific Question Answering


215. SUMMIR: A Hallucination-Aware Framework for Ranking Sports Insights from LLMs


216. Inclusion-of-Thoughts: Mitigating Preference Instability via Purifying the Decision Space


217. The Illusion of Latent Generalization: Bi-directionality and the Reversal Curse


218. TDA-RC: Task-Driven Alignment for Knowledge-Based Reasoning Chains in Large Language Models


219. Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems


220. Contextuality as an External Bookkeeping Cost under Fixed Shared-State Semantics