All AI Papers - 2026-03-23

1. Learning Dynamic Belief Graphs for Theory-of-mind Reasoning


2. Pitfalls in Evaluating Interpretability Agents


3. DIAL-KG: Schema-Free Incremental Knowledge Graph Construction via Dynamic Schema Induction and Evolution-Intent Assessment


4. Experience is the Best Teacher: Motivating Effective Exploration in Reinforcement Learning for LLMs


5. On the Ability of Transformers to Verify Plans


6. Utility-Guided Agent Orchestration for Efficient LLM Tool Use


7. FormalEvolve: Neuro-Symbolic Evolutionary Search for Diverse and Prover-Effective Autoformalization


8. Embodied Science: Closing the Discovery Loop with Agentic Embodied AI


9. Stepwise: Neuro-Symbolic Proof Search for Automated Systems Verification


10. A Subgoal-driven Framework for Improving Long-Horizon LLM Agents


11. HyEvo: Self-Evolving Hybrid Agentic Workflows for Efficient Reasoning


12. PowerLens: Taming LLM Agents for Safe and Personalized Mobile Power Management


13. PA2D-MORL: Pareto Ascent Directional Decomposition based Multi-Objective Reinforcement Learning


14. ItinBench: Benchmarking Planning Across Multiple Cognitive Dimensions with Large Language Models


15. Learning to Disprove: Formal Counterexample Generation with Large Language Models


16. Teaching an Agent to Sketch One Part at a Time


17. Hyperagents


18. When both Grounding and not Grounding are Bad – A Partially Grounded Encoding of Planning into SAT (Extended Version)


19. From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering


20. LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation


21. VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking


22. Improving Generalization on Cybersecurity Tasks with Multi-Modal Contrastive Learning


23. Adaptive Greedy Frame Selection for Long Video Understanding


24. AI Agents Can Already Autonomously Perform Experimental High Energy Physics


25. Measuring Faithfulness Depends on How You Measure: Classifier Sensitivity in LLM Chain-of-Thought Evaluation


26. The Robot’s Inner Critic: Self-Refinement of Social Behaviors through VLM-based Replanning


27. Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models


28. Design-OS: A Specification-Driven Framework for Engineering System Design with a Control-Systems Design Case


29. Enhancing Hyperspace Analogue to Language (HAL) Representations via Attention-Based Pooling for Text Classification


30. An Agentic Multi-Agent Architecture for Cybersecurity Risk Management


31. Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models


32. Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning


33. Demonstration of Adapt4Me: An Uncertainty-Aware Authoring Environment for Personalizing Automatic Speech Recognition to Non-normative Speech


34. Var-JEPA: A Variational Formulation of the Joint-Embedding Predictive Architecture – Bridging Predictive and Generative Self-Supervised Learning


35. The $\mathbf{Y}$-Combinator for LLMs: Solving Long-Context Rot with $\lambda$-Calculus


36. Spectral Alignment in Forward-Backward Representations via Temporal Abstraction


37. An Empirical Study of SFT-DPO Interaction and Parameterization in Small Language Models


38. LLM-Enhanced Semantic Data Integration of Electronic Component Qualifications in the Aerospace Domain


39. Agentic Harness for Real-World Compilers


40. Fine-tuning Timeseries Predictors Using Reinforcement Learning


41. The End of Rented Discovery: How AI Search Redistributes Power Between Hotels and Intermediaries


42. LoASR-Bench: Evaluating Large Speech Language Models on Low-Resource Automatic Speech Recognition Across Language Families


43. CoverageBench: Evaluating Information Coverage across Tasks and Domains


44. Orchestrating Human-AI Software Delivery: A Retrospective Longitudinal Field Study of Three Software Modernization Programs



46. Physics-Informed Long-Range Coulomb Correction for Machine-learning Hamiltonians


47. Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States


48. X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving


49. Promoting Critical Thinking With Domain-Specific Generative AI Provocations


50. Trojan’s Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance


51. Graph2TS: Structure-Controlled Time Series Generation via Quantile-Graph VAEs


52. HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction


53. RAM: Recover Any 3D Human Motion in-the-Wild


54. Span-Level Machine Translation Meta-Evaluation


55. Learning Like Humans: Analogical Concept Learning for Generalized Category Discovery


56. Revealing Domain-Spatiality Patterns for Configuration Tuning: Domain Knowledge Meets Fitness Landscapes


57. Integrating Meta-Features with Knowledge Graph Embeddings for Meta-Learning


58. What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time


59. Failure Modes for Deep Learning-Based Online Mapping: How to Measure and Address Them


60. Semantic Delta: An Interpretable Signal Differentiating Human and LLMs Dialogue


61. Gesture2Speech: How Far Can Hand Movements Shape Expressive Speech?


62. FrameNet Semantic Role Classification by Analogy


63. Enhancing Alignment for Unified Multimodal Models via Semantically-Grounded Supervision


64. Offshore oil and gas platform dynamics in the North Sea, Gulf of Mexico, and Persian Gulf: Exploiting the Sentinel-1 archive


65. Learning Hierarchical Orthogonal Prototypes for Generalized Few-Shot 3D Point Cloud Segmentation


66. Uncertainty-aware Prototype Learning with Variational Inference for Few-shot Point Cloud Segmentation


67. MOSS-TTSD: Text to Spoken Dialogue Generation


68. FedRG: Unleashing the Representation Geometry for Federated Learning with Noisy Clients


69. AIGQ: An End-to-End Hybrid Generative Architecture for E-commerce Query Recommendation


70. GoAgent: Group-of-Agents Communication Topology Generation for LLM-based Multi-Agent Systems


71. ATHENA: Adaptive Test-Time Steering for Improving Count Fidelity in Diffusion Models


72. Toward High-Fidelity Visual Reconstruction: From EEG-Based Conditioned Generation to Joint-Modal Guided Rebuilding


73. The Residual Stream Is All You Need: On the Redundancy of the KV Cache in Transformer Inference


74. PolicySim: An LLM-Based Agent Social Simulation Sandbox for Proactive Policy Optimization


75. OmniDiT: Extending Diffusion Transformer to Omni-VTON Framework


76. MetaCues: Enabling Critical Engagement with Generative AI for Information Seeking and Sensemaking


77. Dual Prompt-Driven Feature Encoding for Nighttime UAV Tracking


78. DeepStock: Reinforcement Learning with Policy Regularizations for Inventory Management


79. CAF-Score: Calibrating CLAP with LALMs for Reference-free Audio Captioning Evaluation


80. LoD-Loc v3: Generalized Aerial Localization in Dense Cities using Instance Silhouette Alignment


81. FB-CLIP: Fine-Grained Zero-Shot Anomaly Detection with Foreground-Background Disentanglement


82. Physics-Informed Neural Network with Adaptive Clustering Learning Mechanism for Information Popularity Prediction


83. ARMOR: Adaptive Resilience Against Model Poisoning Attacks in Continual Federated Learning for Mobile Indoor Localization


84. Data-driven ensemble prediction of the global ocean


85. Skilled AI Agents for Embedded and IoT Systems Development


86. Evolving Embodied Intelligence: Graph Neural Network–Driven Co-Design of Morphology and Control in Soft Robotics



88. PFM-VEPAR: Prompting Foundation Models for RGB-Event Camera based Pedestrian Attribute Recognition



90. Optimal Scalar Quantization for Matrix Multiplication: Closed-Form Density and Phase Transition


91. Plagiarism or Productivity? Students Moral Disengagement and Behavioral Intentions to Use ChatGPT in Academic Writing


92. Subspace Kernel Learning on Tensor Sequences


93. FDARxBench: Benchmarking Regulatory and Clinical Reasoning on FDA Generic Drug Assessment


94. dinov3.seg: Open-Vocabulary Semantic Segmentation with DINOv3


95. Depictions of Depression in Generative AI Video Models: A Preliminary Study of OpenAI’s Sora 2


96. Inducing Sustained Creativity and Diversity in Large Language Models


97. Gastric-X: A Multimodal Multi-Phase Benchmark Dataset for Advancing Vision-Language Models in Gastric Cancer Analysis


98. FedAgain: A Trust-Based and Robust Federated Learning Strategy for an Automated Kidney Stone Identification in Ureteroscopy


99. Linear Social Choice with Few Queries: A Moment-Based Approach


100. Beyond the Desk: Barriers and Future Opportunities for AI to Assist Scientists in Embodied Physical Tasks


101. TRACE: Trajectory Recovery with State Propagation Diffusion for Urban Mobility


102. Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL


103. A Framework for Formalizing LLM Agent Security


104. Global Convergence of Multiplicative Updates for the Matrix Mechanism: A Collaborative Proof with Gemini 3


105. TrustFlow: Topic-Aware Vector Reputation Propagation for Multi-Agent Ecosystems


106. LoFi: Location-Aware Fine-Grained Representation Learning for Chest X-ray


107. Vocabulary shapes cross-lingual variation of word-order learnability in language models


108. Is Evaluation Awareness Just Format Sensitivity? Limitations of Probe-Based Evidence under Controlled Prompt Structure


109. The Autonomy Tax: Defense Training Breaks LLM Agents


110. Investigating In-Context Privacy Learning by Integrating User-Facing Privacy Tools into Conversational Agents


111. Scalable Prompt Routing via Fine-Grained Latent Task Discovery


112. A Novel Solution for Zero-Day Attack Detection in IDS using Self-Attention and Jensen-Shannon Divergence in WGAN-GP


113. Beyond Weighted Summation: Learnable Nonlinear Aggregation Functions for Robust Artificial Neurons


114. Spectral Tempering for Embedding Compression in Dense Passage Retrieval


115. Diffusion-Guided Semantic Consistency for Multimodal Heterogeneity


116. Do Post-Training Algorithms Actually Differ? A Controlled Study Across Model Scales Uncovers Scale-Dependent Ranking Inversions


117. POET: Power-Oriented Evolutionary Tuning for LLM-Based RTL PPA Optimization


118. PAI: Fast, Accurate, and Full Benchmark Performance Projection with AI


119. Goedel-Code-Prover: Hierarchical Proof Search for Open State-of-the-Art Code Verification


120. Target Concept Tuning Improves Extreme Weather Forecasting


121. A General Deep Learning Framework for Wireless Resource Allocation under Discrete Constraints


122. Prompt-tuning with Attribute Guidance for Low-resource Entity Matching


123. Ternary Gamma Semirings: From Neural Implementation to Categorical Foundations


124. Memory-Driven Role-Playing: Evaluation and Enhancement of Persona Knowledge Utilization in LLMs


125. LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels


126. MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels


127. GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space


128. Exploring Subnetwork Interactions in Heterogeneous Brain Network via Prior-Informed Graph Learning



130. PhyGile: Physics-Prefix Guided Motion Generation for Agile General Humanoid Motion Tracking


131. Agreement Between Large Language Models, Human Reviewers, and Authors in Evaluating STROBE Checklists for Observational Studies in Rheumatology


132. Parameter-Efficient Token Embedding Editing for Clinical Class-Level Unlearning


133. Maximizing mutual information between user-contexts and responses improve LLM personalization with no additional data


134. LLM-MRD: LLM-Guided Multi-View Reasoning Distillation for Fake News Detection


135. Automatic Analysis of Collaboration Through Human Conversational Data Resources: A Review


136. A Visualization for Comparative Analysis of Regression Models


137. Neural Dynamics Self-Attention for Spiking Transformers


138. Speculating Experts Accelerates Inference for Mixture-of-Experts


139. Joint Return and Risk Modeling with Deep Neural Networks for Portfolio Construction


140. Generalized Stock Price Prediction for Multiple Stocks Combined with News Fusion


141. CDEoH: Category-Driven Automatic Algorithm Design With Large Language Models


142. Framing Effects in Independent-Agent Large Language Models: A Cross-Family Behavioral Analysis


143. URAG: A Benchmark for Uncertainty Quantification in Retrieval-Augmented Large Language Models


144. From Feature-Based Models to Generative AI: Validity Evidence for Constructed Response Scoring


145. HypeLoRA: Hyper-Network-Generated LoRA Adapters for Calibrated Language Model Fine-Tuning


146. From Flat to Structural: Enhancing Automated Short Answer Grading with GraphRAG


147. Improving Automatic Summarization of Radiology Reports through Mid-Training of Large Language Models


148. CURE: A Multimodal Benchmark for Clinical Understanding and Retrieval Evaluation


149. LSR: Linguistic Safety Robustness Benchmark for Low-Resource West African Languages


150. Transformers are Stateless Differentiable Neural Computers


151. A Human-Centered Workflow for Using Large Language Models in Content Analysis


152. Full-Stack Domain Enhancement for Combustion LLMs: Construction and Optimization


153. Probing to Refine: Reinforcement Distillation of LLMs via Explanatory Inversion


154. When the Pure Reasoner Meets the Impossible Object: Analytic vs. Synthetic Fine-Tuning and the Suppression of Genesis in Language Models


155. Generative Active Testing: Efficient LLM Evaluation via Proxy Task Adaptation


156. How Motivation Relates to Generative AI Use: A Large-Scale Survey of Mexican High School Students


157. The α-Law of Observable Belief Revision in Large Language Model Inference


158. HATL: Hierarchical Adaptive-Transfer Learning Framework for Sign Language Machine Translation


159. Breeze Taigi: Benchmarks and Models for Taiwanese Hokkien Speech Recognition and Synthesis


160. MAPLE: Metadata Augmented Private Language Evolution


161. LARFT: Closing the Cognition-Action Gap for Length Instruction Following in Large Language Models


162. A comprehensive study of LLM-based argument classification: from Llama through DeepSeek to GPT-5.2


163. GeoChallenge: A Multi-Answer Multiple-Choice Benchmark for Geometric Reasoning with Diagrams


164. DuCCAE: A Hybrid Engine for Immersive Conversation via Collaboration, Augmentation, and Evolution


165. When Prompt Optimization Becomes Jailbreaking: Adaptive Red-Teaming of Large Language Models


166. L-PRISMA: An Extension of PRISMA in the Era of Generative Artificial Intelligence (GenAI)