All AI Papers - 2026-04-27

1. Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond


2. Rethinking Math Reasoning Evaluation: A Robust LLM-as-a-Judge Framework Beyond Symbolic Rigidity


3. QuantClaw: Precision Where It Matters for OpenClaw


4. On the Hybrid Nature of ABPMS Process Frames and its Implications on Automated Process Discovery


5. Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents


6. From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company


7. AgentSearchBench: A Benchmark for AI Agent Search in the Wild


8. CognitiveTwin: Robust Multi-Modal Digital Twins for Predicting Cognitive Decline in Alzheimer’s Disease


9. Introducing Background Temperature to Characterise Hidden Randomness in Large Language Models


10. When Does LLM Self-Correction Help? A Control-Theoretic Markov Diagnostic and Verify-First Intervention


11. Emergent Strategic Reasoning Risks in AI: A Taxonomy-Driven Evaluation Framework


12. Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents


13. Sound Agentic Science Requires Adversarial Experiments


14. Rethinking Publication: A Certification Framework for AI-Enabled Research


15. Read the Paper, Write the Code: Agentic Reproduction of Social-Science Results


16. MolClaw: An Autonomous Agent with Hierarchical Skills for Drug Molecule Evaluation, Screening, and Optimization


17. An Artifact-based Agent Framework for Adaptive and Reproducible Medical Image Processing


18. Math Takes Two: A test for emergent mathematical reasoning in communication


19. An Undecidability Proof for the Plan Existence Problem


20. Aligning Dense Retrievers with LLM Utility via Distillation


21. CRAFT: Clustered Regression for Adaptive Filtering of Training data


22. How Supply Chain Dependencies Complicate Bias Measurement and Accountability Attribution in AI Hiring Applications


23. Rethinking XAI Evaluation: A Human-Centered Audit of Shapley Benchmarks in High-Stakes Settings


24. From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification


25. Learning Evidence Highlighting for Frozen LLMs


26. Data-Free Contribution Estimation in Federated Learning using Gradient von Neumann Entropy


27. Cross-Stage Coherence in Hierarchical Driving VQA: Explicit Baselines and Learned Gated Context Projectors


28. SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning


29. QDTraj: Exploration of Diverse Trajectory Primitives for Articulated Objects Robotic Manipulation


30. ArmSSL: Adversarial Robust Black-Box Watermarking for Self-Supervised Learning Pre-trained Encoders


31. Controllable Spoken Dialogue Generation: An LLM-Driven Grading System for K-12 Non-Native English Learners


32. On the Properties of Feature Attribution for Supervised Contrastive Learning


33. FeatEHR-LLM: Leveraging Large Language Models for Feature Engineering in Electronic Health Records


34. CGC: Compositional Grounded Contrast for Fine-Grained Multi-Image Understanding


35. SSG: Logit-Balanced Vocabulary Partitioning for LLM Watermarking


36. How Hard is it to Decide if a Fact is Relevant to a Query?


37. From Local to Cluster: A Unified Framework for Causal Discovery with Latent Variables


38. Distance-Misaligned Training in Graph Transformers and Adaptive Graph-Aware Control


39. Hidden Failure Modes of Gradient Modification under Adam in Continual Learning, and Adaptive Decoupled Moment Routing as a Repair


40. CNSL-bench: Benchmarking the Sign Language Understanding Capabilities of MLLMs on Chinese National Sign Language


41. LeHome: A Simulation Environment for Deformable Object Manipulation in Household Scenarios


42. ChangeQuery: Advancing Remote Sensing Change Analysis for Natural and Human-Induced Disasters from Visual Detection to Semantic Understanding


43. FETS Benchmark: Foundation Models Outperform Dataset-specific Machine Learning in Energy Time Series Forecasting


44. BLAST: Benchmarking LLMs with ASP-based Structured Testing


45. Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets



47. Semantic Error Correction and Decoding for Short Block Channel Codes


48. Towards Safe Mobility: A Unified Transportation Foundation Model enabled by Open-Ended Vision-Language Dataset


49. Protect the Brain When Treating the Heart: A Convolutional Neural Network for Detecting Emboli


50. A Probabilistic Framework for Hierarchical Goal Recognition


51. Navigating Large-Scale Document Collections: MuDABench for Multi-Document Analytical QA


52. Tell Me Why: Designing an Explainable LLM-based Dialogue System for Student Problem Behavior Diagnosis


53. Learning-augmented robotic automation for real-world manufacturing


54. Preserve Support, Not Correspondence: Dynamic Routing for Offline Reinforcement Learning


55. A Co-Evolutionary Theory of Human-AI Coexistence: Mutualism, Governance, and Dynamics in Complex Societies


56. Verbal Confidence Saturation in 3-9B Open-Weight Instruction-Tuned LLMs: A Pre-Registered Psychometric Validity Screen


57. UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions


58. Evaluating LLM-Based Goal Extraction in Requirements Engineering: Prompting Strategies and Their Limitations


59. An LLM-Driven Closed-Loop Autonomous Learning Framework for Robots Facing Uncovered Tasks in Open Environments


60. From Global to Local: Rethinking CLIP Feature Aggregation for Person Re-Identification


61. ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression


62. ReCast: Recasting Learning Signals for Reinforcement Learning in Generative Recommendation


63. Estimating Tail Risks in Language Model Output Distributions


64. GenMatter: Perceiving Physical Objects with Generative Matter Models


65. PrivSTRUCT: Untangling Data Purpose Compliance of Privacy Policies in Google Play Store


66. Reliable Self-Harm Risk Screening via Adaptive Multi-Agent LLM Systems


67. When AI Speaks, Whose Values Does It Express? A Cross-Cultural Audit of Individualism-Collectivism Bias in Large Language Models


68. PermaFrost-Attack: Stealth Pretraining Seeding (SPS) for planting Logic Landmines During LLM Training


69. Spontaneous Persuasion: An Audit of Model Persuasiveness in Everyday Conversations


70. Wiggle and Go! System Identification for Zero-Shot Dynamic Rope Manipulation


71. Ethics Testing: Proactive Identification of Generative AI System Harms


72. Removing Sandbagging in LLMs by Training with Weak Supervision


73. Shard the Gradient, Scale the Model: Serverless Federated Aggregation via Gradient Partitioning


74. Optimal Question Selection from a Large Question Bank for Clinical Field Recovery in Conversational Psychiatric Intake


75. Reliability Auditing for Downstream LLM tasks in Psychiatry: LLM-Generated Hospitalization Risk Scores


76. Lightweight Retrieval-Augmented Generation and Large Language Model-Based Modeling for Scalable Patient-Trial Matching


77. Call-Chain-Aware LLM-Based Test Generation for Java Projects


78. H-Sets: Hessian-Guided Discovery of Set-Level Feature Interactions in Image Classifiers


79. EgoMAGIC - An Egocentric Video Field Medicine Dataset for Training Perception Algorithms


80. Mochi: Aligning Pre-training and Inference for Efficient Graph Foundation Models via Meta-Learning


81. Shared Lexical Task Representations Explain Behavioral Variability In LLMs


82. Foundation models for discovering robust biomarkers of neurological disorders from dynamic functional connectivity


83. Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning


84. Multi-Task Optimization over Networks of Tasks


85. Model Predictive Control of Hybrid Dynamical Systems


86. A general optimization solver based on OP-to-MaxSAT reduction


87. A systematic review of generative AI usage for IT project management


88. MambaCSP: Hybrid-Attention State Space Models for Hardware-Efficient Channel State Prediction


89. Focus Session: Hardware and Software Techniques for Accelerating Multimodal Foundation Models


90. Feedback Over Form: Why Execution Feedback Matters More Than Pipeline Topology in 1-3B Code Generation


91. The Biggest Risk of Embodied AI is Governance Lag


92. Large Language Models Are Bad Dice Players: LLMs Struggle to Generate Random Numbers from Statistical Distributions