Key LLM-Related Papers - 2026-04-01

1. The Triadic Cognitive Architecture: Bounding Autonomous Action via Spatio-Temporal and Epistemic Friction


2. C-TRAIL: A Commonsense World Framework for Trajectory Planning in Autonomous Driving


3. ATP-Bench: Towards Agentic Tool Planning for MLLM Interleaved Generation


4. ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training


5. AgentFixer: From Failure Detection to Fix Recommendations in LLM Agentic Systems


6. Spontaneous Functional Differentiation in Large Language Models: A Brain-Like Intelligence Economy


7. Measuring the metacognition of AI


8. Beyond the Steeper Curve: AI-Mediated Metacognitive Decoupling and the Limits of the Dunning-Kruger Metaphor


9. FlowPIE: Test-Time Scientific Idea Evolution with Flow-Guided Literature Exploration


10. Learning to Generate Formally Verifiable Step-by-Step Logic Reasoning via Structured Formal Intermediaries


11. ELT-Bench-Verified: Benchmark Quality Issues Underestimate AI Agent Capabilities


12. AI-Generated Prior Authorization Letters: Strong Clinical Content, Weak Administrative Scaffolding


13. BenchScope: How Many Independent Signals Does Your Benchmark Provide?


14. Beyond pass@1: A Reliability Science Framework for Long-Horizon LLM Agents


15. Route-Induced Density and Stability (RIDE): Controlled Intervention and Mechanism Analysis of Routing-Style Meta Prompts on LLM Internal States


16. Webscraper: Leverage Multimodal Large Language Models for Index-Content Web Scraping


17. SimMOF: AI agent for Automated MOF Simulations


18. Knowledge database development by large language models for countermeasures against viruses and marine toxins


19. REFINE: Real-world Exploration of Interactive Feedback and Student Behaviour


20. SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents


21. GISTBench: Evaluating LLM User Understanding via Evidence-Based Interest Verification


22. PAR$^2$-RAG: Planned Active Retrieval and Reasoning for Multi-Hop Question Answering


23. Drop the Hierarchy and Roles: How Self-Organizing LLM Agents Outperform Designed Structures


24. Mimosa Framework: Toward Evolving Multi-Agent Systems for Scientific Research


25. ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts


26. Aligned, Orthogonal or In-conflict: When can we safely optimize Chain-of-Thought?


27. Tucker Attention: A generalization of approximate attention mechanisms


28. Hybrid Framework for Robotic Manipulation: Integrating Reinforcement Learning and Large Language Models


29. Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks


30. Enhancing Structural Mapping with LLM-derived Abstractions for Analogical Reasoning in Narratives


31. Bethe Ansatz with a Large Language Model


32. SISA: A Scale-In Systolic Array for GEMM Acceleration


33. UniRank: End-to-End Domain-Specific Reranking of Hybrid Text-Image Candidates


34. Perfecting Human-AI Interaction at Clinical Scale. Turning Production Signals into Safer, More Human Conversations


35. Interview-Informed Generative Agents for Product Discovery: A Validation Study


36. Performance Evaluation of LLMs in Automated RDF Knowledge Graph Generation


37. UnWeaving the knots of GraphRAG – turns out VectorRAG is almost enough


38. Towards Empowering Consumers through Sentence-level Readability Scoring in German ESG Reports


39. DIAL: Decoupling Intent and Action via Latent World Modeling for End-to-End VLA


40. From Skeletons to Semantics: Design and Deployment of a Hybrid Edge-Based Action Detection System for Public Safety


41. TSHA: A Benchmark for Visual Language Models in Trustworthy Safety Hazard Assessment Scenarios


42. BotVerse: Real-Time Event-Driven Simulation of Social Agents


43. KEditVis: A Visual Analytics System for Knowledge Editing of Large Language Models


44. Agenda-based Narrative Extraction: Steering Pathfinding Algorithms with Large Language Models


45. An Empirical Study of Multi-Agent Collaboration for Automated Research


46. Convergent Representations of Linguistic Constructions in Human and Artificial Neural Systems


47. IMAGAgent: Orchestrating Multi-Turn Image Editing via Constraint-Aware Planning and Reflection


48. Learn2Fold: Structured Origami Generation with World Model Planning


49. Bringing Up a Bilingual BabyLM: Investigating Multilingual Language Acquisition Using Small-Scale Models


50. Baby Scale: Investigating Models Trained on Individual Children’s Language Input


51. MemFactory: Unified Inference & Training Framework for Agent Memory


52. M-MiniGPT4: Multilingual VLLM Alignment via Translated Data


53. An Isotropic Approach to Efficient Uncertainty Quantification with Gradient Norms


54. Adversarial Prompt Injection Attack on Multimodal Large Language Models


55. AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models


56. Hallucination-aware intermediate representation edit in large vision-language models


57. Security in LLM-as-a-Judge: A Comprehensive SoK


58. Self-Improving Code Generation via Semantic Entropy and Behavioral Consensus


59. Sima AIunty: Caste Audit in LLM-Driven Matchmaking


60. PRISM: A Multi-View Multi-Capability Retail Video Dataset for Embodied Vision-Language Models


61. Omni-NegCLIP: Enhancing CLIP with Front-Layer Contrastive Fine-Tuning for Comprehensive Negation Understanding


62. Scaling the Long Video Understanding of Multimodal Large Language Models via Visual Memory Mechanism


63. MemRerank: Preference Memory for Personalized Product Reranking


64. Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs


65. Software Vulnerability Detection Using a Lightweight Graph Neural Network


66. Multi-Layered Memory Architectures for LLM Agents: An Experimental Evaluation of Long-Term Context Retention


67. Developing Adaptive Context Compression Techniques for Large Language Models (LLMs) in Long-Running Interactions


68. Designing FSMs Specifications from Requirements with GPT 4.0


69. SemLoc: Structured Grounding of Free-Form LLM Reasoning for Fault Localization


70. APEX-EM: Non-Parametric Online Learning for Autonomous Agents via Structured Procedural-Episodic Experience Replay


71. WybeCoder: Verified Imperative Code Generation


72. CivicShield: A Cross-Domain Defense-in-Depth Framework for Securing Government-Facing AI Chatbots Against Multi-Turn Adversarial Attacks


73. Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning


74. The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning


75. Human-Like Lifelong Memory: A Neuroscience-Grounded Architecture for Infinite Interaction


76. Improving Efficiency of GPU Kernel Optimization Agents using a Domain-Specific Language and Speed-of-Light Guidance


77. Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference


78. Design Principles for the Construction of a Benchmark Evaluating Security Operation Capabilities of Multi-agent AI Systems


79. Privacy Guard & Token Parsimony by Prompt and Context Handling and LLM Routing


80. Multi-Agent LLMs for Adaptive Acquisition in Bayesian Optimization


81. Theory of Mind and Self-Attributions of Mentality are Dissociable in LLMs


82. GUARD-SLM: Token Activation-Based Defense Against Jailbreak Attacks for Small Language Models


83. StepCache: Step-Level Reuse with Lightweight Verification and Selective Patching for LLM Serving


84. The Last Fingerprint: How Markdown Training Shapes LLM Prose