Key LLM-Related Papers - 2026-04-07

1. MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents


2. ANX: Protocol-First Design for AI Agent Interaction with a Supporting 3EX Decoupled Architecture


3. AI Trust OS – A Continuous Governance Framework for Autonomous AI Observability and Zero-Trust Compliance in Enterprise Environments


4. Springdrift: An Auditable Persistent Runtime for LLM Agents with Case-Based Memory, Normative Safety, and Ambient Self-Perception


5. Search, Do not Guess: Teaching Small Language Models to Be Effective Search Agents


6. SuperLocalMemory V3.3: The Living Brain – Biologically-Inspired Forgetting, Cognitive Quantization, and Multi-Channel Retrieval for Zero-LLM Agent Memory Systems


7. Memory Intelligence Agent


8. Scalable and Explainable Learner-Video Interaction Prediction using Multimodal Large Language Models


9. What Makes a Sale? Rethinking End-to-End Seller–Buyer Retail Dynamics with LLM Agents


10. ShieldNet: Network-Level Guardrails against Emerging Supply-Chain Injections in Agentic Systems


11. MolDA: Molecular Understanding and Generation via Large Language Diffusion Model


12. Automatically Generating Hard Math Problems from Hypothesis-Driven Error Analysis


13. Optimizing Service Operations via LLM-Powered Multi-Agent Simulation


14. Decocted Experience Improves Test-Time Inference in LLM Agents


15. REAM: Merging Improves Pruning of Experts in LLMs


16. RoboPhD: Evolving Diverse Complex Agents Under Tight Evaluation Budgets


17. Implementing surrogate goals for safer bargaining in LLM-based agents


18. Soft Tournament Equilibrium


19. RESCORE: LLM-Driven Simulation Recovery in Control Systems Research Papers


20. Preservation Is Not Enough for Width Growth: Regime-Sensitive Selection of Dense LM Warm Starts


21. InferenceEvolve: Towards Automated Causal Effect Estimators through Self-Evolving AI


22. Combee: Scaling Prompt Learning for Self-Improving Language Model Agents


23. TimeSeek: Temporal Reliability of Agentic Forecasters


24. Schema-Aware Planning and Hybrid Knowledge Toolset for Reliable Knowledge Graph Triple Verification


25. Comparative reversal learning reveals rigid adaptation in LLMs under non-stationary uncertainty


26. CoALFake: Collaborative Active Learning with Human-LLM Co-Annotation for Cross-Domain Fake News Detection


27. Readable Minds: Emergent Theory-of-Mind-Like Behavior in LLM Poker Agents


28. Solar-VLM: Multimodal Vision-Language Models for Augmented Solar Power Forecasting


29. Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents


30. InsTraj: Instructing Diffusion Models with Travel Intentions to Generate Real-world Trajectories


31. Compliance-by-Construction Argument Graphs: Using Generative AI to Produce Evidence-Linked Formal Arguments for Certification-Grade Accountability


32. FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification


33. LLM-Agent-based Social Simulation for Attitude Diffusion


34. FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning


35. PolySwarm: A Multi-Agent Large Language Model Framework for Prediction Market Trading and Latency Arbitrage


36. Affording Process Auditability with QualAnalyzer: An Atomistic LLM Analysis Tool for Qualitative Research


37. Structured Multi-Criteria Evaluation of Large Language Models with Fuzzy Analytic Hierarchy Process and DualJudge


38. PRAISE: Prefix-Based Rollout Reuse in Agentic Search Training


39. TableVision: A Large-Scale Benchmark for Spatially Grounded Reasoning over Complex Hierarchical Tables


40. Beyond Retrieval: Modeling Confidence Decay and Deterministic Agentic Platforms in Generative Engine Optimization


41. Single-agent vs. Multi-agents for Automated Video Analysis of On-Screen Collaborative Learning Behaviors


42. Entropy and Attention Dynamics in Small Language Models: A Trace-Level Structural Analysis on the TruthfulQA Benchmark


43. When Adaptive Rewards Hurt: Causal Probing and the Switching-Stability Dilemma in LLM-Guided LEO Satellite Scheduling


44. When Do Hallucinations Arise? A Graph Perspective on the Evolution of Path Reuse and Path Compression


45. Towards the AI Historian: Agentic Information Extraction from Primary Sources


46. Automated Analysis of Global AI Safety Initiatives: A Taxonomy-Driven LLM Approach


47. Structural Rigidity and the 57-Token Predictive Window: A Physical Framework for Inference-Layer Governability in Large Language Models


48. Resource-Conscious Modeling for Next-Day Discharge Prediction Using Clinical Notes


49. Hume’s Representational Conditions for Causal Judgment: What Bayesian Formalization Abstracted Away


50. VERT: Reliable LLM Judges for Radiology Report Evaluation


51. Evaluating Artificial Intelligence Through a Christian Understanding of Human Flourishing


52. Toward Full Autonomous Laboratory Instrumentation Control with Large Language Models


53. IC3-Evolve: Proof-/Witness-Gated Offline LLM-Driven Heuristic Evolution for IC3 Hardware Model Checking


54. Vero: An Open RL Recipe for General Visual Reasoning


55. Agentic Federated Learning: The Future of Distributed Training Orchestration


56. Rethinking Exploration in RLVR: From Entropy Regularization to Refinement via Bidirectional Entropy Modulation


57. Strengthening Human-Centric Chain-of-Thought Reasoning Integrity in LLMs via a Structured Prompt Framework


58. Plausibility as Commonsense Reasoning: Humans Succeed, Large Language Models Do not


59. LiveFact: A Dynamic, Time-Aware Benchmark for LLM-Driven Fake News Detection


60. SkillX: Automatically Constructing Skill Knowledge Bases for Agents


61. Cog-DRIFT: Exploration on Adaptively Reformulated Instances Enables Learning from Hard Reasoning Problems


62. Hallucination Basins: A Dynamic Framework for Understanding and Controlling LLM Hallucinations


63. Discovering Failure Modes in Vision-Language Models using RL


64. Metaphors We Compute By: A Computational Audit of Cultural Translation vs. Thinking in LLMs


65. Individual and Combined Effects of English as a Second Language and Typos on LLM Performance


66. MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition


67. An AI Teaching Assistant for Motion Picture Engineering


68. ROSClaw: A Hierarchical Semantic-Physical Framework for Heterogeneous Multi-Agent Collaboration


69. Ruling Out to Rule In: Contrastive Hypothesis Retrieval for Medical Question Answering


70. PassiveQA: A Three-Action Framework for Epistemically Calibrated Question Answering via Supervised Finetuning


71. Temporal Inversion for Learning Interval Change in Chest X-Rays


72. Paper Espresso: From Paper Overload to Research Insight


73. Mapping the Exploitation Surface: A 10,000-Trial Taxonomy of What Makes LLM Agents Exploit Vulnerabilities


74. ENCRUST: Encapsulated Substitution and Agentic Refinement on a Live Scaffold for Safe C-to-Rust Translation


75. One Model for All: Multi-Objective Controllable Language Models


76. SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models


77. Discrete Prototypical Memories for Federated Time Series Foundation Models


78. DP-OPD: Differentially Private On-Policy Distillation for Language Models


79. Conversational Control with Ontologies for Large Language Models: A Lightweight Framework for Constrained Generation


80. Justified or Just Convincing? Error Verifiability as a Dimension of LLM Quality


81. Responses Fall Short of Understanding: Revealing the Gap between Internal Representations and Responses in Visual Document Understanding


82. Relative Density Ratio Optimization for Stable and Statistically Consistent Model Alignment


83. How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models


84. Compressible Softmax-Attended Language under Incompressible Attention


85. GROUNDEDKG-RAG: Grounded Knowledge Graph Index for Long-document Question Answering


86. Poisoned Identifiers Survive LLM Deobfuscation: A Case Study on Claude Opus 4.6


87. Commercial Persuasion in AI-Mediated Conversations


88. APPA: Adaptive Preference Pluralistic Alignment for Fair Federated RLHF of LLMs


89. LOCARD: An Agentic Framework for Blockchain Forensics


90. Which English Do LLMs Prefer? Triangulating Structural Bias Towards American English in Foundation Models


91. ClawArena: Benchmarking AI Agents in Evolving Information Environments


92. GENFIG1: Visual Summaries of Scholarly Work as a Challenge for Vision-Language Models


93. Many Preferences, Few Policies: Towards Scalable Language Model Personalization


94. Learning Robust Visual Features in Computed Tomography Enables Efficient Transfer Learning for Clinical Tasks


95. From Paper to Program: A Multi-Stage LLM-Assisted Workflow for Accelerating Quantum Many-Body Algorithm Development


96. Embedding Enhancement via Fine-Tuned Language Models for Learner-Item Cognitive Modeling


97. Extracting and Steering Emotion Representations in Small Language Models: A Methodological Comparison


98. CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks


99. Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents


100. Gram-Anchored Prompt Learning for Vision-Language Models via Second-Order Statistics


101. TraceGuard: Structured Multi-Dimensional Monitoring as a Collusion-Resistant Control Protocol


102. VLA-Forget: Vision-Language-Action Unlearning for Embodied Foundation Models


103. Symbolic-Vector Attention Fusion for Collective Intelligence


104. Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference


105. AdaptFuse: Training-Free Sequential Preference Learning via Externalized Bayesian Inference


106. Uncertainty as a Planning Signal: Multi-Turn Decision Making for Goal-Oriented Conversation


107. Automating Cloud Security and Forensics Through a Secure-by-Design Generative AI Framework


108. I-CALM: Incentivizing Confidence-Aware Abstention for LLM Hallucination Mitigation


109. Enhancing behavioral nudges with large language model-based iterative personalization: A field experiment on electricity and hot-water conservation


110. Representational Collapse in Multi-Agent LLM Committees: Measurement and Diversity-Aware Consensus


111. Automated Conjecture Resolution with Formal Verification


112. When Does Multimodal AI Help? Diagnostic Complementarity of Vision-Language Models and CNNs for Spectrum Management in Satellite-Terrestrial Networks


113. Automated Attention Pattern Discovery at Scale in Large Language Models


114. Build on Priors: Vision–Language–Guided Neuro-Symbolic Imitation Learning for Data-Efficient Real-World Robot Manipulation


115. AutoReSpec: A Framework for Generating Specification using Large Language Models


116. Can Humans Tell? A Dual-Axis Study of Human Perception of LLM-Generated News


117. Testing the Limits of Truth Directions in LLMs


118. CREBench: Evaluating Large Language Models in Cryptographic Binary Reverse Engineering


119. Fusion and Alignment Enhancement with Large Language Models for Tail-item Sequential Recommendation


120. LightThinker++: From Reasoning Compression to Memory Management


121. Unlocking Prompt Infilling Capability for Diffusion Language Models


122. Stabilizing Unsupervised Self-Evolution of MLLMs via Continuous Softened Retracing reSampling


123. Persistent Cross-Attempt State Optimization for Repository-Level Code Generation


124. Toward Executable Repository-Level Code Generation via Environment Alignment


125. SecPI: Secure Code Generation with Reasoning Models via Security Reasoning Internalization


126. Focus Matters: Phase-Aware Suppression for Hallucination in Vision-Language Models


127. LangFIR: Discovering Sparse Language-Specific Features from Monolingual Data for Language Steering


128. Inside the Scaffold: A Source-Code Taxonomy of Coding Agent Architectures


129. Large Language Models Align with the Human Brain during Creative Thinking


130. Fine-tuning DeepSeek-OCR-2 for Molecular Structure Recognition


131. Evolutionary Search for Automated Design of Uncertainty Quantification Methods


132. Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution


133. RDFace: A Benchmark Dataset for Rare Disease Facial Image Analysis under Extreme Data Scarcity and Phenotype-Aware Synthetic Generation


134. Measuring LLM Trust Allocation Across Conflicting Software Artifacts


135. Inference-Path Optimization via Circuit Duplication in Frozen Visual Transformers for Marine Species Classification


136. Can LLMs Reason About Attention? Towards Zero-Shot Analysis of Multimodal Classroom Behavior


137. CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge


138. The Ideation Bottleneck: Decomposing the Quality Gap Between AI-Generated and Human Economics Research


139. AICCE: AI Driven Compliance Checker Engine


140. VitaTouch: Property-Aware Vision-Tactile-Language Model for Robotic Quality Inspection in Manufacturing


141. V-Reflection: Transforming MLLMs from Passive Observers to Active Interrogators


142. Generative Chemical Language Models for Energetic Materials Discovery


143. Beyond Static Vision: Scene Dynamic Field Unlocks Intuitive Physics Understanding in Multi-modal Large Language Models


144. XAttnRes: Cross-Stage Attention Residuals for Medical Image Segmentation


145. 3D-IDE: 3D Implicit Depth Emergent


146. Scaling Teams or Scaling Time? Memory Enabled Lifelong Learning in LLM Multi-Agent Systems


147. RAGnaroX: A Secure, Local-Hosted ChatOps Assistant Using Small Language Models


148. SafeScreen: A Safety-First Screening Framework for Personalized Video Retrieval for Vulnerable Users


149. LPC-SM: Local Predictive Coding and Sparse Memory for Long-Context Language Modeling


150. SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression


151. Robust LLM Performance Certification via Constrained Maximum Likelihood Estimation


152. FVRuleLearner: Operator-Level Reasoning Tree (OP-Tree)-Based Rules Learning for Formal Verification


153. Scaling DPPs for RAG: Density Meets Diversity


154. The Persuasion Paradox: When LLM Explanations Fail to Improve Human-AI Team Performance


155. From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification


156. LLMs-Healthcare : Current Applications and Challenges of Large Language Models in various Medical Specialties