Key LLM-Related Papers - 2026-04-15

1. Modeling Co-Pilots for Text-to-Model Translation


2. Drawing on Memory: Dual-Trace Encoding Improves Cross-Session Recall in LLM Agents


3. BEAM: Bi-level Memory-adaptive Algorithmic Evolution for LLM-Powered Heuristic Design


4. AISafetyBenchExplorer: A Metric-Aware Catalogue of AI Safety Benchmarks Reveals Fragmented Measurement and Weak Benchmark Governance


5. RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair


6. DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding


7. Transferable Expertise for Autonomous Agents via Real-World Case-Based Learning


8. MISID: A Multimodal Multi-turn Dataset for Complex Intent Recognition in Strategic Deception Games


9. Human-Centric Topic Modeling with Goal-Prompted Contrastive Learning and Optimal Transport


10. RPRA: Predicting an LLM-Judge for Efficient but Performant Inference


11. KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance


12. Every Picture Tells a Dangerous Story: Memory-Augmented Multi-Agent Jailbreak Attacks on VLMs


13. DeepTest Tool Competition 2026: Benchmarking an LLM-Based Automotive Assistant


14. IDEA: An Interpretable and Editable Decision-Making Framework for LLMs via Verbal-to-Numeric Calibration


15. Cross-Cultural Simulation of Citizen Emotional Responses to Bureaucratic Red Tape Using LLM Agents


16. A Two-Stage LLM Framework for Accessible and Verified XAI Explanations


17. Technical Report – A Context-Sensitive Multi-Level Similarity Framework for First-Order Logic Arguments: An Axiomatic Study


18. CIA: Inferring the Communication Topology from LLM-based Multi-Agent Systems


19. Operationalising the Right to be Forgotten in LLMs: A Lightweight Sequential Unlearning Framework for Privacy-Aligned Deployment in Politically Sensitive Environments


20. Heuristic Classification of Thoughts Prompting (HCoT): Integrating Expert System Heuristics for Structured Reasoning into Large Language Models


21. Preventing Safety Drift in Large Language Models via Coupled Weight and Activation Constraints


22. ReflectCAP: Detailed Image Captioning with Reflective Memory


23. MultiDocFusion: Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents


24. Frontier-Eng: Benchmarking Self-Evolving Agents on Real-World Engineering Tasks with Generative Optimization


25. GAM: Hierarchical Graph-based Agentic Memory for LLM Agents


26. A Scoping Review of Large Language Model-Based Pedagogical Agents


27. How memory can affect collective and cooperative behaviors in an LLM-Based Social Particle Swarm


28. HintMR: Eliciting Stronger Mathematical Reasoning in Small Language Models


29. Designing Reliable LLM-Assisted Rubric Scoring for Constructed Responses: Evidence from Physics Exams


30. Modality-Native Routing in Agent-to-Agent Networks: A Multimodal A2A Protocol Extension


31. Beyond Scores: Diagnostic LLM Evaluation via Fine-Grained Abilities


32. TRUST Agents: A Collaborative Multi-Agent Framework for Fake News Detection, Explainable Verification, and Logic-Aware Claim Reasoning


33. Policy-Invisible Violations in LLM-Based Agents


34. Evaluating Relational Reasoning in LLMs with REL


35. EMBER: Autonomous Cognitive Behaviour from Learned Spiking Neural Network Dynamics in a Hybrid LLM Architecture


36. Development, Evaluation, and Deployment of a Multi-Agent System for Thoracic Tumor Board


37. Beyond Factual Grounding: The Case for Opinion-Aware Retrieval-Augmented Generation


38. Towards Platonic Representation for Table Reasoning: A Foundation for Permutation-Invariant Retrieval


39. Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents


40. Long-Horizon Plan Execution in Large Tool Spaces through Entropy-Guided Branching


41. The A-R Behavioral Space: Execution-Level Profiling of Tool-Using Language Model Agents in Organizational Deployment


42. Spatial Atlas: Compute-Grounded Reasoning for Spatial-Aware Research Agent Benchmarks


43. LLM-HYPER: Generative CTR Modeling for Cold-Start Ad Personalization via LLM-Based Hypernetworks


44. Mathematics Teachers Interactions with a Multi-Agent System for Personalized Problem Generation


45. Memory as Metabolism: A Design for Companion Knowledge Systems


46. Identity as Attractor: Geometric Evidence for Persistent Agent Architecture in LLM Activation Space


47. When to Forget: A Memory Governance Primitive


48. The Long-Horizon Task Mirage? Diagnosing Where and Why Agentic Systems Break


49. Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe


50. Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation


51. One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness


52. LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software


53. Distorted or Fabricated? A Survey on Hallucination in Video LLMs


54. CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference


55. OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension


56. CLASP: Class-Adaptive Layer Fusion and Dual-Stage Pruning for Multimodal Large Language Models



58. LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety


59. PromptEcho: Annotation-Free Reward from Vision-Language Models for Text-to-Image Reinforcement Learning


60. Learning Chain Of Thoughts Prompts for Predicting Entities, Relations, and even Literals on Knowledge Graphs


61. TimeSAF: Towards LLM-Guided Semantic Asynchronous Fusion for Time Series Forecasting


62. Calibration-Aware Policy Optimization for Reasoning LLMs


63. LLM-Guided Prompt Evolution for Password Guessing


64. When Does Data Augmentation Help? Evaluating LLM and Back-Translation Methods for Hausa and Fongbe NLP


65. MODIX: A Training-Free Multimodal Information-Driven Positional Index Scaling for Vision-Language Models


66. NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Professional Image Quality Assessment (Track 1)


67. Topology-Aware Reasoning over Incomplete Knowledge Graph with Graph-Based Soft Prompting


68. KG-Reasoner: A Reinforced Model for End-to-End Multi-Hop Knowledge Graph Reasoning


69. Mining Large Language Models for Low-Resource Language Data: Comparing Elicitation Strategies for Hausa and Fongbe


70. Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation


71. Chain-of-Models Pre-Training: Rethinking Training Acceleration of Vision Foundation Models


72. Beyond Output Correctness: Benchmarking and Evaluating Large Language Model Reasoning in Coding Tasks


73. SCRIPT: A Subcharacter Compositional Representation Injection Module for Korean Pre-Trained Language Models


74. Cooperative Memory Paging with Keyword Bookmarks for Long-Horizon LLM Conversations


75. Scaffold-Conditioned Preference Triplets for Controllable Molecular Optimization with Large Language Models


76. EgoEsportsQA: An Egocentric Video Benchmark for Perception and Reasoning in Esports


77. Is Vibe Coding the Future? An Empirical Assessment of LLM Generated Codes for Construction Safety


78. GCA Framework: A Gulf-Grounded Dataset and Agentic Pipeline for Climate Decision Support


79. Local-Splitter: A Measurement Study of Seven Tactics for Reducing Cloud LLM Token Usage on Coding-Agent Workloads


80. CascadeDebate: Multi-Agent Deliberation for Cost-Aware LLM Cascades


81. Coding-Free and Privacy-Preserving MCP Framework for Clinical Agentic Research Intelligence System


82. ARGen: Affect-Reinforced Generative Augmentation towards Vision-based Dynamic Emotion Perception


83. SpecBound: Adaptive Bounded Self-Speculation with Layer-wise Confidence Calibration


84. Continuous Knowledge Metabolism: Generating Scientific Hypotheses from Evolving Literature


85. TEMPLATEFUZZ: Fine-Grained Chat Template Fuzzing for Jailbreaking and Red Teaming LLMs


86. LLM-Guided Semantic Bootstrapping for Interpretable Text Classification with Tsetlin Machines


87. Towards grounded autonomous research: an end-to-end LLM mini research loop on published computational physics


88. Fully Homomorphic Encryption on Llama 3 model for privacy preserving LLM inference


89. LLM-Based Automated Diagnosis Of Integration Test Failures At Google


90. Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models


91. Interpretable DNA Sequence Classification via Dynamic Feature Generation in Decision Trees


92. Leveraging Weighted Syntactic and Semantic Context Assessment Summary (wSSAS) Towards Text Categorization Using LLMs


93. SIR-Bench: Evaluating Investigation Depth in Security Incident Response Agents


94. Benchmarking Deflection and Hallucination in Large Vision-Language Models


95. LLMs Struggle with Abstract Meaning Comprehension More Than Expected


96. Filtered Reasoning Score: Evaluating Reasoning Quality on a Model’s Most-Confident Traces


97. INDOTABVQA: A Benchmark for Cross-Lingual Table Understanding in Bahasa Indonesia Documents


98. AnyPoC: Universal Proof-of-Concept Test Generation for Scalable LLM-Based Bug Detection


99. AutoSurrogate: An LLM-Driven Multi-Agent Framework for Autonomous Construction of Deep Learning Surrogate Models in Subsurface Flow


100. How Transformers Learn to Plan via Multi-Token Prediction


101. Disposition Distillation at Small Scale: A Three-Arc Negative Result


102. Evaluating the Limitations of Protein Sequence Representations for Parkinson’s Disease Classification


103. Polynomial Expansion Rank Adaptation: Enhancing Low-Rank Fine-Tuning with High-Order Interactions


104. Schema-Adaptive Tabular Representation Learning with LLMs for Generalizable Multimodal Clinical Reasoning


105. M$^\star$: Every Task Deserves Its Own Memory Harness


106. GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization


107. Should There be a Teacher In-the-Loop? A Study of Generative AI Personalized Tasks Middle School