Key LLM-Related Papers - 2026-04-29

1. Recursive Multi-Agent Systems


2. ADEMA: A Knowledge-State Orchestration Architecture for Long-Horizon Knowledge Synthesis with LLM Agents


3. RADD: Retrieval-Augmented Discrete Diffusion for Multi-Modal Knowledge Graph Completion


4. Think Before You Act – A Neurocognitive Governance Model for Autonomous AI Agents


5. DualFact+: A Multimodal Fact Verification Framework for Procedural Video Understanding


6. Automated Adversarial Collaboration for Advancing Theory Building in the Cognitive Sciences


7. SciEval: A Benchmark for Automatic Evaluation of K-12 Science Instructional Materials


8. JURY-RL: Votes Propose, Proofs Dispose for Label-Free RLVR


9. ValueAlpha: Agreement-Gated Stress Testing of LLM-Judged Investment Rationales Before Returns Are Observable


10. DATAREEL: Automated Data-Driven Video Story Generation with Animations


11. From Insight to Action: A Novel Framework for Interpretability-Guided Data Selection in Large Language Models


12. Semantic Layers for Reliable LLM-Powered Data Analytics: A Paired Benchmark of Accuracy and Hallucination Across Three Frontier Models


13. Doing More With Less: Revisiting the Effectiveness of LLM Pruning for Test-Time Scaling


14. Cooperate to Compete: Strategic Coordination in Multi-Agent Conquest


15. Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization


16. Sparse Personalized Text Generation with Multi-Trajectory Reasoning


17. Assessing Y-Axis Influence: Bias in Multimodal Language Models on Chart-to-Table Translation


18. Adaptive Prompt Embedding Optimization for LLM Jailbreaking


19. Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate


20. Three Models of RLHF Annotation: Extension, Evidence, and Authority


21. Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers


22. When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient


23. RESTestBench: A Benchmark for Evaluating the Effectiveness of LLM-Generated REST API Test Cases from NL Requirements


24. Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling


25. SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring


26. G-Loss: Graph-Guided Fine-Tuning of Language Models


27. From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling


28. Towards Agentic Investigation of Security Alerts


29. PSI-Bench: Towards Clinically Grounded and Interpretable Evaluation of Depression Patient Simulators


30. CGU-ILALab at FoodBench-QA 2026: Comparing Traditional and LLM-based Approaches for Recipe Nutrient Estimation


31. SAFEdit: Does Multi-Agent Decomposition Resolve the Reliability Challenges of Instructed Code Editing?


32. Cross-Lingual Jailbreak Detection via Semantic Codebooks


33. Learning Generalizable Multimodal Representations for Software Vulnerability Detection


34. LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation


35. Prefill-Time Intervention for Mitigating Hallucination in Large Vision-Language Models


36. Large language models eroding science understanding: an experimental study


37. Health System Scale Semantic Search Across Unstructured Clinical Notes


38. Emotive Architectures: The Role of LLMs in Adjusting Work Environments


39. Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models


40. Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling


41. SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents


42. From CRUD to Autonomous Agents: Formal Validation and Zero-Trust Security for Semantic Gateways in AI-Native Enterprise Systems


43. Assistants, Not Architects: The Role of LLMs in Networked Systems Design


44. From World-Gen to Quest-Line: A Dependency-Driven Prompt Pipeline for Coherent RPG Generation


45. An Investigation of Linguistic Biases in LLM-Based Recommendations


46. Do LLMs Capture Embodied Cognition and Cultural Variation? Cross-Linguistic Evidence from Demonstratives


47. FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices


48. The Structured Output Benchmark: A Multi-Source Benchmark for Evaluating Structured Output Quality in Large Language Models


49. AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices


50. Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation


51. The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents



53. DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams


54. BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate


55. Making AI-Assisted Grant Evaluation Auditable without Exposing the Model


56. Kohn-Sham Hamiltonian from Effective Field Theory: Quasiparticle Band Narrowing from Frozen Core Dynamics


57. Frictive Policy Optimization for LLMs: Epistemic Intervention, Risk-Sensitive Control, and Reflective Alignment


58. M$^3$-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering


59. Analyzing LLM Reasoning to Uncover Mental Health Stigma


60. Dual-Track CoT: Budget-Aware Stepwise Guidance for Small LMs


61. Faithful Autoformalization via Roundtrip Verification and Repair


62. Compute Aligned Training: Optimizing for Test Time Inference


63. BenchGuard: Who Guards the Benchmarks? Automated Auditing of LLM Agent Benchmarks


64. ADE: Adaptive Dictionary Embeddings – Scaling Multi-Anchor Representations to Large Language Models


65. Rethinking Layer Redundancy in Large Language Models: Calibration Objectives and Search for Depth Pruning


66. Large Language Models Explore by Latent Distilling


67. MultiHedge: Adaptive Coordination via Retrieval-Augmented Control


68. On the Trainability of Masked Diffusion Language Models via Blockwise Locality


69. Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity


70. Salca: A Sparsity-Aware Hardware Accelerator for Efficient Long-Context Attention Decoding


71. Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora


72. SWE-QA: A Dataset and Benchmark for Complex Code Understanding


73. Nautile-370M: Spectral Memory Meets Attention in a Small Reasoning Model


74. ITAS: A Multi-Agent Architecture for LLM-Based Intelligent Tutoring


75. From Prototype to Classroom: An Intelligent Tutoring System for Quantum Education


76. Semantic Denial of Service in LLM-controlled robots


77. Cloud to Edge: Benchmarking LLM Inference On Hardware-Accelerated Single-Board Computers