Key LLM-Related Papers - 2026-04-20

1. ASMR-Bench: Auditing for Sabotage in ML Research


2. Using Large Language Models and Knowledge Graphs to Improve the Interpretability of Machine Learning Models in Manufacturing


3. Learning to Reason with Insight for Informal Theorem Proving


4. Characterising LLM-Generated Competency Questions: a Cross-Domain Empirical Study using Open and Closed Models


5. MARCH: Multi-Agent Radiology Clinical Hierarchy for CT Report Generation


6. SocialGrid: A Benchmark for Planning and Social Reasoning in Embodied Multi-Agent Systems


7. ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams



9. Integrating Graphs, Large Language Models, and Agents: Reasoning and Retrieval


10. Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents


11. Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4


12. KWBench: Measuring Unprompted Problem Recognition in Knowledge Work


13. Structured Abductive-Deductive-Inductive Reasoning for LLMs via Algebraic Invariants


14. LLM Reasoning Is Latent, Not the Chain of Thought


15. The World Leaks the Future: Harness Evolution for Future Prediction Agents



17. Subliminal Transfer of Unsafe Behaviors in AI Agent Distillation


18. LACE: Lattice Attention for Cross-thread Exploration


19. GIST: Multimodal Knowledge Extraction and Spatial Grounding via Intelligent Semantic Topology


20. VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects



22. BAGEL: Benchmarking Animal Knowledge Expertise in Language Models


23. Beyond Surface Statistics: Robust Conformal Prediction for LLMs via Internal Representations


24. ChemGraph-XANES: An Agentic Framework for XANES Simulation and Analysis


25. JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models


26. AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency


27. Can LLMs Understand the Impact of Trauma? Costs and Benefits of LLMs Coding the Interviews of Firearm Violence Survivors


28. SCRIPT: Implementing an Intelligent Tutoring System for Programming in a German University Context


29. The Relic Condition: When Published Scholarship Becomes Material for Its Own Replacement


30. Mind’s Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs


31. Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures


32. Where does output diversity collapse in post-training?


33. Neurosymbolic Repo-level Code Localization


34. AgentV-RL: Scaling Reward Modeling with Agentic Verifier


35. Polarization by Default: Auditing Recommendation Bias in LLM-Based Content Curation


36. DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition


37. QuantSightBench: Evaluating LLM Quantitative Forecasting with Prediction Intervals


38. DPrivBench: Benchmarking LLMs’ Reasoning for Differential Privacy


39. Beyond a Single Frame: Multi-Frame Spatially Grounded Reasoning Across Volumetric MRI


40. Self-Distillation as a Performance Recovery Mechanism for LLMs: Counteracting Compression and Catastrophic Forgetting


41. EVIL: Evolving Interpretable Algorithms for Zero-Shot Inference on Event Sequences and Time Series with LLMs


42. DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference


43. Learning Uncertainty from Sequential Internal Dispersion in Large Language Models


44. Privacy-Preserving LLMs Routing


45. Just Type It in Isabelle! AI Agents Drafting, Mechanizing, and Generalizing from Human Hints


46. CodeMMR: Bridging Natural Language, Code, and Image for Unified Retrieval


47. HYPERHEURIST: A Simulated Annealing-Based Control Framework for LLM-Driven Code Generation in Optimized Hardware Design


48. Rethinking the Necessity of Adaptive Retrieval-Augmented Generation through the Lens of Adaptive Listwise Ranking


49. DALM: A Domain-Algebraic Language Model via Three-Phase Structured Generation


50. LLM attribution analysis across different fine-tuning strategies and model scales for automated code compliance


51. “Excuse me, may I say something…” CoLabScience, A Proactive AI Assistant for Biomedical Discovery and LLM-Expert Collaborations


52. Why Fine-Tuning Encourages Hallucinations and How to Fix It


53. Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)


54. LLMbench: A Comparative Close Reading Workbench for Large Language Models


55. PolicyBank: Evolving Policy Understanding for LLM Agents


56. FineSteer: A Unified Framework for Fine-Grained Inference-Time Steering in Large Language Models


57. Harmonizing Multi-Objective LLM Unlearning via Unified Domain Representation and Bidirectional Logit Distillation


58. Ragged Paged Attention: A High-Performance and Flexible LLM Inference Kernel for TPU


59. The Crutch or the Ceiling? How Different Generations of LLMs Shape EFL Student Writings


60. StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models


61. HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?


62. PRL-Bench: A Comprehensive Benchmark Evaluating LLMs’ Capabilities in Frontier Physics Research


63. The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference


64. Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation


65. Analyzing Chain of Thought (CoT) Approaches in Control Flow Code Deobfuscation Tasks


66. Exploring LLM-based Verilog Code Generation with Data-Efficient Fine-Tuning and Testbench Automation


67. Temporal Contrastive Decoding: A Training-Free Method for Large Audio-Language Models


68. VeriCWEty: Embedding enabled Line-Level CWE Detection in Verilog


69. The Synthetic Media Shift: Tracking the Rise, Virality, and Detectability of AI-Generated Multimodal Misinformation


70. Applied Explainability for Large Language Models: A Comparative Study


71. Taming Asynchronous CPU-GPU Coupling for Frequency-aware Latency Estimation on Mobile Edge


72. Sequential KV Cache Compression via Probabilistic Language Tries: Beyond the Per-Vector Shannon Limit


73. SocialWise: LLM-Agentic Conversation Therapy for Individuals with Autism Spectrum Disorder to Enhance Communication Skills


74. To LLM, or Not to LLM: How Designers and Developers Navigate LLMs as Tools or Teammates


75. When the Loop Closes: Architectural Limits of In-Context Isolation, Metacognitive Co-option, and the Two-Target Design Problem in Human-LLM Systems


76. MRGEN: A Conceptual Framework for LLM-Powered Mixed Reality Authoring Tools for Education


77. Facial-Expression-Aware Prompting for Empathetic LLM Tutoring


78. Automating Crash Diagram Generation Using Vision-Language Models: A Case Study on Multi-Lane Roundabouts


79. How people use Copilot for Health


80. Evaluating LLMs as Human Surrogates in Controlled Experiments


81. Eco-Bee: A Personalised Multi-Modal Agent for Advancing Student Climate Awareness and Sustainable Behaviour in Campus Ecosystems


82. Explainable Iterative Data Visualisation Refinement via an LLM Agent


83. Anthropomorphism and Trust in Human-Large Language Model interactions