Key LLM-Related Papers - 2026-03-27

1. Training the Knowledge Base through Evidence Distillation and Write-Back Enrichment


2. Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?


3. EcoThink: A Green Adaptive Inference Framework for Sustainable and Accessible Agents


4. Cross-Model Disagreement as a Label-Free Correctness Signal


5. Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models


6. Evaluating Language Models for Harmful Manipulation


7. DAGverse: Building Document-Grounded Semantic DAGs from Scientific Papers


8. SliderQuant: Accurate Post-Training Quantization for LLMs


9. Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills


10. RubricEval: A Rubric-Level Meta-Evaluation Benchmark for LLM Judges in Instruction Following


11. ElephantBroker: A Knowledge-Grounded Cognitive Runtime for Trustworthy AI Agents


12. Sparse Visual Thought Circuits in Vision-Language Models


13. Mechanistically Interpreting Compression in Vision-Language Models


14. From Stateless to Situated: Building a Psychological World for LLM-Based Emotional Support


15. The Anatomy of Uncertainty in LLMs


16. Can MLLMs Read Students’ Minds? Unpacking Multimodal Error Analysis in Handwritten Math


17. Shopping with a Platform AI Assistant: Who Adopts, When in the Journey, and What For


18. FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol


19. LogitScope: A Framework for Analyzing LLM Uncertainty Through Information Metrics


20. How Far Are Vision-Language Models from Constructing the Real World? A Benchmark for Physical Generative Reasoning


21. ReLope: KL-Regularized LoRA Probes for Multimodal LLM Routing


22. Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regulation Agentic AI Loop for Engineering Design


23. Formal Semantics for Agentic Tool Protocols: A Process Calculus Approach


24. AutoSAM: an Agentic Framework for Automating Input File Generation for the SAM Code with Multi-Modal Retrieval-Augmented Generation


25. When Is Collective Intelligence a Lottery? Multi-Agent Scaling Laws for Memetic Drift in LLMs


26. The Kitchen Loop: User-Spec-Driven Development for a Self-Evolving Codebase


27. Measuring What Matters – or What’s Convenient?: Robustness of LLM-Based Scoring Systems to Construct-Irrelevant Factors


28. A Mentalistic Interface for Probing Folk-Psychological Attribution to Non-Humanoid Robots


29. Beyond Via: Analysis and Estimation of the Impact of Large Language Models in Academic Papers


30. Demographic Fairness in Multimodal LLMs: A Benchmark of Gender and Ethnicity Bias in Face Verification


31. Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes


32. Shape and Substance: Dual-Layer Side-Channel Attacks on Local Vision-Language Models


33. GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs


34. Adaptive Chunking: Optimizing Chunking-Method Selection for RAG


35. How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models


36. AD-CARE: A Guideline-grounded, Modality-agnostic LLM Agent for Real-world Alzheimer’s Disease Diagnosis with Multi-cohort Assessment, Fairness Analysis, and Reader Study


37. CRAFT: Grounded Multi-Agent Coordination Under Partial Information


38. MolQuest: A Benchmark for Agentic Evaluation of Abductive Reasoning in Chemical Structure Elucidation


39. Activation Matters: Test-time Activated Negative Labels for OOD Detection with Vision-Language Models


40. FluxEDA: A Unified Execution Infrastructure for Stateful Agentic EDA


41. WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing


42. A Decade-Scale Benchmark Evaluating LLMs’ Clinical Practice Guidelines Detection and Adherence in Multi-turn Conversations


43. Probing the Lack of Stable Internal Beliefs in LLMs


44. Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model


45. PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems


46. Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models


47. Factors Influencing the Quality of AI-Generated Code: A Synthesis of Empirical Evidence


48. Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory


49. Large Language Models as Optimization Controllers: Adaptive Continuation for SIMP Topology Optimization


50. TopoPilot: Reliable Conversational Workflow Automation for Topological Data Analysis and Visualization


51. The System Prompt Is the Attack Surface: How LLM Agent Configuration Shapes Security and Creates Exploitable Vulnerabilities


52. Closing the Confidence-Faithfulness Gap in Large Language Models


53. Imperative Interference: Social Register Shapes Instruction Topology in Large Language Models


54. Learning Rollout from Sampling: An R1-Style Tokenized Traffic Simulation Model


55. Rethinking Health Agents: From Siloed AI to Collaborative Decision Mediators


56. Self-Corrected Image Generation with Explainable Latent Rewards


57. Toward domain-specific machine translation and quality estimation systems


58. Evaluating adaptive and generative AI-based feedback and recommendations in a knowledge-graph-integrated programming learning system


59. NeuroVLM-Bench: Evaluation of Vision-Enabled Large Language Models for Clinical Reasoning in Neurological Disorders


60. Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models


61. Learning From Developers: Towards Reliable Patch Validation at Scale for Linux


62. GoldiCLIP: The Goldilocks Approach for Balancing Explicit Supervision for Language-Image Pretraining


63. From Untestable to Testable: Metamorphic Testing in the Age of LLMs


64. Evaluating Fine-Tuned LLM Model For Medical Transcription With Small Low-Resource Languages Validated Dataset


65. Scalable Object Relation Encoding for Better 3D Spatial Reasoning in Large Language Models


66. Experiential Reflective Learning for Self-Improving LLM Agents


67. Sketch2Simulation: Automating Flowsheet Generation via Multi Agent Large Language Models


68. X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs


69. Model2Kernel: Model-Aware Symbolic Execution For Safe CUDA Kernels


70. Malicious LLM-Based Conversational AI Makes Users Reveal Personal Information