Key LLM-Related Papers - 2026-03-20

1. Box Maze: A Process-Control Architecture for Reliable LLM Reasoning


2. cuGenOpt: A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization


3. Implicit Patterns in LLM-Based Binary Analysis


4. How Uncertainty Estimation Scales with Sampling in Reasoning Models


5. Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity


6. Behavioral Fingerprints for LLM Endpoint Stability and Identity


7. Evaluating 5W3H Structured Prompting for Intent Alignment in Human-AI Interaction


8. Secure Linear Alignment of Large Language Models


9. I Can’t Believe It’s Corrupt: Evaluating Corruption in Multi-Agent Governance Systems


10. Quantitative Introspection in Language Models: Tracking Internal States Across Conversation


11. Reasoning over mathematical objects: on-policy reward modeling and test time aggregation


12. Bridging Network Fragmentation: A Semantic-Augmented DRL Framework for UAV-aided VANETs


13. RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models


14. ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents


15. Can LLM generate interesting mathematical research problems?


16. dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models


17. Memento-Skills: Let Agents Design Agents


18. Analysis Of Linguistic Stereotypes in Single and Multi-Agent Generative AI Architectures


19. MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution


20. Thinking with Constructions: A Benchmark and Policy Optimization for Visual-Text Interleaved Geometric Reasoning


21. Balanced Thinking: Improving Chain of Thought Training in Vision Language Models


22. D-Mem: A Dual-Process Memory System for LLM Agents


23. Agentic Flow Steering and Parallel Rollout Search for Spatially Grounded Text-to-Image Generation


24. ZEBRAARENA: A Diagnostic Simulation Environment for Studying Reasoning-Action Coupling in Tool-Augmented LLMs


25. Interplay: Training Independent Simulators for Reference-Free Conversational Recommendation


26. Expert Personas Improve LLM Alignment but Damage Accuracy: Bootstrapping Intent-Based Persona Routing with PRISM


27. Cross-Domain Demo-to-Code via Neurosymbolic Counterfactual Reasoning


28. Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding


29. Reflection in the Dark: Exposing and Escaping the Black Box in Reflective Prompt Optimization


30. From Weak Cues to Real Identities: Evaluating Inference-Driven De-Anonymization in LLM Agents


31. Interpretability without actionability: mechanistic methods cannot correct language model errors despite near-perfect internal representations


32. Large-Scale Analysis of Political Propaganda on Moltbook


33. MemArchitect: A Policy Driven Memory Governance Layer


34. FaithSteer-BENCH: A Deployment-Aligned Stress-Testing Benchmark for Inference-Time Steering


35. The Validity Gap in Health AI Evaluation: A Cross-Sectional Analysis of Benchmark Composition


36. EDM-ARS: A Domain-Specific Multi-Agent System for Automated Educational Data Mining Research


37. Retrieval-Augmented LLM Agents: Learning to Learn from Experience


38. TeachingCoach: A Fine-Tuned Scaffolding Chatbot for Instructional Guidance to Instructors


39. Continually self-improving AI


40. DEAF: A Benchmark for Diagnostic Evaluation of Acoustic Faithfulness in Audio Language Models


41. FinTradeBench: A Financial Reasoning Benchmark for LLMs


42. F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World


43. Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation


44. Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Language Navigation


45. VEPO: Variable Entropy Policy Optimization for Low-Resource Language Foundation Models


46. UGID: Unified Graph Isomorphism for Debiasing Large Language Models


47. SAVeS: Steering Safety Judgments in Vision-Language Models via Semantic Cues


48. Parallelograms Strike Back: LLMs Generate Better Analogies than People


49. SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models


50. What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time?


51. Security awareness in LLM agents: the NDAI zone case


52. Hypothesis-Conditioned Query Rewriting for Decision-Useful Retrieval


53. AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science


54. Act While Thinking: Accelerating LLM Agents via Pattern-Aware Speculative Tool Execution


55. MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language Model


56. Evaluating LLM-Generated Lessons from the Language Learning Students’ Perspective: A Short Case Study on Duolingo


57. Motion-o: Trajectory-Grounded Video Reasoning


58. Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation


59. Functional Subspace Watermarking for Large Language Models


60. Mi:dm K 2.5 Pro


61. Automatic Configuration of LLM Post-Training Pipelines


62. Are complicated loss functions necessary for teaching LLMs to reason?


63. Measuring and Exploiting Confirmation Bias in LLM-Assisted Security Code Review


64. CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks


65. HISR: Hindsight Information Modulated Segmental Process Rewards For Multi-turn Agentic Reinforcement Learning


66. Benchmarking PDF Parsers on Table Extraction with LLM-based Semantic Evaluation


67. REST: Receding Horizon Explorative Steiner Tree for Zero-Shot Object-Goal Navigation


68. Learning to Self-Evolve


69. AutORAN: LLM-driven Natural Language Programming for Agile xApp Development


70. SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding


71. HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering


72. CoDA: Exploring Chain-of-Distribution Attacks and Post-Hoc Token-Space Repair for Medical Vision-Language Models


73. Scaling Sim-to-Real Reinforcement Learning for Robot VLAs with Generative 3D Worlds


74. When Names Change Verdicts: Intervention Consistency Reveals Systematic Bias in LLM Decision-Making


75. Counting Circuits: Mechanistic Interpretability of Visual Reasoning in Large Vision-Language Models


76. Do Vision Language Models Understand Human Engagement in Games?


77. WASD: Locating Critical Neurons as Sufficient Conditions for Explaining and Controlling LLM Behavior


78. Discounted Beta–Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards


79. Adaptive Decoding via Test-Time Policy Learning for Self-Improving Generation


80. Mind the Rarities: Can Rare Skin Diseases Be Reliably Diagnosed via Diagnostic Reasoning?


81. The Spillover Effects of Peer AI Rinsing on Corporate Green Innovation


82. TARo: Token-level Adaptive Routing for LLM Test-time Alignment


83. PlanTwin: Privacy-Preserving Planning Abstractions for Cloud-Assisted LLM Agents


84. PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching


85. From Noise to Signal: When Outliers Seed New Topics


86. Can LLMs Reason Like Automated Theorem Provers for Rust Verification? VCoT-Bench: Evaluating via Verification Chain of Thought


87. DriveVLM-RL: Neuroscience-Inspired Reinforcement Learning with Vision-Language Models for Safe and Deployable Autonomous Driving


88. Auditing Preferences for Brands and Cultures in LLMs


89. Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails


90. Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization


91. MolRGen: A Training and Evaluation Setting for De Novo Molecular Generation with Reasoning Models


92. Retrieval-Augmented LLMs for Security Incident Analysis


93. VLM-AutoDrive: Post-Training Vision-Language Models for Safety-Critical Autonomous Driving Events


94. How LLMs Distort Our Written Language


95. Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models


96. LLM-Augmented Computational Phenotyping of Long Covid


97. VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models


98. A Trace-Based Assurance Framework for Agentic AI Orchestration: Contracts, Testing, and Governance


99. SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training


100. Lightweight Adaptation for LLM-based Technical Service Agent: Latent Logic Augmentation and Robust Noise Reduction


101. MCP-38: A Comprehensive Threat Taxonomy for Model Context Protocol Systems (v1.0)


102. NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference


103. The Provenance Paradox in Multi-Agent LLM Routing: Delegation Contracts and Attested Identity in LDP


104. Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems


105. Quine: Realizing LLM Agents as Native POSIX Processes


106. BenchBrowser – Collecting Evidence for Evaluating Benchmark Validity


107. MineDraft: A Framework for Batch Parallel Speculative Decoding


108. DynaRAG: Bridging Static and Dynamic Knowledge in Retrieval-Augmented Generation


109. Agentic Framework for Political Biography Extraction


110. How Confident Is the First Token? An Uncertainty-Calibrated Prompt Optimization Framework for Large Language Model Classification and Understanding


111. TherapyGym: Evaluating and Aligning Clinical Fidelity and Safety in Therapy Chatbots


112. Do Large Language Models Possess a Theory of Mind? A Comparative Evaluation Using the Strange Stories Paradigm