Key LLM-Related Papers - 2026-04-03

1. Beyond the Assistant Turn: User Turn Generation as a Probe of Interaction Awareness in Language Models


2. De Jure: Iterative LLM Self-Refinement for Structured Extraction of Regulatory Rules


3. Do Emotions in Prompts Matter? Effects of Emotional Framing on Large Language Models


4. Answering the Wrong Question: Reasoning Trace Inversion for Abstention in LLMs


5. When to ASK: Uncertainty-Gated Language Assistance for Reinforcement Learning


6. VISTA: Visualization of Token Attribution via Efficient Analysis


7. Blinded Radiologist and LLM-Based Evaluation of LLM-Generated Japanese Translations of Chest CT Reports: Comparative Study


8. Quantifying Self-Preservation Bias in Large Language Models


9. TRACE-Bot: Detecting Emerging LLM-Driven Social Bots via Implicit Semantic Representations and AIGC-Enhanced Behavioral Patterns


10. MTI: A Behavior-Based Temperament Profiling System for AI Agents


11. LLM-as-a-Judge for Time Series Explanations


12. AI in Insurance: Adaptive Questionnaires for Improved Risk Profiling


13. ATBench: A Diverse and Realistic Trajectory Benchmark for Long-Horizon Agent Safety


14. ProCeedRL: Process Critic with Exploratory Demonstration Reinforcement Learning for LLM Agentic Reasoning


15. SenseMath: Do LLMs Have Number Sense? Evaluating Shortcut Use, Judgment, and Generation


16. Abnormal Head Movements in Neurological Conditions: A Knowledge-Based Dataset with Application to Cervical Dystonia


17. Bayesian Elicitation with LLMs: Model Size Helps, Extra “Reasoning” Doesn’t Always


18. Not All Tokens See Equally: Perception-Grounded Policy Optimization for Large Vision-Language Models


19. AeroTherm-GPT: A Verification-Centered LLM Framework for Thermal Protection System Engineering Workflows


20. The AnIML Ontology: Enabling Semantic Interoperability for Large-Scale Experimental Data in Interconnected Scientific Labs


21. EvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification


22. Can Heterogeneous Language Models Be Fused?


23. ContextBudget: Budget-Aware Context Management for Long-Horizon Search Agents


24. CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery


25. Exploring Robust Multi-Agent Workflows for Environmental Data Management


26. OSCAR: Orchestrated Self-verification and Cross-path Refinement


27. Analysis of LLM Performance on AWS Bedrock: Receipt-item Categorisation Case Study


28. GraphWalk: Enabling Reasoning in Large Language Models through Tool-Based Graph Navigation


29. CRaFT: Circuit-Guided Refusal Feature Selection via Cross-Layer Transcoders


30. MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction


31. ByteRover: Agent-Native Memory Through LLM-Curated Hierarchical Context


32. Do Large Language Models Mentalize When They Teach?


33. ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement


34. NED-Tree: Bridging the Semantic Gap with Nonlinear Element Decomposition Tree for LLM Nonlinear Optimization Modeling


35. Does Your Optimizer Care How You Normalize? Normalization-Optimizer Coupling in LLM Training


36. PHMForge: A Scenario-Driven Agentic Benchmark for Industrial Asset Lifecycle Maintenance


37. A Role-Based LLM Framework for Structured Information Extraction from Healthy Food Policies


38. LLM Agents as Social Scientists: A Human-AI Collaborative Platform for Social Science Automation


39. AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks


40. A Self-Evolving Agentic Framework for Metasurface Inverse Design


41. Reducing Hallucinations in LLM-based Scientific Literature Analysis Using Peer Context Outlier Detection


42. Infeasibility Aware Large Language Models for Combinatorial Optimization


43. A Multi-Agent Human-LLM Collaborative Framework for Closed-Loop Scientific Literature Summarization


44. RIFT: A RubrIc Failure Mode Taxonomy and Automated Diagnostics


45. CogBias: Measuring and Mitigating Cognitive Bias in Large Language Models


46. Crashing Waves vs. Rising Tides: Preliminary Findings on AI Automation from Thousands of Worker Evaluations of Labor Market Tasks


47. IDEA2: Expert-in-the-loop competency question elicitation for collaborative ontology engineering


48. Runtime Burden Allocation for Structured LLM Routing in Agentic Expert Systems: A Full-Factorial Cross-Backend Methodology


49. Steerable Visual Representations


50. Grounded Token Initialization for New Vocabulary in LMs for Generative Recommendation


51. Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning


52. VOID: Video Object and Interaction Deletion


53. Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation


54. Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing


55. Retrieval-Augmented Question Answering over Scientific Literature for the Electron-Ion Collider


56. Impact of Multimodal and Conversational AI on Learning Outcomes and Experience


57. Multi-Agent Video Recommenders: Evolution, Patterns, and Open Challenges


58. Neuro-RIT: Neuron-Guided Instruction Tuning for Robust Retrieval-Augmented Language Model


59. The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level


60. Optimizing RAG Rerankers with LLM Feedback via Reinforcement Learning


61. Mining Instance-Centric Vision-Language Contexts for Human-Object Interaction Detection


62. Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding


63. BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs


64. SAFE: Stepwise Atomic Feedback for Error correction in Multi-hop Reasoning


65. Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation


66. RuleForge: Automated Generation and Validation for Web Vulnerability Detection at Scale


67. Ego-Grounding for Personalized Question-Answering in Egocentric Videos


68. Do We Need Bigger Models for Science? Task-Aware Retrieval with Small Language Models


69. Captioning Daily Activity Images in Early Childhood Education: Benchmark and Algorithm


70. Reliable News or Propagandist News? A Neurosymbolic Model Using Genre, Topic, and Persuasion Techniques to Improve Robustness in Classification


71. ImplicitBBQ: Benchmarking Implicit Bias in Large Language Models through Characteristic Based Cues


72. Combating Data Laundering in LLM Training


73. DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning


74. FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models


75. LiveMathematicianBench: A Live Benchmark for Mathematician-Level Reasoning with Proof Sketches


76. Development and multi-center evaluation of domain-adapted speech recognition for human-AI teaming in real-world gastrointestinal endoscopy


77. MiCA Learns More Knowledge Than LoRA and Full Fine-Tuning


78. Bridging Large-Model Reasoning and Real-Time Control via Agentic Fast-Slow Planning


79. GPA: Learning GUI Process Automation from Demonstrations


80. AromaGen: Interactive Generation of Rich Olfactory Experiences with Multimodal Language Models


81. Seclens: Role-specific Evaluation of LLMs for security vulnerability detection


82. DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72


83. SHOE: Semantic HOI Open-Vocabulary Evaluation Metric


84. Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging


85. ProdCodeBench: A Production-Derived Benchmark for Evaluating AI Coding Agents


86. Magic, Madness, Heaven, Sin: LLM Output Diversity is Everything, Everywhere, All at Once


87. CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe


88. Type-Checked Compliance: Deterministic Guardrails for Agentic Financial Systems Using Lean 4 Theorem Proving


89. DISCO-TAB: A Hierarchical Reinforcement Learning Framework for Privacy-Preserving Synthesis of Complex Clinical Data


90. SelfGrader: Stable Jailbreak Detection for Large Language Models using Token-Level Logits


91. The Newton-Muon Optimizer


92. Low-Burden LLM-Based Preference Learning: Personalizing Assistive Robots from Natural Language Feedback for Users with Paralysis


93. Reproducible, Explainable, and Effective Evaluations of Agentic AI for Software Engineering


94. Adaptive Stopping for Multi-Turn LLM Reasoning


95. Friends and Grandmothers in Silico: Localizing Entity Cells in Language Models



97. AffordTissue: Dense Affordance Prediction for Tool-Action Specific Tissue Interaction


98. No Attacker Needed: Unintentional Cross-User Contamination in Shared-State LLM Agents


99. Safety, Security, and Cognitive Risks in World Models


100. Preference learning in shades of gray: Interpretable and bias-aware reward modeling for human preferences


101. Look Twice: Training-Free Evidence Highlighting in Multimodal Large Language Models


102. The Overlooked Repetitive Lengthening Form in Sentiment Analysis


103. DarwinNet: An Evolutionary Network Architecture for Agent-Driven Protocol Synthesis