Key LLM-Related Papers - 2026-03-18

1. SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models


2. Internalizing Agency from Reflective Experience


3. Learning to Present: Inverse Specification Rewards for Agentic Slide Generation


4. Prompt Programming for Cultural Bias and Alignment of Large Language Models


5. SurgΣ: A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence


6. Is Conformal Factuality for RAG-based LLMs Robust? Novel Metrics and Systematic Insights


7. MedCL-Bench: Benchmarking stability-efficiency trade-offs and scaling in biomedical continual learning


8. Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure


9. IQuest-Coder-V1 Technical Report


10. Machines acquire scientific taste from institutional traces


11. When AI Navigates the Fog of War


12. Runtime Governance for AI Agents: Policies on Paths


13. V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge in Vision Language Models


14. BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs


15. Designing for Disagreement: Front-End Guardrails for Assistance Allocation in LLM-Enabled Robots


16. Exploring different approaches to customize language models for domain-specific text-to-code generation


17. ExpressMind: A Multimodal Pretrained Large Language Model for Expressway Operation


18. Breaking the Chain: A Causal Analysis of LLM Faithfulness to Intermediate Structures


19. Follow the Clues, Frame the Truth: Hybrid-evidential Deductive Reasoning in Open-Vocabulary Multimodal Emotion Recognition


20. RetailBench: Evaluating Long-Horizon Autonomous Decision-Making and Strategy Stability of LLM Agents in Realistic Retail Environments


21. Visual Distraction Undermines Moral Reasoning in Vision-Language Models


22. From Natural Language to Executable Option Strategies via Large Language Models


23. Via Negativa for AI Alignment: Why Negative Constraints Are Structurally Superior to Positive Preferences


24. FactorEngine: A Program-level Knowledge-Infused Factor Mining Framework for Quantitative Investment


25. Learning to Predict, Discover, and Reason in High-Dimensional Discrete Event Sequences


26. NeSy-Route: A Neuro-Symbolic Benchmark for Constrained Route Planning in Remote Sensing


27. Adaptive Theory of Mind for LLM-based Multi-Agent Coordination


28. MOSAIC: Composable Safety Alignment with Modular Control Tokens


29. Proactive Rejection and Grounded Execution: A Dual-Stage Intent Analysis Paradigm for Safe and Efficient AIoT Smart Homes


30. Are Large Language Models Truly Smarter Than Humans?


31. NeuronSpark: A Spiking Neural Network Language Model with Selective State Space Dynamics


32. ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning


33. A Context Alignment Pre-processor for Enhancing the Coherence of Human-LLM Dialog


34. POaaS: Minimal-Edit Prompt Optimization as a Service to Lift Accuracy and Cut Hallucinations on On-Device sLLMs


35. Enhancing Linguistic Generalization of VLA: Fine-Tuning OpenVLA via Synthetic Instruction Augmentation


36. Selective Memory for Artificial Intelligence: Write-Time Gating with Hierarchical Archiving


37. An Agentic Evaluation Framework for AI-Generated Scientific Code in PETSc


38. MAC: Multi-Agent Constitution Learning


39. Protein Design with Agent Rosetta: A Case Study for Specialized Scientific Agents


40. Prompt Engineering for Scale Development in Generative Psychometrics


41. AsgardBench - Evaluating Visually Grounded Interactive Planning Under Minimal Feedback


42. Persona-Conditioned Risk Behavior in Large Language Models: A Simulated Gambling Study with GPT-4.1


43. Prose2Policy (P2P): A Practical LLM Pipeline for Translating Natural-Language Access Policies into Executable Rego


44. Context-Length Robustness in Question Answering Models: A Comparative Empirical Study


45. I Know What I Don’t Know: Latent Posterior Factor Models for Multi-Evidence Probabilistic Reasoning


46. QV May Be Enough: Toward the Essence of Attention in LLMs


47. DynaTrust: Defending Multi-Agent Systems Against Sleeper Agents via Dynamic Trust Graphs


48. GSI Agent: Domain Knowledge Enhancement for Large Language Models in Green Stormwater Infrastructure


49. CraniMem: Cranial Inspired Gated and Bounded Memory for Agentic Systems


50. NextMem: Towards Latent Factual Memory for LLM-based Agents


51. InCoder-32B: Code Foundation Model for Industrial Scenarios


52. IOSVLM: A 3D Vision-Language Model for Unified Dental Diagnosis from Intraoral Scans


53. TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities


54. Finding Common Ground in a Sea of Alternatives


55. Retrieving Counterfactuals Improves Visual In-Context Learning


56. When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making


57. Kestrel: Grounding Self-Refinement for LVLM Hallucination Mitigation



59. Omanic: Towards Step-wise Evaluation of Multi-hop Reasoning in Large Language Models


60. MLLM-based Textual Explanations for Face Comparison


61. BATQuant: Outlier-resilient MXFP4 Quantization via Learnable Block-wise Optimization


62. Characterizing Delusional Spirals through Human-LLM Chat Logs


63. EmoLLM: Appraisal-Grounded Cognitive-Emotional Co-Reasoning in Large Language Models


64. EngGPT2: Sovereign, Efficient and Open Intelligence


65. An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU


66. IndexRAG: Bridging Facts for Cross-Document Reasoning at Index Time


67. Trained Persistent Memory for Frozen Encoder–Decoder LLMs: Six Architectural Methods


68. PlotTwist: A Creative Plot Generation Framework with Small Language Models


69. Who Benchmarks the Benchmarks? A Case Study of LLM Evaluation in Icelandic


70. Fanar 2.0: Arabic Generative AI Stack


71. Toward Experimentation-as-a-Service in 5G/6G: The Plaza6G Prototype for AI-Assisted Trials


72. Detecting Sentiment Steering Attacks on RAG-enabled Large Language Models


73. An Interpretable Machine Learning Framework for Non-Small Cell Lung Cancer Drug Response Analysis


74. A Human-Centred Architecture for Large Language Models-Cognitive Assistants in Manufacturing within Quality Management Systems


75. Attention-guided Evidence Grounding for Spoken Question Answering


76. VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents


77. CoMAI: A Collaborative Multi-Agent Framework for Robust and Equitable Interview Evaluation


78. A Scoping Review of AI-Driven Digital Interventions in Mental Health Care: Mapping Applications Across Screening, Support, Monitoring, Prevention, and Clinical Education


79. 360° Image Perception with MLLMs: A Comprehensive Benchmark and a Training-Free Method


80. DyJR: Preserving Diversity in Reinforcement Learning with Verifiable Rewards via Dynamic Jensen-Shannon Replay


81. HIPO: Instruction Hierarchy via Constrained Reinforcement Learning


82. Structure-Aware Multimodal LLM Framework for Trustworthy Near-Field Beam Prediction


83. SWE-QA-Pro: A Representative Benchmark and Scalable Training Recipe for Repository-Level Code Understanding


84. PathGLS: Evaluating Pathology Vision-Language Models without Ground Truth through Multi-Dimensional Consistency


85. ASDA: Automated Skill Distillation and Adaptation for Financial Reasoning


86. Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization


87. Efficient LLM Serving for Agentic Workflows: A Data Systems Perspective


88. LICA: Layered Image Composition Annotations for Graphic Design Research


89. Parallel In-context Learning for Large Vision Language Models


90. RecBundle: A Next-Generation Geometric Paradigm for Explainable Recommender Systems


91. Interact3D: Compositional 3D Generation of Interactive Objects


92. SEAHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Southeast Asia


93. Resource Consumption Threats in Large Language Models


94. Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models


95. Understanding Moral Reasoning Trajectories in Large Language Models: Toward Probing-Based Explainability


96. Evaluating Agentic Optimization on Large Codebases


97. RadAnnotate: Large Language Models for Efficient and Reliable Radiology Report Annotation


98. Aligning Paralinguistic Understanding and Generation in Speech LLMs via Multi-Task Reinforcement Learning


99. ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors


100. MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale


101. A Family of LLMs Liberated from Static Vocabularies


102. Data-Local Autonomous LLM-Guided Neural Architecture Search for Multiclass Multimodal Time-Series Classification


103. VIBEPASS: Can Vibe Coders Really Pass the Vibe Check?


104. Auto Researching, not hyperparameter tuning: Convergence Analysis of 10,000 Experiments


105. The Agentic Researcher: A Practical Guide to AI-Assisted Research in Mathematics and Machine Learning


106. COGNAC at SemEval-2026 Task 5: LLM Ensembles for Human-Level Word Sense Plausibility Rating in Challenging Narratives


107. Interpretative Interfaces: Designing for AI-Mediated Reading Practices and the Knowledge Commons


108. FlashSampling: Fast and Memory-Efficient Exact Sampling


109. When Stability Fails: Hidden Failure Modes Of LLMS in Data-Constrained Scientific Decision-Making


110. Don’t Trust Stubborn Neighbors: A Security Framework for Agentic Networks


111. OMNIFLOW: A Physics-Grounded Multimodal Agent for Generalized Scientific Reasoning


112. Morphemes Without Borders: Evaluating Root-Pattern Morphology in Arabic Tokenizers and LLMs


113. CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving


114. ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems


115. How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition


116. Embedding-Aware Feature Discovery: Bridging Latent Representations and Interpretable Features in Event Sequences


117. LLM-Driven Discovery of High-Entropy Catalysts via Retrieval-Augmented Generation


118. SEMAG: Self-Evolutionary Multi-Agent Code Generation


119. This Is Taking Too Long - Investigating Time as a Proxy for Energy Consumption of LLMs


120. BadLLM-TG: A Backdoor Defender powered by LLM Trigger Generator


121. Loosely-Structured Software: Engineering Context, Structure, and Evolution Entropy in Runtime-Rewired Multi-Agent Systems


122. DASH: Dynamic Audio-Driven Semantic Chunking for Efficient Omnimodal Token Compression


123. State-Dependent Safety Failures in Multi-Turn Language Model Interaction


124. Automated Self-Testing as a Quality Gate: Evidence-Driven Release Management for LLM Applications


125. DRCY: Agentic Hardware Design Reviews


126. Recursive Language Models Meet Uncertainty: The Surprising Effectiveness of Self-Reflective Program Search for Long Context


127. Steering Frozen LLMs: Adaptive Social Alignment via Online Prompt Routing


128. Exploring the Use of VLMs for Navigation Assistance for People with Blindness and Low Vision