Key LLM-Related Papers - 2026-05-14

1. History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions


2. Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs


3. ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles


4. RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation


5. Assessing the Creativity of Large Language Models: Testing, Limits, and New Frontiers


6. TRIAGE: Evaluating Prospective Metacognitive Control in LLMs under Resource Constraints


7. RS-Claw: Progressive Active Tool Exploration via Hierarchical Skill Trees for Remote Sensing Agents


8. VERA-MH: Validation of Ethical and Responsible AI in Mental Health


9. IdeaForge: A Knowledge Graph-Grounded Multi-Agent Framework for Cross-Methodology Innovation Analysis and Patent Claim Generation


10. Respecting Self-Uncertainty in On-Policy Self-Distillation for Efficient LLM Reasoning


11. It’s not the Language Model, it’s the Tool: Deterministic Mediation for Scientific Workflows


12. An Agentic AI Framework with Large Language Models and Chain-of-Thought for UAV-Assisted Logistics Scheduling with Mobile Edge Computing


13. An Agentic LLM-Based Framework for Population-Scale Mental Health Screening


14. MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning


15. Retrieval-Augmented Tutoring for Algorithm Tracing and Problem-Solving in AI Education


16. Useful Memories Become Faulty When Continuously Updated by LLMs


17. Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation


18. When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction


19. Beyond Cooperative Simulators: Generating Realistic User Personas for Robust Evaluation of LLM Agents


20. Multimodal Hidden Markov Models for Persistent Emotional State Tracking


21. PROMETHEUS: Automating Deep Causal Research Integrating Text, Data and Models


22. CHAL: Council of Hierarchical Agentic Language


23. DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models


24. Learning Transferable Latent User Preferences for Human-Aligned Decision Making


25. Revealing Interpretable Failure Modes of VLMs


26. Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents


27. WARDEN: Endangered Indigenous Language Transcription and Translation with 6 Hours of Training Data


28. Neurosymbolic Auditing of Natural-Language Software Requirements


29. Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling


30. LMPath: Language-Mediated Priors and Path Generation for Aerial Exploration


31. (How) Do Large Language Models Understand High-Level Message Sequence Charts?


32. Where Does Reasoning Break? Step-Level Hallucination Detection via Hidden-State Transport Geometry


33. High-Rate Quantized Matrix Multiplication II


34. KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving


35. Children’s English Reading Story Generation via Supervised Fine-Tuning of Compact LLMs with Controllable Difficulty and Safety


36. Identifying AI Web Scrapers Using Canary Tokens


37. RTLC – Research, Teach-to-Learn, Critique: A three-stage prompting paradigm inspired by the Feynman Learning Technique that lifts LLM-as-judge accuracy on JudgeBench with no fine-tuning


38. A Hierarchical Language Model with Predictable Scaling Laws and Provable Benefits of Reasoning


39. Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training


40. NAACA: Training-Free NeuroAuditory Attentive Cognitive Architecture with Oscillatory Working Memory for Salience-Driven Attention Gating


41. OpenAaaS: An Open Agent-as-a-Service Framework for Distributed Materials-Informatics Research


42. Locale-Conditioned Few-Shot Prompting Mitigates Demonstration Regurgitation in On-Device PII Substitution with Small Language Models


43. HLS-Seek: QoR-Aware Code Generation for High-Level Synthesis via Proxy Comparative Reward Reinforcement Learning


44. Towards Unified Surgical Scene Understanding: Bridging Reasoning and Grounding via MLLMs


45. Many-Shot CoT-ICL: Making In-Context Learning Truly Learn


46. Discovery of Hidden Miscalibration Regimes


47. LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics


48. GRIP-VLM: Group-Relative Importance Pruning for Efficient Vision-Language Models


49. Query-Conditioned Test-Time Self-Training for Large Language Models


50. Probing Persona-Dependent Preferences in Language Models


51. Tracing Persona Vectors Through LLM Pretraining


52. CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution


53. IndicMedDialog: A Parallel Multi-Turn Medical Dialogue Dataset for Accessible Healthcare in Indic Languages


54. The Readability Spectrum: Patterns, Issues, and Prompt Effects in LLM-Generated Code


55. Teacher-Guided Policy Optimization for LLM Distillation


56. STAR: Semantic-Temporal Adaptive Representation Learning for Few-Shot Action Recognition


57. CLIP Tricks You: Training-free Token Pruning for Efficient Pixel Grounding in Large Vision-Language Models


58. AcquisitionSynthesis: Targeted Data Generation using Acquisition Functions


59. Towards Long-horizon Embodied Agents with Tool-Aligned Vision-Language-Action Models


60. A Multi-Agent Orchestration Framework for Venture Capital Due Diligence


61. Context Training with Active Information Seeking


62. Revealing the Gap in Human and VLM Scene Perception through Counterfactual Semantic Saliency


63. No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills


64. Understanding and Accelerating the Training of Masked Diffusion Language Models


65. Rethinking Efficient Graph Coarsening via a Non-Selfishness Principle


66. Not Just RLHF: Why Alignment Alone Won’t Fix Multi-Agent Sycophancy


67. Controlling Logical Collapse in LLMs via Algebraic Ontology Projection over F2


68. Seg-Agent: Test-Time Multimodal Reasoning for Training-Free Language-Guided Segmentation


69. When Should an AI Workflow Release? Always-Valid Inference for Black-Box Generate-Verify Systems


70. The Expressivity Boundary of Probabilistic Circuits: A Comparison with Large Language Models


71. Embodied Multi-Agent Coordination by Aligning World Models Through Dialogue


72. Data Difficulty and the Generalization–Extrapolation Tradeoff in LLM Fine-Tuning


73. EcoGEO: Trajectory-Aware Evidence Ecosystems for Web-Enabled LLM Search Agents


74. Quantifying LLM Safety Degradation Under Repeated Attacks Using Survival Analysis


75. Persona-Model Collapse in Emergent Misalignment


76. Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion


77. Mechanism Plausibility in Generative Agent-Based Modeling


78. Training Large Language Models to Predict Clinical Events


79. REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations


80. Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces


81. WriteSAE: Sparse Autoencoders for Recurrent State


82. Uncovering Symmetry Transfer in Large Language Models via Layer-Peeled Optimization


83. Simulating Students or Sycophantic Problem Solving? On Misconception Faithfulness of LLM Simulators


84. CoT-Guard: Small Models for Strong Monitoring


85. Large Language Models for Agentic NetOps and AIOps: Architectures, Evaluation, and Safety


86. Grid-Orch: An LLM-Powered Orchestrator for Distribution Grid Simulation and Analytics


87. Agentic Interpretation: Lattice-Structured Evidence for LLM-Based Program Analysis


88. Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?


89. ODRPO: Ordinal Decompositions of Discrete Rewards for Robust Policy Optimization


90. Multi-Rollout On-Policy Distillation via Peer Successes and Failures


91. Training LLMs with Reinforcement Learning for Intent-Aware Personalized Question Answering


92. 3D Primitives are a Spatial Language for VLMs


93. DistractMIA: Black-Box Membership Inference on Vision-Language Models via Semantic Distraction


94. SSDA: Bridging Spectral and Structural Gaps via Dual Adaptation for Vision-Based Time Series Forecasting


95. AgenticAITA: A Proof-Of-Concept About Deliberative Multi-Agent Reasoning for Autonomous Trading Systems


96. In-Situ Behavioral Evaluation for LLM Fairness, Not Standardized-Test Scores


97. PERCEIVE: A Benchmark for Personalized Emotion and Communication Behavior Understanding on Social Media


98. Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism


99. Differences in Text Generated by Diffusion and Autoregressive Language Models


100. BoostTaxo: Zero-Shot Taxonomy Induction via Boosting-Style Agentic Reasoning and Constraint-Aware Calibration


101. Correct Answers from Sound Reasoning: Verifiable Process Supervision for Language Models


102. TimelineReasoner: Advancing Timeline Summarization with Large Reasoning Models


103. Bridging the Missing-Modality Gap: Improving Text-Only Calibration of Vision Language Models


104. Domain Adaptation of Large Language Models for Polymer-Composite Additive Manufacturing Using Retrieval-Augmented Generation and Fine-Tuning


105. Beyond Individual Mimicry: Constructing Human-Like Social Network with Graph-Augmented LLM Agents


106. Can LLM Agents Simulate Dynamic Networks? A Case Study on Email Networks with Phishing Synthesis