Key LLM-Related Papers - 2026-04-13

1. Strategic Algorithmic Monoculture: Experimental Evidence from Coordination Games


2. E3-TIR: Enhanced Experience Exploitation for Tool-Integrated Reasoning


3. Constraint-Aware Corrective Memory for Language-Based Drug Discovery Agents


4. SAGE: A Service Agent Graph-guided Evaluation Benchmark


5. SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment


6. PilotBench: A Benchmark for General Aviation Agents with Safety Constraints


7. Enhancing LLM Problem Solving via Tutor-Student Multi-Agent Interaction


8. StaRPO: Stability-Augmented Reinforcement Policy Optimization


9. SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks


10. Model Space Reasoning as Search in Feedback Space for Planning Domain Generation


11. From Business Events to Auditable Decisions: Ontology-Governed Graph Simulation for Enterprise AI


12. Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism


13. Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise


14. VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images


15. VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning


16. VISOR: Agentic Visual Retrieval-Augmented Generation via Iterative Search and Over-horizon Reasoning


17. BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation


18. RecaLLM: Addressing the Lost-in-Thought Phenomenon with Explicit In-Context Retrieval


19. ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion


20. Many-Tier Instruction Hierarchy in LLM Agents


21. On the Representational Limits of Quantum-Inspired 1024-D Document Embeddings: An Experimental Evaluation Framework


22. LLM-Rosetta: A Hub-and-Spoke Intermediate Representation for Cross-Provider LLM API Translation


23. Visually-Guided Policy Optimization for Multimodal Reasoning


24. SkillMOO: Multi-Objective Optimization of Agent Skills for Software Engineering


25. Mosaic: Multimodal Jailbreak against Closed-Source VLMs via Multi-View Ensemble Optimization


26. GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking


27. Persona-E$^2$: A Human-Grounded Dataset for Personality-Shaped Emotional Responses to Textual Events


28. Structuring versus Problematizing: How LLM-based Agents Scaffold Learning in Diagnostic Reasoning


29. CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation


30. Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition


31. PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing


32. TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training


33. CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion


34. DeepGuard: Secure Code Generation via Multi-Layer Semantic Aggregation


35. Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures


36. CONDESION-BENCH: Conditional Decision-Making of Large Language Models in Compositional Action Space


37. Skill-Conditioned Visual Geolocation for Vision-Language


38. Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection


39. Noise-Aware In-Context Learning for Hallucination Mitigation in ALLMs


40. ASTRA: Adaptive Semantic Tree Reasoning Architecture for Complex Table Question Answering


41. PinpointQA: A Dataset and Benchmark for Small Object-Centric Spatial Understanding in Indoor Videos


42. PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment


43. MuTSE: A Human-in-the-Loop Multi-use Text Simplification Evaluator


44. Beyond Relevance: Utility-Centric Retrieval in the LLM Era


45. HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing


46. AudioGuard: Toward Comprehensive Audio Safety Protection Across Diverse Threat Models


47. Scalable High-Recall Constraint-Satisfaction-Based Information Retrieval for Clinical Trials Matching


48. Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs


49. HiFloat4 Format for Language Model Pre-training on Ascend NPUs


50. Lessons Without Borders? Evaluating Cultural Alignment of LLMs Using Multilingual Story Moral Generation


51. Cards Against LLMs: Benchmarking Humor Alignment in Large Language Models


52. LLMs Underperform Graph-Based Parsers on Supervised Relation Extraction for Complex Graphs


53. Decomposing the Delta: What Do Models Actually Learn from Preference Pairs?


54. Demystifying the Silence of Correctness Bugs in PyTorch Compiler


55. LMGenDrive: Bridging Multimodal Understanding and Generative World Modeling for End-to-End Driving


56. Every Response Counts: Quantifying Uncertainty of LLM-based Multi-Agent Systems through Tensor Decomposition


57. 3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding


58. Evidential Transformation Network: Turning Pretrained Models into Evidential Models for Post-hoc Uncertainty Estimation


59. SkillForge: Forging Domain-Specific, Self-Evolving Agent Skills in Cloud Technical Support


60. MARINER: A 3E-Driven Benchmark for Fine-Grained Perception and Complex Reasoning in Open-Water Environments


61. Detection of Hate and Threat in Digital Forensics: A Case-Driven Multimodal Approach


62. Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines


63. Extrapolating Volition with Recursive Information Markets


64. TiAb Review Plugin: A Browser-Based Tool for AI-Assisted Title and Abstract Screening


65. STIndex: A Context-Aware Multi-Dimensional Spatiotemporal Information Extraction System


66. Adaptive Rigor in AI System Evaluation using Temperature-Controlled Verdict Aggregation via Generalized Power Mean


67. AlphaLab: Autonomous Multi-Agent Research Across Optimization Domains with Frontier LLMs


68. Act or Escalate? Evaluating Escalation Behavior in Automation with Language Models


69. QCFuse: Query-Centric Cache Fusion for Efficient RAG Inference


70. CSAttention: Centroid-Scoring Attention for Accelerating LLM Inference


71. Structured Exploration and Exploitation of Label Functions for Automated Data Annotation


72. Distributionally Robust Token Optimization in RLHF


73. Robust Reasoning Benchmark


74. QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation


75. Can We Still Hear the Accent? Investigating the Resilience of Native Language Signals in the LLM Era


76. Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models


77. Neural networks for Text-to-Speech evaluation


78. Medical Reasoning with Large Language Models: A Survey and MR-Bench


79. Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models


80. EMA Is Not All You Need: Mapping the Boundary Between Structure and Content in Recurrent Context


81. Drift and selection in LLM text ecosystems


82. GNN-as-Judge: Unleashing the Power of LLMs for Graph Learning with GNN Feedback


83. Automated Standardization of Legacy Biomedical Metadata Using an Ontology-Constrained LLM Agent


84. Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces