Major LLM-Related Papers - 2026-05-04

1. Position: agentic AI orchestration should be Bayes-consistent


2. To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling


3. Thinking in Text and Images: Interleaved Vision–Language Reasoning Traces for Long-Horizon Robot Manipulation


4. AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning


5. ARMOR 2025: A Military-Aligned Benchmark for Evaluating Large Language Model Safety Beyond Civilian Contexts


6. TUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimization


7. Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents


8. Minimal, Local, Causal Explanations for Jailbreak Success in Large Language Models


9. AgentReputation: A Decentralized Agentic AI Reputation Framework


10. TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data


11. Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs


12. Can Coding Agents Reproduce Findings in Computational Materials Science?


13. When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI


14. Make Your LVLM KV Cache More Lightweight


15. GeoContra: From Fluent GIS Code to Verifiable Spatial Analysis with Geography-Grounded Repair


16. AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments


17. Jailbreaking Vision-Language Models Through the Visual Modality


18. Structure Liberates: How Constrained Sensemaking Produces More Novel Research Output


19. Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation


20. SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters


21. Silicon Showdown: Performance, Efficiency, and Ecosystem Barriers in Consumer-Grade LLM Inference


22. Space Network of Experts: Architecture and Expert Placement


23. LLM-Oriented Information Retrieval: A Denoising-First Perspective


24. Impact of Task Phrasing on Presumptions in Large Language Models


25. Escaping Mode Collapse in LLM Generation via Geometric Regulation


26. Improving LLM Code Generation via Requirement-Aware Curriculum Reinforcement Learning


27. Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes


28. BWLA: Breaking the Barrier of W1AX Post-Training Quantization for LLMs


29. RadLite: Multi-Task LoRA Fine-Tuning of Small Language Models for CPU-Deployable Radiology AI


30. Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines


31. Social Bias in LLM-Generated Code: Benchmark and Mitigation


32. AlphaInventory: Evolving White-Box Inventory Policies via Large Language Models with Deployment Guarantees


33. MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents


34. Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning


35. Budget-Aware Routing for Long Clinical Text


36. DynamicPO: Dynamic Preference Optimization for Recommendation


37. Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis


38. Caracal: Causal Architecture via Spectral Mixing


39. Are You the A-hole? A Fair, Multi-Perspective Ethical Reasoning Framework


40. Jailbroken Frontier Models Retain Their Capabilities


41. Retrieval-Augmented Reasoning for Chartered Accountancy


42. Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving


43. Attention Is Where You Attack



45. RSAT: Structured Attribution Makes Small Language Models Faithful Table Reasoners


46. The Silicon Society Cookbook: Design Space of LLM-based Social Simulations



48. How Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in System-Prompted Responses


49. DeGenTWeb: A First Look at LLM-dominant Websites


50. CRC-Screen: Certified DNA-Synthesis Hazard Screening Under Taxonomic Shift


51. XekRung Technical Report


52. A Survey of Reasoning-Intensive Retrieval: Progress and Challenges


53. SiriusHelper: An LLM Agent-Based Operations Assistant for Big Data Platforms


54. Exploring LLM biases to manipulate AI search overview


55. Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation