Key LLM-Related Papers - 2026-02-27

1. Toward Expert Investment Teams: A Multi-Agent LLM System with Fine-Grained Trading Tasks


2. LLM Novice Uplift on Dual-Use, In Silico Biology Tasks


3. CXReasonAgent: Evidence-Grounded Diagnostic Reasoning Agent for Chest X-rays


4. Mitigating Legibility Tax with Decoupled Prover-Verifier Games


5. Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive


6. SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation


7. ESAA: Event Sourcing for Autonomous Agents in LLM-Based Software Engineering


8. A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring


9. PATRA: Pattern-Aware Alignment and Balanced Reasoning for Time Series Question Answering


10. Multi-Agent Large Language Model Based Emotional Detoxification Through Personalized Intensity Control for Consumer Protection


11. Three AI-agents walk into a bar ... 'Lord of the Flies' tribalism emerges among smart AI-Agents


12. Enhancing CVRP Solver through LLM-driven Automatic Heuristic Design



14. Modeling Expert AI Diagnostic Alignment via Immutable Inference Snapshots


15. SPM-Bench: Benchmarking Large Language Models for Scanning Probe Microscopy


16. FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning


17. Towards LLM-Empowered Knowledge Tracing via LLM-Student Hierarchical Behavior Alignment in Hyperbolic Space


18. MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks


19. ClinDet-Bench: Beyond Abstention, Evaluating Judgment Determinability of LLMs in Clinical Decision-Making


20. AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications


21. RLHFless: Serverless Computing for Efficient RLHF


22. Toward Personalized LLM-Powered Agents: Foundations, Evaluation, and Future Directions


23. MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios


24. SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning


25. CourtGuard: A Model-Agnostic Framework for Zero-Shot Policy Adaptation in LLM Safety


26. Requesting Expert Reasoning: Augmenting LLM Agents with Learned Collaborative Intervention


27. Agentic AI for Intent-driven Optimization in Cell-free O-RAN


28. Cognitive Models and AI Algorithms Provide Templates for Designing Language Agents


29. A Mathematical Theory of Agency and Intelligence


30. Mirroring the Mind: Distilling Human-Like Metacognitive Strategies into Large Language Models


31. Mapping the Landscape of Artificial Intelligence in Life Cycle Assessment Using Large Language Models


32. VeRO: An Evaluation Harness for Agents to Optimize Agents


33. ConstraintBench: Benchmarking LLM Constraint Reasoning on Direct Optimization


34. CWM: Contrastive World Models for Action Feasibility Learning in Embodied Agent Pipelines


35. A Framework for Assessing AI Agent Decisions and Outcomes in AutoML Pipelines


36. Epistemic Filtering and Collective Hallucination: A Jury Theorem for Confidence-Calibrated Agents


37. Towards Autonomous Memory Agents


38. Agent Behavioral Contracts: Formal Specification and Runtime Enforcement for Reliable Autonomous AI Agents


39. Graph Your Way to Inspiration: Integrating Co-Author Graphs with Retrieval-Augmented Generation for Large Language Model Based Scientific Idea Generation


40. SOTAlign: Semi-Supervised Alignment of Unimodal Vision and Language Models via Optimal Transport


41. Understanding Usage and Engagement in AI-Powered Scientific Research Tools: The Asta Interaction Dataset


42. Utilizing LLMs for Industrial Process Automation


43. Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction


44. Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments


45. MovieTeller: Tool-augmented Movie Synopsis with ID Consistent Progressive Abstraction


46. Why Diffusion Language Models Struggle with Truly Parallel (Non-Autoregressive) Decoding?


47. Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs


48. MoDora: Tree-Based Semi-Structured Document Analysis System


49. Affine-Scaled Attention: Towards Flexible and Stable Transformer Attention


50. LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure


51. Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization


52. Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability


53. Discovery of Interpretable Physical Laws in Materials via Language-Model-Guided Symbolic Regression


54. Test-Time Scaling with Diffusion Language Models via Reward-Guided Stitching


55. TCM-DiffRAG: Personalized Syndrome Differentiation Reasoning Method for Traditional Chinese Medicine based on Knowledge Graph and Chain of Thought


56. Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks


57. Natural Language Declarative Prompting (NLD-P): A Modular Governance Method for Prompt Design Under Model Drift


58. Probing for Knowledge Attribution in Large Language Models


59. Distributed LLM Pretraining During Renewable Curtailment Windows: A Feasibility Study


60. Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction


61. AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification


62. SoPE: Spherical Coordinate-Based Positional Embedding for Enhancing Spatial Perception of 3D LVLMs


63. IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation


64. Tokenization, Fusion and Decoupling: Bridging the Granularity Mismatch Between Large Language Models and Knowledge Graphs


65. Reinforcing Real-world Service Agents: Balancing Utility and Cost in Task-oriented Dialogue


66. SUPERGLASSES: Benchmarking Vision Language Models as Intelligent Agents for AI Smart Glasses


67. ViCLIP-OT: The First Foundation Vision-Language Model for Vietnamese Image-Text Retrieval with Optimal Transport


68. dLLM: Simple Diffusion Language Modeling


69. Instruction-based Image Editing with Planning, Reasoning, and Generation


70. Transformers converge to invariant algorithmic cores


71. TabDLM: Free-Form Tabular Data Generation via Joint Numerical-Language Diffusion


72. Addressing Climate Action Misperceptions with Generative AI


73. DrivePTS: A Progressive Learning Framework with Textual and Structural Enhancement for Driving Scene Generation


74. Ruyi2 Technical Report


75. Generative Agents Navigating Digital Libraries


76. SignVLA: A Gloss-Free Vision-Language-Action Framework for Real-Time Sign Language-Guided Robotic Manipulation


77. Reinforcement-aware Knowledge Distillation for LLM Reasoning


78. Importance of Prompt Optimisation for Error Detection in Medical Notes Using Language Models


79. Sydney Telling Fables on AI and Humans: A Corpus Tracing Memetic Transfer of Persona between LLMs


80. Beyond Dominant Patches: Spatial Credit Redistribution For Grounded Vision-Language Models


81. Automating the Detection of Requirement Dependencies Using Large Language Models


82. Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace


83. HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems


84. Contextual Memory Virtualisation: DAG-Based State Management and Structurally Lossless Trimming for LLM Agents


85. EyeLayer: Integrating Human Attention Patterns into LLM-Based Code Summarization


86. Scaling In, Not Up? Testing Thick Citation Context Analysis with GPT-5 and Fragile Prompts


87. Decoder-based Sense Knowledge Distillation


88. Structure and Redundancy in Large Language Models: A Spectral Study via Random Matrix Theory


89. Decoding the Hook: A Multimodal LLM Framework for Analyzing the Hooking Period of Video Ads


90. UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs


91. Manifold of Failure: Behavioral Attraction Basins in Language Models


92. Integrating Machine Learning Ensembles and Large Language Models for Heart Disease Prediction Using Voting Fusion


93. Analysis of LLMs Against Prompt Injection and Jailbreak Attacks


94. From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation


95. To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning


96. SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG


97. Misinformation Exposure in the Chinese Web: A Cross-System Evaluation of Search Engines, LLMs, and AI Overviews


98. Comparative Analysis of Neural Retriever-Reranker Pipelines for Retrieval-Augmented Generation over Knowledge Graphs in E-commerce Applications


99. RAGdb: A Zero-Dependency, Embeddable Architecture for Multimodal Retrieval-Augmented Generation on the Edge


100. Enriching Taxonomies Using Large Language Models


101. Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences