Key LLM-Related Papers - 2025-09-26

1. SAGE: A Realistic Benchmark for Semantic Understanding


2. VC-Agent: An Interactive Agent for Customized Video Dataset Collection


3. Grounding AI Explanations in Experience: A Reflective Cognitive Architecture for Clinical Decision Support


4. What Do LLM Agents Do When Left Alone? Evidence of Spontaneous Meta-Cognitive Patterns


5. A Fano-Style Accuracy Upper Bound for LLM Single-Pass Reasoning in Multi-Hop QA


6. Distributed Specialization: Rare-Token Neurons in Large Language Models


7. ToMPO: Training LLM Strategic Decision Making from a Multi-Agent Perspective


8. RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs


9. TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them


10. Disagreements in Reasoning: How a Model’s Thinking Process Dictates Persuasion in Multi-Agent Systems


11. Combinatorial Creativity: A New Frontier in Generalization Abilities


12. Who Gets Cited Most? Benchmarking Long-Context Language Models on Scientific Articles


13. CORE: Full-Path Evaluation of LLM Agents Beyond Final State



15. Beyond Stars: Bridging the Gap Between Ratings and Review Sentiment with LLM


16. GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine


17. DeFacto: Counterfactual Thinking with Images for Enforcing Evidence-Grounded and Faithful Reasoning


18. LogReasoner: Empowering LLMs with Expert-like Coarse-to-Fine Reasoning for Log Analysis Tasks


19. Meta-Memory: Retrieving and Integrating Semantic-Spatial Memories for Robot Spatial Reasoning


20. Parallel Thinking, Sequential Answering: Bridging NAR and AR for Efficient Reasoning


21. An Automated Retrieval-Augmented Generation LLaMA-4 109B-based System for Evaluating Radiotherapy Treatment Plans


22. Accelerate Creation of Product Claims Using Generative AI


23. SAMULE: Self-Learning Agents Enhanced by Multi-level Reflection


24. InsightGUIDE: An Opinionated AI Assistant for Guided Critical Reading of Scientific Literature


25. LATTS: Locally Adaptive Test-Time Scaling


26. An Approach to Checking Correctness for Agentic Systems


27. RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards


28. DisCoCLIP: A Distributional Compositional Tensor Network Encoder for Vision-Language Understanding


29. It’s Not You, It’s Clipping: A Soft Trust-Region via Probability Smoothing for LLM RL


30. Data-Centric Elastic Pipeline Parallelism for Efficient Long-Context LLM Training


31. Semantic Edge-Cloud Communication for Real-Time Urban Traffic Surveillance with ViT and LLMs over Mobile Networks


32. Instruction-tuned Self-Questioning Framework for Multimodal Reasoning


33. Learning to Look: Cognitive Attention Alignment with Vision-Language Models


34. Explaining Fine Tuned LLMs via Counterfactuals A Knowledge Graph Driven Framework


35. Tree Search for LLM Agent Reinforcement Learning


36. Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning


37. Human-like Navigation in a World Built for Humans


38. Adoption, usability and perceived clinical value of a UK AI clinical reference platform (iatroX): a mixed-methods formative evaluation of real-world usage and a 1,223-respondent user survey


39. Can Less Precise Be More Reliable? A Systematic Evaluation of Quantization’s Impact on CLIP Beyond Accuracy


40. Fine-Tuning LLMs to Analyze Multiple Dimensions of Code Review: A Maximum Entropy Regulated Long Chain-of-Thought Approach


41. UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice


42. Teaching RL Agents to Act Better: VLM as Action Advisor for Online Reinforcement Learning


43. Cross-Modal Instructions for Robot Motion Generation


44. Best-of-$\infty$ – Asymptotic Performance of Test-Time Compute


45. Which Cultural Lens Do Models Adopt? On Cultural Positioning Bias and Agentic Mitigation in LLMs


46. Communication Bias in Large Language Models: A Regulatory Perspective


47. GeoRef: Referring Expressions in Geometry via Task Formulation, Synthetic Supervision, and Reinforced MLLM-based Solutions


48. Reinforcement Learning Fine-Tuning Enhances Activation Intensity and Diversity in the Internal Circuitry of LLMs


49. Generative AI for FFRDCs


50. SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization


51. Predicting LLM Reasoning Performance with Small Proxy Model


52. Mechanism of Task-oriented Information Removal in In-context Learning


53. Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools


54. Binary Autoencoder for Mechanistic Interpretability of Large Language Models


55. Analysis of instruction-based LLMs’ capabilities to score and judge text-input problems in an academic setting


56. Knowledgeable Language Models as Black-Box Optimizers for Personalized Medicine


57. Unlocking Financial Insights: An advanced Multimodal Summarization with Multimodal Output Framework for Financial Advisory Videos


58. On Theoretical Interpretations of Concept-Based In-Context Learning


59. SCRA-VQA: Summarized Caption-Rerank for Augmented Large Language Models in Visual Question Answering


60. StyleBench: Evaluating thinking styles in Large Language Models


61. Verification Limits Code LLM Training


62. CaTS-Bench: Can Language Models Describe Numeric Time Series?


63. Leveraging What’s Overfixed: Post-Correction via LLM Grammatical Error Overcorrection


64. DAC-LoRA: Dynamic Adversarial Curriculum for Efficient and Robust Few-Shot Adaptation


65. Towards Atoms of Large Language Models


66. Provenance Analysis of Archaeological Artifacts via Multimodal RAG Systems


67. Measuring LLM Sensitivity in Transformer-based Tabular Data Synthesis


68. Seeing Through Words, Speaking Through Pixels: Deep Representational Alignment Between Vision and Language Models


69. Incorporating LLM Embeddings for Variation Across the Human Genome


70. Look Before you Leap: Estimating LLM Benchmark Scores from Descriptions


71. A Framework for Rapidly Developing and Deploying Protection Against Large Language Model Attacks


72. Recidivism and Peer Influence with LLM Text Embeddings in Low Security Correctional Facilities


73. FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models


74. Experience Deploying Containerized GenAI Services at an HPC Center


75. An LLM-based Agentic Framework for Accessible Network Control


76. Dynamic Reasoning Chains through Depth-Specialized Mixture-of-Experts in Transformer Architectures


77. Perspectra: Choosing Your Experts Enhances Critical Thinking in Multi-Agent Research Ideation


78. InstructVTON: Optimal Auto-Masking and Natural-Language-Guided Interactive Style Control for Inpainting-Based Virtual Try-On


79. CHOIR: A Chatbot-mediated Organizational Memory Leveraging Communication in University Research Labs


80. MARS: toward more efficient multi-agent collaboration for LLM reasoning


81. Boosting Zero-Shot VLN via Abstract Obstacle Map-Based Waypoint Prediction with TopoGraph-and-VisitInfo-Aware Prompting


82. Wartime Media Dynamics in Emerging Democracies: Case Study of Pakistani Media in May 2025 Indo-Pak Conflict


83. Adversarial Defense in Cybersecurity: A Systematic Review of GANs for Threat Detection and Mitigation


84. The Secret Agenda: LLMs Strategically Lie and Our Current Safety Tools Are Blind


85. Dynamic ReAct: Scalable Tool Selection for Large-Scale MCP Environments


86. R1-Fuzz: Specializing Language Models for Textual Fuzzing via Reinforcement Learning


87. USB-Rec: An Effective Framework for Improving Conversational Recommendation Capability of Large Language Model


88. ACCeLLiuM: Supervised Fine-Tuning for Automated OpenACC Pragma Generation


89. Beyond Global Emotion: Fine-Grained Emotional Speech Synthesis with Dynamic Word-Level Modulation


90. SKILL-RAG: Self-Knowledge Induced Learning and Filtering for Retrieval-Augmented Generation


91. ConceptViz: A Visual Analytics Approach for Exploring Concepts in Large Language Models


92. Assessing Classical Machine Learning and Transformer-based Approaches for Detecting AI-Generated Research Text


93. CFD-LLMBench: A Benchmark Suite for Evaluating Large Language Models in Computational Fluid Dynamics


94. AI-driven formative assessment and adaptive learning in data-science education: Evaluating an LLM-powered virtual teaching assistant


95. Interpreting Public Sentiment in Diplomacy Events: A Counterfactual Analysis Framework Using Large Language Models