Key LLM-Related Papers - 2025-10-03

1. The Reasoning Boundary Paradox: How Reinforcement Learning Constrains Language Models


2. UpSafe°C: Upcycling for Controllable Safety in Large Language Models


3. A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports


4. Demystifying the Roles of LLM Layers in Retrieval, Knowledge, and Reasoning


5. ReTabAD: A Benchmark for Restoring Semantic Context in Tabular Anomaly Detection


6. To Mask or to Mirror: Human-AI Alignment in Collective Reasoning


7. Constrained Adaptive Rejection Sampling


8. Learning a Dense Reasoning Reward Model from Expert Demonstration via Inverse Reinforcement Learning


9. Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning


10. REBot: From RAG to CatRAG with Semantic Enrichment and Graph Routing


11. MetaboT: AI-based agent for natural language-based interaction with metabolomics knowledge graphs


12. VaPR – Vision-language Preference alignment for Reasoning


13. Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness


14. GuruAgents: Emulating Wise Investors with Prompt-Guided LLM Agents


15. Understanding the Geospatial Reasoning Capabilities of LLMs: A Trajectory Recovery Perspective


16. Learning to Decide with Just Enough: Information-Theoretic Context Summarization for CDMPs


17. PychoBench: Evaluating the Psychology Intelligence of Large Language Models


18. AgentRec: Next-Generation LLM-Powered Multi-Agent Collaborative Recommendation with Adaptive Intelligence


19. AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning


20. InvThink: Towards AI Safety via Inverse Reasoning


21. Step-Aware Policy Optimization for Reasoning in Diffusion Large Language Models


22. Information Seeking for Robust Decision Making under Partial Observability


23. LOGicalThought: Logic-Based Ontological Grounding of LLMs for High-Assurance Reasoning


24. Towards Interpretable and Inference-Optimal COT Reasoning with Sparse Autoencoder-Guided Generation


25. AIReg-Bench: Benchmarking Language Models That Assess AI Regulation Compliance


26. VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning


27. A Tale of LLMs and Induced Small Proxies: Scalable Agents for Knowledge Mining


28. OntoLogX: Ontology-Guided Knowledge Graph Extraction from Cybersecurity Logs with Large Language Models


29. Automating Data-Driven Modeling and Analysis for Engineering Applications using Large Language Model Agents


30. Fine-tuning with RAG for Improving LLM Learning of New Skills


31. Retrieval-Augmented Framework for LLM-Based Clinical Decision Support


32. Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models


33. The Social Laboratory: A Psychometric Framework for Multi-Agent LLM Evaluation


34. Modeling Others’ Minds as Code


35. OR-Toolformer: Modeling and Solving Operations Research Problems with Tool Augmented Large Language Models


36. VideoNSA: Native Sparse Attention Scales Video Understanding


37. F2LLM Technical Report: Matching SOTA Embedding Performance with 6 Million Open-Source Data


38. Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks


39. Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation


40. InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents


41. microCLIP: Unsupervised CLIP Adaptation via Coarse-Fine Token Fusion for Fine-Grained Image Classification


42. DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing


43. Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation


44. ExGRPO: Learning to Reason from Experience


45. RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning


46. More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration


47. DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning


48. ARUQULA – An LLM based Text2SPARQL Approach using ReAct and Knowledge Graph Exploration Utilities


49. GRACE: A Language Model Framework for Explainable Inverse Reinforcement Learning


50. Learning to Reason for Hallucination Span Detection


51. Unlocking Vision-Language Models for Video Anomaly Detection via Fine-Grained Prompting


52. BioinfoMCP: A Unified Platform Enabling MCP Interfaces in Agentic Bioinformatics


53. The Disparate Impacts of Speculative Decoding


54. Clarifying Semantics of In-Context Examples for Unit Test Generation


55. Are LLMs Better GNN Helpers? Rethinking Robust Graph Learning under Deficiencies with Iterative Refinement


56. FINCH: Financial Intelligence using Natural language for Contextualized SQL Handling


57. REPAIR: Robust Editing via Progressive Adaptive Intervention and Reintegration


58. TACOS: Task Agnostic COordinator of a multi-drone System


59. Pre-Hoc Predictions in AutoML: Leveraging LLMs to Enhance Model Selection and Benchmarking for Tabular datasets


60. Nav-EE: Navigation-Guided Early Exiting for Efficient Vision-Language Models in Autonomous Driving


61. Comparison of Unsupervised Metrics for Evaluating Judicial Decision Extraction


62. Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks


63. Representational Alignment Across Model Layers and Brain Regions with Hierarchical Optimal Transport


64. Format Inertia: A Failure Mechanism of LLMs in Medical Pre-Consultation


65. How Do Language Models Compose Functions?


66. Look Less, Reason More: Rollout-Guided Adaptive Pixel-Space Reasoning


67. Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning


68. The Unseen Frontier: Pushing the Limits of LLM Sparsity with Surrogate-Free ADMM


69. Source-Free Cross-Domain Continual Learning


70. Position: Privacy Is Not Just Memorization!


71. NLP Methods for Detecting Novel LLM Jailbreaks and Keyword Analysis with BERT


72. Towards Human-Centered RegTech: Unpacking Professionals’ Strategies and Needs for Using LLMs Safely


73. Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls


74. Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead


75. LLM4Rec: Large Language Models for Multimodal Generative Recommendation with Causal Debiasing


76. Bridging Collaborative Filtering and Large Language Models with Dynamic Alignment, Multimodal Fusion and Evidence-grounded Explanations


77. A Comparison of Independent and Joint Fine-tuning Strategies for Retrieval-Augmented Generation


78. Guiding Multimodal Large Language Models with Blind and Low Vision People Visual Questions for Proactive Visual Interpretations


79. From Supervision to Exploration: What Does Protein Language Model Learn During Reinforcement Learning?


80. POLAR: Automating Cyber Threat Prioritization through LLM-Powered Assessment


81. WALT: Web Agents that Learn Tools


82. Predictive Modeling and Explainable AI for Veterinary Safety Profiles, Residue Assessment, and Health Outcomes Using Real-World Data and Physicochemical Properties


83. Beyond Majority Voting: LLM Aggregation by Leveraging Higher-Order Information


84. Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed


85. VL-KnG: Visual Scene Understanding for Navigation Goal Identification using Spatiotemporal Knowledge Graphs


86. From keywords to semantics: Perceptions of large language models in data discovery


87. GeoSURGE: Geo-localization using Semantic Fusion with Hierarchy of Geographic Embeddings


88. BioVERSE: Representation Alignment of Biomedical Modalities to LLMs for Multi-Modal Reasoning


89. Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks


90. HiSpec: Hierarchical Speculative Decoding for LLMs


91. Enhancing the development of Cherenkov Telescope Array control software with Large Language Models


92. Microsaccade-Inspired Probing: Positional Encoding Perturbations Reveal LLM Misbehaviours


93. Emergent evaluation hubs in a decentralizing large language model ecosystem


94. LLM-based Multi-Agent Blackboard System for Information Discovery in Data Science


95. TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture


96. LLM Based Sentiment Classification From Bangladesh E-Commerce Reviews


97. Think Twice, Generate Once: Safeguarding by Progressive Self-Reflection


98. AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees


99. IoT-MCP: Bridging LLMs and IoT Systems Through Model Context Protocol


100. Measuring Algorithmic Partisanship via Zero-Shot Classification and Its Implications on Political Discourse


101. RJE: A Retrieval-Judgment-Exploration Framework for Efficient Knowledge Graph Question Answering with LLMs


102. Kant: An Efficient Unified Scheduling System for Large-Scale AI Clusters


103. Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs


104. GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models


105. Let’s Play Across Cultures: A Large Multilingual, Multicultural Benchmark for Assessing Language Models’ Understanding of Sports


106. Redundancy-as-Masking: Formalizing the Artificial Age Score (AAS) to Model Memory Aging in Generative AI


107. Confidence-Aware Routing for Large Language Model Reliability Enhancement: A Multi-Signal Approach to Pre-Generation Hallucination Mitigation


108. Automated Extraction of Material Properties using LLM-based AI Agents


109. Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks


110. Trustworthy Summarization via Uncertainty Quantification and Risk Awareness in Large Language Models


111. Enhancing Transformer-Based Rerankers with Synthetic Data and LLM-Based Supervision


112. ClaimCheck: Real-Time Fact-Checking with Small Language Models


113. Utilizing Modern Large Language Models (LLM) for Financial Trend Analysis and Digest Creation


114. Context Matters: Comparison of commercial large language tools in veterinary medicine


115. Discourse vs emissions: Analysis of corporate narratives, symbolic practices, and mimicry through LLMs


116. Towards Open-Ended Discovery for Low-Resource NLP


117. Uncovering Implicit Bias in Large Language Models with Concept Learning Dataset


118. Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs


119. Mamba Outpaces Reformer in Stock Prediction with Sentiments from Top Ten LLMs


120. An Anthropologist LLM to Elicit Users’ Moral Preferences through Role-Play