Key LLM Papers - 2025-09-12

1. The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs


2. Compositional Concept Generalization with Variational Quantum Circuits


3. TORSO: Template-Oriented Reasoning Towards General Tasks


4. Curriculum-Based Multi-Tier Semantic Exploration via Deep Reinforcement Learning


5. Towards Adaptive ML Benchmarks: Web-Agent-Driven Construction, Domain Expansion, and Metric Optimization


6. LightAgent: Production-level Open-source Agentic AI Framework


7. Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning


8. Fusing Knowledge and Language: A Comparative Study of Knowledge Graph-Based Question Answering with LLMs



10. Enabling Regulatory Multi-Agent Collaboration: Architecture, Challenges, and Solutions


11. Understanding Economic Tradeoffs Between Human and AI Agents in Bargaining Games


12. Instructional Prompt Optimization for Few-Shot LLM-Based Recommendations on Cold-Start Users


13. Global Constraint LLM Agents for Text-to-Model Translation


14. Automated Unity Game Template Generation from GDDs via NLP and Multi-Modal LLMs


15. ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms


16. CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models


17. LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering


18. Fluent but Unfeeling: The Emotional Blind Spots of Language Models


19. ENSI: Efficient Non-Interactive Secure Inference for Large Language Models


20. LLMs Don’t Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations


21. MetaLLMix: An XAI Aided LLM-Meta-learning Based Approach for Hyper-parameters Optimization


22. Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization


23. On Integrating Large Language Models and Scenario-Based Programming for Improving Software Reliability


24. Probing Pre-trained Language Models on Code Changes: Insights from ReDef, a High-Confidence Just-in-Time Defect Prediction Dataset


25. EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs


26. Character-Level Perturbations Disrupt LLM Watermarks


27. DP-FedLoRA: Privacy-Enhanced Federated Fine-Tuning for On-Device Large Language Models


28. Towards Confidential and Efficient LLM Inference with Dual Privacy Protection


29. Improving LLM Safety and Helpfulness using SFT and DPO: A Study on OPT-350M


30. Stated Preference for Interaction and Continued Engagement (SPICE): Evaluating an LLM’s Willingness to Re-engage in Conversation


31. Can Vision-Language Models Solve Visual Math Equations?


32. Open-sci-ref-0.01: open and reproducible reference baselines for language model and dataset comparison


33. PromptGuard: An Orchestrated Prompting Framework for Principled Synthetic Text Generation for Vulnerable Populations using LLMs with Enhanced Safety, Fairness, and Controllability


34. Recurrence Meets Transformers for Universal Multimodal Retrieval


35. Benchmarking Energy Efficiency of Large Language Models Using vLLM


36. Investigating Student Interaction Patterns with Large Language Model-Powered Course Assistants in Computer Science Courses


37. PerFairX: Is There a Balance Between Fairness and Personality in Large Language Model Recommendations?