Key LLM-Related Papers - 2025-08-22

1. Language-Guided Tuning: Enhancing Numeric Optimization with Textual Feedback


2. Response and Prompt Evaluation to Prevent Parasocial Relationships with Chatbots


3. NiceWebRL: a Python library for human subject experiments with reinforcement learning environments


4. DeepThink3D: Enhancing Large Language Models with Programmatic Reasoning in Complex 3D Situated Reasoning Tasks


5. Super-additive Cooperation in Language Model Agents


6. Think in Blocks: Adaptive Reasoning from Direct Response to Deep Reasoning


7. From Bits to Boardrooms: A Cutting-Edge Multi-Agent LLM Framework for Business Excellence


8. GraSP: A Unified Graph-Based Framework for Scalable Generation, Quality Tagging, and Management of Synthetic Data for SFT and DPO


9. DiagECG: An LLM-Driven Framework for Diagnostic Reasoning via Discretized ECG Tokenization


10. RETAIL: Towards Real-world Travel Planning for Large Language Models


11. Coarse-to-Fine Grounded Memory for LLM Agent Planning


12. Multiple Memory Systems for Enhancing the Long-term Memory of Agent


13. See it. Say it. Sorted: Agentic System for Compositional Diagram Generation


14. R-ConstraintBench: Evaluating LLMs on NP-Complete Scheduling


15. LLM4Sweat: A Trustworthy Large Language Model for Hyperhidrosis Support


16. PuzzleClone: An SMT-Powered Framework for Synthesizing Verifiable Data


17. LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries


18. Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis


19. End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning


20. EcomMMMU: Strategic Utilization of Visuals for Robust Multimodal E-Commerce Models


21. Tutorial on the Probabilistic Unification of Estimation Theory, Machine Learning, and Generative AI


22. StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding


23. Row-Column Hybrid Grouping for Fault-Resilient Multi-Bit Weight Representation on IMC Arrays


24. Mind and Motion Aligned: A Joint Evaluation IsaacSim Benchmark for Task Planning and Low-Level Policies in Mobile Manipulation


25. Benchmarking Computer Science Survey Generation


26. Trained Miniatures: Low cost, High Efficacy SLMs for Sales & Marketing


27. LLM-Driven Self-Refinement for Embodied Drone Task Planning


28. Subjective Behaviors and Preferences in LLM: Language of Browsing


29. Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection


30. Mitigating Hallucinations in LM-Based TTS Models via Distribution Alignment Using GFlowNets


31. Test-time Corpus Feedback: From Retrieval to RAG


32. An Empirical Study of Knowledge Distillation for Code Understanding Tasks


33. LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model


34. When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language Models


35. Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation


36. IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents


37. DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding


38. M-$LLM^3$REC: A Motivation-Aware User-Item Interaction Framework for Enhancing Recommendation Accuracy with LLMs


39. Conflict-Aware Soft Prompting for Retrieval-Augmented Generation


40. VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models


41. GenTune: Toward Traceable Prompts to Improve Controllability of Image Refinement in Environment Design


42. SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning


43. SemToken: Semantic-Aware Tokenization for Efficient Long-Context Language Modeling