Key LLM Papers - 2025-10-21

1. PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold


2. Demo: Guide-RAG: Evidence-Driven Corpus Curation for Retrieval-Augmented Generation in Long COVID


3. Self-evolving expertise in complex non-verifiable subject domains: dialogue as implicit meta-RL


4. Direct Preference Optimization with Unobserved Preference Heterogeneity: The Necessity of Ternary Preferences


5. Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism


6. JudgeSQL: Reasoning over SQL Candidates with Weighted Consensus Tournament


7. Adaptive Minds: Empowering Agents with LoRA-as-Tools


8. MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games


9. WebGen-V Bench: Structured Representation for Enhancing Visual Design in LLM-based Web Generation and Evaluation


10. AUGUSTUS: An LLM-Driven Multimodal Agent System with Contextualized User Memory


11. Experience-Driven Exploration for Efficient API-Free AI Agents


12. Multi-dimensional Data Analysis and Applications Basing on LLM Agents and Knowledge Graph Interactions


13. HugAgent: Evaluating LLMs in Simulating Human-Like Individual Reasoning on Open-Ended Tasks


14. OpenEstimate: Evaluating LLMs on Reasoning Under Uncertainty with Real-World Data


15. OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM


16. PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction


17. InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training


18. SNOO: Step-K Nesterov Outer Optimizer - The Surprising Effectiveness of Nesterov Momentum Applied to Pseudo-Gradients


19. LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation


20. Attention Sinks in Diffusion Language Models


21. ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations


22. Exploring the Synergy of Quantitative Factors and Newsflow Representations from Large Language Models for Stock Return Prediction


23. CarBoN: Calibrated Best-of-N Sampling Improves Test-time Reasoning


24. Enhance Large Language Models as Recommendation Systems with Collaborative Filtering


25. The Spark Effect: On Engineering Creative Diversity in Multi-Agent AI Systems


26. SpikeVox: Towards Energy-Efficient Speech Therapy Framework with Spike-driven Generative Language Models


27. KITE: A Benchmark for Evaluating Korean Instruction-Following Abilities in Large Language Models


28. Think Parallax: Solving Multi-Hop Problems via Multi-View Knowledge-Graph-Based Retrieval-Augmented Generation


29. Rethinking Cross-lingual Gaps from a Statistical Viewpoint


30. TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs


31. MCA: Modality Composition Awareness for Robust Composed Multimodal Retrieval


32. Language Models are Injective and Hence Invertible


33. The Road Less Traveled: Enhancing Exploration in LLMs via Sequential Sampling


34. DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios


35. An Experimental Study of Real-Life LLM-Proposed Performance Improvements


36. Selecting and Combining Large Language Models for Scalable Code Clone Detection


37. SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models


38. A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning


39. Select Less, Reason More: Prioritizing Evidence Purity for Video Reasoning


40. Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models


41. Fine-Tuning MedGemma for Clinical Captioning to Enhance Multimodal RAG over Malaysia CPGs


42. When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling


43. BeLLMan: Controlling LLM Congestion


44. DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing


45. Exemplar-Guided Planning: Enhanced LLM Agent for KGQA


46. TraceCoder: Towards Traceable ICD Coding via Multi-Source Knowledge Integration


47. DRO-InstructZero: Distributionally Robust Prompt Optimization for Large Language Models


48. Planner and Executor: Collaboration between Discrete Diffusion And Autoregressive Models in Reasoning


49. Extending Audio Context for Long-Form Understanding in Large Audio-Language Models


50. ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning


51. Structure-R1: Dynamically Leveraging Structural Knowledge in LLM Reasoning through Reinforcement Learning


52. XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models


53. FarsiMCQGen: a Persian Multiple-choice Question Generation Framework


54. Latent Topic Synthesis: Leveraging LLMs for Electoral Ad Analysis


55. DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning


56. Continual Learning via Sparse Memory Finetuning


57. Sequential Comics for Jailbreaking Multimodal Large Language Models via Structured Visual Storytelling


58. The Coverage Principle: How Pre-training Enables Post-Training


59. Active Honeypot Guardrail System: Probing and Confirming Multi-Turn LLM Jailbreaks


60. Rethinking Toxicity Evaluation in Large Language Models: A Multi-Label Perspective


61. Automated Snippet-Alignment Data Augmentation for Code Translation


62. VaultGemma: A Differentially Private Gemma Model