Key LLM-Related Papers - 2025-12-22

1. Towards Explainable Conversational AI for Early Diagnosis with Large Language Models


2. Large Language Models as Pokémon Battle Agents: Strategic Play and Content Generation


3. Accelerating Multi-modal LLM Gaming Performance via Input Prediction and Mishit Correction


4. MMRAG-RFT: Two-stage Reinforcement Fine-tuning for Explainable Multi-modal Retrieval-augmented Generation


5. Solomonoff-Inspired Hypothesis Ranking with LLMs for Prediction Under Uncertainty


6. Reinforcement Learning for Self-Improving Agent with Skill Library


7. A Solver-in-the-Loop Framework for Improving LLMs on Answer Set Programming for Logic Puzzle Solving


8. Realistic threat perception drives intergroup conflict: A causal, dynamic analysis using generative-agent simulations


9. UniRel-R1: RL-tuned LLM Reasoning for Knowledge Graph Relational Question Answering


10. PAACE: A Plan-Aware Automated Agent Context Engineering Framework


11. AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning


12. ShareChat: A Dataset of Chatbot Conversations in the Wild


13. LLM-based Behaviour Driven Development for Hardware Design


14. Easy Adaptation: An Efficient Task-Specific Knowledge Injection Method for Large Models in Resource-Constrained Environments


15. AncientBench: Towards Comprehensive Evaluation on Excavated and Transmitted Chinese Corpora


16. Trust-Region Adaptive Policy Optimization


17. GreedySnake: Accelerating SSD-Offloaded LLM Training with Efficient Scheduling and Optimizer Step Overlapping


18. Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding


19. Key-Conditioned Orthonormal Transform Gating (K-OTG): Multi-Key Access Control with Hidden-State Scrambling for LoRA-Tuned Models


20. Learning What to Write: Write-Gated KV for Efficient Long-Context Inference


21. SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories


22. RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering


23. A Benchmark for Ultra-High-Resolution Remote Sensing MLLMs


24. Understanding Generalization in Role-Playing Models via Information Theory


25. Verifiability-First Agents: Provable Observability and Lightweight Audit Agents for Controlling Autonomous LLM Systems


26. AlignDP: Hybrid Differential Privacy with Rarity-Aware Protection for LLMs


27. Incorporating Error Level Noise Embedding for Improving LLM-Assisted Robustness in Persian Speech Recognition


28. PILAR: Personalizing Augmented Reality Interactions with LLM-based Human-Centric and Trustworthy Explanations for Daily Use Cases


29. When F1 Fails: Granularity-Aware Evaluation for Dialogue Topic Segmentation


30. Can Large Reasoning Models Improve Accuracy on Mathematical Tasks Using Flawed Thinking?


31. On the Role of Contextual Information and Ego States in LLM Agent Behavior for Transactional Analysis Dialogues


32. Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL


33. A Women’s Health Benchmark for Large Language Models


34. MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval


35. Lights, Camera, Consistency: A Multistage Pipeline for Character-Stable AI Video Stories


36. V-Agent: An Interactive Video Search System Using Vision-Language Models