Notable LLM-Related Papers - 2025-12-29

1. Accelerating Scientific Discovery with Autonomous Goal-evolving Agents


2. Towards Responsible and Explainable AI Agents with Consensus-Driven Reasoning


3. Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing


4. AMS-IO-Bench and AMS-IO-Agent: Benchmarking and Structured Reasoning for Analog and Mixed-Signal Integrated Circuit Input/Output Design


5. A Medical Multimodal Diagnostic Framework Integrating Vision-Language Models and Logic Tree Reasoning


6. NEMO-4-PAYPAL: Leveraging NVIDIA’s Nemo Framework for empowering PayPal’s Commerce Agent


7. From Visual Perception to Deep Empathy: An Automated Assessment Framework for House-Tree-Person Drawings Using Multimodal LLMs and Multi-Agent Collaboration



9. Introducing TrGLUE and SentiTurca: A Comprehensive Benchmark for Turkish General Language Understanding and Sentiment Analysis


10. Unifying Learning Dynamics and Generalization in Transformers Scaling Law


11. LVLM-Aided Alignment of Task-Specific Vision Models


12. Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model


13. Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models


14. MASFIN: A Multi-Agent System for Decomposed Financial Reasoning and Forecasting


15. CricBench: A Multilingual Benchmark for Evaluating LLMs in Cricket Analytics



17. A Comedy of Estimators: On KL Regularization in RL Training of LLMs


18. HeartBench: Probing Core Dimensions of Anthropomorphic Intelligence in LLMs


19. Five Years of SciCap: What We Learned and Future Directions for Scientific Figure Captioning


20. HELP: Hierarchical Embodied Language Planner for Household Tasks


21. An Information Theoretic Perspective on Agentic System Design


22. CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation


23. Do Latent Tokens Think? A Causal and Adversarial Analysis of Chain-of-Continuous-Thought


24. Detecting AI-Generated Paraphrases in Bengali: A Comparative Study of Zero-Shot and Fine-Tuned Transformers


25. LLM-I2I: Boost Your Small Item2Item Recommendation Model with Large Language Model


26. A Unified Definition of Hallucination, Or: It’s the World Model, Stupid


27. Towards Long-window Anchoring in Vision-Language Model Distillation


28. Hierarchy-Aware Fine-Tuning of Vision-Language Models


29. Selective LLM-Guided Regularization for Enhancing Recommendation Models


30. MotionTeller: Multi-modal Integration of Wearable Time-Series with LLMs for Health and Behavioral Understanding


31. Oogiri-Master: Benchmarking Humor Understanding via Oogiri


32. dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning


33. Morality is Contextual: Learning Interpretable Moral Contexts from Human Data with Probabilistic Clustering and Large Language Models


34. Teaching People LLM’s Errors and Getting it Right


35. LLM-Driven Feature-Level Adversarial Attacks on Android Malware Detectors


36. AInsteinBench: Benchmarking Coding Agents on Scientific Repositories


37. Reflection-Driven Control for Trustworthy Code Agents


38. Multi-Agent LLM Committees for Autonomous Software Beta Testing


39. CosmoCore-Evo: Evolutionary Dream-Replay Reinforcement Learning for Adaptive Code Generation


40. Query Carefully: Detecting the Unanswerables in Text-to-SQL Tasks


41. From Questions to Clinical Recommendations: Large Language Models Driving Evidence-Based Clinical Decision Making