Notable LLM-Related Papers - 2025-08-14

1. Mathematical Computation and Reasoning Errors by Large Language Models


2. RAGulating Compliance: A Multi-Agent Knowledge Graph for Regulatory QA


3. AWorld: Dynamic Multi-Agent System with Stable Maneuvering for Robust GAIA Problem Solving


4. The PacifAIst Benchmark: Would an Artificial Intelligence Choose to Sacrifice Itself for Human Safety?


5. UDA: Unsupervised Debiasing Alignment for Pair-wise LLM-as-a-Judge


6. MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement


7. EvoCurr: Self-evolving Curriculum with Behavior Code Generation for Complex Decision-making


8. An Automated Multi-Modal Evaluation Framework for Mobile Intelligent Assistants


9. January Food Benchmark (JFB): A Public Benchmark Dataset and Evaluation Suite for Multimodal Food Analysis


10. Specialised or Generic? Tokenization Choices for Radiology Language Models


11. VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models


12. A Comprehensive Evaluation framework of Alignment Techniques for LLMs


13. Beyond Naïve Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs


14. Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning


15. Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models


16. Speed Always Wins: A Survey on Efficient Architectures for Large Language Models


17. Exploring the Potential of Large Language Models in Fine-Grained Review Comment Classification


18. Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study


19. Improving ARDS Diagnosis Through Context-Aware Concept Bottleneck Models



21. On Negative-aware Preference Optimization for Recommendation


22. AmbiGraph-Eval: Can LLMs Effectively Handle Ambiguous Graph Queries?


23. TimeMKG: Knowledge-Infused Causal Reasoning for Multivariate Time Series Modeling


24. Interpretable Robot Control via Structured Behavior Trees and Large Language Models


25. GoViG: Goal-Conditioned Visual Navigation Instruction Generation


26. Your Coding Intent is Secretly in the Context and You Should Deliberately Infer It Before Completion


27. AI Blob! LLM-Driven Recontextualization of Italian Television Archives


28. Episodic Memory Representation for Long-form Video Understanding


29. NeuronTune: Fine-Grained Neuron Modulation for Balanced Safety-Utility Alignment in LLMs


30. DeepFeatIoT: Unifying Deep Learned, Randomized, and LLM Features for Enhanced IoT Time Series Sensor Data Classification in Smart Industries


31. Hallucination vs interpretation: rethinking accuracy and precision in AI-assisted data extraction for knowledge synthesis


32. Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference