Key LLM-Related Papers - 2025-12-12

1. SCOPE: Language Models as One-Time Teacher for Hierarchical Planning in Text Environments


2. RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning


3. An End-to-end Planning Framework with Agentic LLMs and PDDL



5. SDialog: A Python Toolkit for End-to-End Agent Building, User Simulation, Dialog Generation, and Evaluation


6. A Categorical Analysis of Large Language Models and Why LLMs Circumvent the Symbol Grounding Problem


7. Calibrated Trust in Dealing with LLM Hallucinations: A Qualitative Study


8. Provably Learning from Modern Language Models via Low Logit Rank


9. FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning


10. MedForget: Hierarchy-Aware Multimodal Unlearning Testbed for Medical AI



12. The Ky Fan Norms and Beyond: Dual Norms and Combinations for Matrix Optimization


13. Can LLMs Evaluate What They Cannot Annotate? Revisiting LLM Reliability in Hate Speech Detection


14. Rethinking Chain-of-Thought Reasoning for Videos


15. Auto-BenchmarkCard: Automated Synthesis of Benchmark Documentation


16. System Report for CCL25-Eval Task 10: Prompt-Driven Large Language Model Merge for Fine-Grained Chinese Hate Speech Detection


17. SWEnergy: An Empirical Study on Energy Efficiency in Agentic Issue Resolution Frameworks with SLMs


18. Representation Invariance and Allocation: When Subgroup Balance Matters


19. RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning


20. Advancing LLM-Based Security Automation with Customized Group Relative Policy Optimization for Zero-Touch Networks


21. Advancing Research via Human-AI Interactive Theorem Proving


22. Representation Calibration and Uncertainty Guidance for Class-Incremental Learning based on Vision Language Model


23. CourtPressGER: A German Court Decision to Press Release Summarization Dataset


24. ODMA: On-Demand Memory Allocation Framework for LLM Serving on LPDDR-Class Accelerators


25. GLACIA: Instance-Aware Positional Reasoning for Glacial Lake Segmentation via Multimodal Large Language Model


26. CORE: A Conceptual Reasoning Layer for Large Language Models


27. LLMs for Analog Circuit Design Continuum (ACDC)


28. WOLF: Werewolf-based Observations for LLM Deception and Falsehoods


29. Prompt-Based Continual Compositional Zero-Shot Learning


30. MindShift: Analyzing Language Models’ Reactions to Psychological Prompts


31. Detecting Hallucinations in Graph Retrieval-Augmented Generation via Attention Patterns and Semantic Alignment


32. Knowledge-Guided Large Language Model for Automatic Pediatric Dental Record Understanding and Safe Antibiotic Recommendation


33. Evolving Excellence: Automated Optimization of LLM-based Agents


34. ORCA: Open-ended Response Correctness Assessment for Audio Question Answering


35. Towards Lossless Ultimate Vision Token Compression for VLMs


36. Llama-based source code vulnerability detection: Prompt engineering vs Fine tuning


37. DermETAS-SNA LLM: A Dermatology Focused Evolutionary Transformer Architecture Search with StackNet Augmented LLM Assistant


38. RAG-HAR: Retrieval Augmented Generation-based Human Activity Recognition


39. Mitigating Bias with Words: Inducing Demographic Ambiguity in Face Recognition Templates by Text Encoding


40. Peek-a-Boo Reasoning: Contrastive Region Masking in MLLMs


41. CluCERT: Certifying LLM Robustness via Clustering-Guided Denoising Smoothing


42. Financial Instruction Following Evaluation (FIFE)


43. LLM4XCE: Large Language Models for Extremely Large-Scale Massive MIMO Channel Estimation


44. AI Co-Artist: A LLM-Powered Framework for Interactive GLSL Shader Animation Evolution


45. The Linguistic Architecture of Reflective Thought: Evaluation of a Large Language Model as a Tool to Isolate the Formal Structure of Mentalization


46. Noise-Robust Abstractive Compression in Retrieval-Augmented Language Models


47. Assessing the Human-Likeness of LLM-Driven Digital Twins in Simulating Health Care System Trust


48. When AI Gives Advice: Evaluating AI and Human Responses to Online Advice-Seeking for Well-Being


49. A Principle-based Framework for the Development and Evaluation of Large Language Models for Health and Wellness


50. Motion2Meaning: A Clinician-Centered Framework for Contestable LLM in Parkinson’s Disease Gait Interpretation