Key LLM-Related Papers - 2025-11-27

1. Fighting AI with AI: Leveraging Foundation Models for Assuring AI-Enabled Safety-Critical Systems



3. Beyond Generation: Multi-Hop Reasoning for Factual Accuracy in Vision-Language Models


4. Assessing LLMs’ Performance: Insights from the Chinese Pharmacist Exam


5. Universe of Thoughts: Enabling Creative Reasoning with Large Language Models


6. DRAFT-RL: Multi-Agent Chain-of-Draft Reasoning for Reinforcement Learning-Enhanced LLMs


7. NNGPT: Rethinking AutoML with Large Language Models


8. Improving Language Agents through BREW


9. SMoG: Schema Matching on Graph


10. Towards Benign Memory Forgetting for Selective Multimodal Large Language Model Unlearning


11. “Are We Done Yet?”: A Vision-Based Judge for Autonomous Task Completion of Computer Use Agents


12. Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design


13. M$^3$Prune: Hierarchical Communication Graph Pruning for Efficient Multi-Modal Multi-Agent Retrieval-Augmented Generation


14. A System-Level Taxonomy of Failure Modes in Large Language Model Applications


15. Semantic-KG: Using Knowledge Graphs to Construct Benchmarks for Measuring Semantic Similarity


16. RPM-MCTS: Knowledge-Retrieval as Process Reward Model with Monte Carlo Tree Search for Code Generation


17. Simulated Self-Assessment in Large Language Models: A Psychometric Approach to AI Self-Efficacy


18. KOM: A Multi-Agent Artificial Intelligence System for Precision Management of Knee Osteoarthritis (KOA)


19. NOEM$^{3}$A: A Neuro-Symbolic Ontology-Enhanced Method for Multi-Intent Understanding in Mobile Agents


20. Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs


21. Scaling Item-to-Standard Alignment with Large Language Models: Accuracy, Limits, and Solutions


22. FISCAL: Financial Synthetic Claim-document Augmented Learning for Efficient Fact-Checking


23. Using Wearable Devices to Improve Chronic Pain Treatment among Patients with Opioid Use Disorder


24. Latent Collaboration in Multi-Agent Systems


25. ROOT: Robust Orthogonalized Optimizer for Neural Network Training


26. DiFR: Inference Verification Despite Nondeterminism


27. Can Vibe Coding Beat Graduate CS Students? An LLM vs. Human Coding Tournament on Market-driven Strategic Planning


28. On Evaluating LLM Alignment by Evaluating LLMs as Judges


29. DesignPref: Capturing Personal Preferences in Visual Design Generation


30. The Text Aphasia Battery (TAB): A Clinically-Grounded Benchmark for Aphasia-Like Deficits in Language Models


31. MTBBench: A Multimodal Sequential Clinical Decision-Making Benchmark in Oncology


32. Generation, Evaluation, and Explanation of Novelists’ Styles with Single-Token Prompts


33. Object-Centric Vision Token Pruning for Vision Language Models


34. LLMs for Automated Unit Test Generation and Assessment in Java: The AgoneTest Framework


35. BengaliFig: A Low-Resource Challenge for Figurative and Culturally Grounded Reasoning in Bengali


36. Soft Adaptive Policy Optimization


37. Geometry of Decision Making in Language Models


38. Can LLMs Make (Personalized) Access Control Decisions?


39. HVAdam: A Full-Dimension Adaptive Optimizer


40. Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits


41. DUO-TOK: Dual-Track Semantic Music Tokenizer for Vocal-Accompaniment Generation


42. Beluga: A CXL-Based Memory Architecture for Scalable and Efficient LLM KVCache Management


43. “When Data is Scarce, Prompt Smarter”… Approaches to Grammatical Error Correction in Low-Resource Settings


44. R3A: Reliable RTL Repair Framework with Multi-Agent Fault Localization and Stochastic Tree-of-Thoughts Patch Generation


45. WaymoQA: A Multi-View Visual Question Answering Dataset for Safety-Critical Reasoning in Autonomous Driving


46. BERT-APC: A Reference-free Framework for Automatic Pitch Correction via Musical Context Inference


47. On the Feasibility of Hijacking MLLMs’ Decision Chain via One Perturbation


48. EmoFeedback2: Reinforcement of Continuous Emotional Image Generation via LVLM-based Reward and Textual Feedback


49. LLM-EDT: Large Language Model Enhanced Cross-domain Sequential Recommendation with Dual-phase Training


50. MAPS: Preserving Vision-Language Representations via Module-Wise Proximity Scheduling for Better Vision-Language-Action Generalization


51. CodeFuse-CommitEval: Towards Benchmarking LLM’s Power on Commit Message and Code Change Inconsistency Detection


52. Cross-LLM Generalization of Behavioral Backdoor Detection in AI Agent Supply Chains


53. A Systematic Analysis of Large Language Models with RAG-enabled Dynamic Prompting for Medical Error Detection and Correction


54. Beyond Relational: Semantic-Aware Multi-Modal Analytics with LLM-Native Query Optimization


55. Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models


56. CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception


57. Prune-Then-Plan: Step-Level Calibration for Stable Frontier Exploration in Embodied Question Answering


58. Prompt Fencing: A Cryptographic Approach to Establishing Security Boundaries in Large Language Model Prompts


59. A Layered Protocol Architecture for the Internet of Agents


60. Accuracy and Efficiency Trade-Offs in LLM-Based Malware Detection and Explanation: A Comparative Study of Parameter Tuning vs. Full Fine-Tuning



62. Robot-Powered Data Flywheels: Deploying Robots in the Wild for Continual Data Collection and Foundation Model Adaptation


63. HunyuanOCR Technical Report


64. The Semiotic Channel Principle: Measuring the Capacity for Meaning in LLM Communication


65. Cross-Domain Generalization of Multimodal LLMs for Global Photovoltaic Assessment


66. AttackPilot: Autonomous Inference Attacks Against ML Services With LLM-Based Agents


67. Towards Efficient VLMs: Information-Theoretic Driven Compression via Adaptive Structural Pruning


68. Hierarchical Dual-Strategy Unlearning for Biomedical and Healthcare Intelligence Using Imperfect and Privacy-Sensitive Medical Data


69. Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM


70. A Systematic Study of Compression Ordering for Large Language Models


71. Evolution without an Oracle: Driving Effective Evolution with LLM Judges


72. Building Resilient Information Ecosystems: Large LLM-Generated Dataset of Persuasion Attacks


73. Efficient Inference Using Large Language Models with Limited Human Data: Fine-Tuning then Rectification


74. Z-Space: A Multi-Agent Tool Orchestration Framework for Enterprise-Grade LLM Automation


75. Exploiting the Experts: Unauthorized Compression in MoE-LLMs


76. WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning


77. BlockCert: Certified Blockwise Extraction of Transformer Mechanisms