Key LLM-Related Papers - 2025-08-26

1. LLM-Based Agents for Competitive Landscape Mapping in Drug Asset Due Diligence


2. Modular Embedding Recomposition for Incremental Learning



4. AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications


5. Graph RAG as Human Choice Model: Building a Data-Driven Mobility Agent with Preference Chain


6. Bridging the Gap in Ophthalmic AI: MM-Retinal-Reason Dataset and OphthaReason Model toward Dynamic Multimodal Reasoning


7. Extending FKG.in: Towards a Food Claim Traceability Network


8. IR-Agent: Expert-Inspired LLM Agents for Structure Elucidation from Infrared Spectra


9. Integrating Time Series into LLMs via Multi-layer Steerable Embedding Fusion for Enhanced Forecasting


10. Generative Foundation Model for Structured and Unstructured Electronic Health Records


11. Sparse but Wrong: Incorrect L0 Leads to Incorrect Features in Sparse Autoencoders


12. RL Is Neither a Panacea Nor a Mirage: Understanding Supervised vs. Reinforcement Learning Fine-Tuning for LLMs


13. Towards Open World Detection: A Survey


14. FLAMES: Improving LLM Math Reasoning via a Fine-Grained Analysis of the Data Synthesis Pipeline


15. Post Hoc Regression Refinement via Pairwise Rankings


16. PediatricsMQA: a Multi-modal Pediatrics Question Answering Benchmark


17. OPERA: A Reinforcement Learning–Enhanced Orchestrated Planner-Executor Architecture for Reasoning-Oriented Multi-Hop Retrieval


18. Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish


19. MedQARo: A Large-Scale Benchmark for Medical Question Answering in Romanian



21. Confusion is the Final Barrier: Rethinking Jailbreak Evaluation and Investigating the Real Misuse Threat of LLMs


22. LLMSymGuard: A Symbolic Safety Guardrail Framework Leveraging Interpretable Jailbreak Concepts


23. Retrieval Enhanced Feedback via In-context Neural Error-book


24. From Confidence to Collapse in LLM Factual Robustness


25. MCPVerse: An Expansive, Real-World Benchmark for Agentic Tool Use


26. SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning


27. LLM-Assisted Semantic Alignment and Integration in Collaborative Model-Based Systems Engineering Using SysML v2


28. Towards Recommending Usability Improvements with Multimodal Large Language Models


29. Beyond Human-prompting: Adaptive Prompt Tuning with Semantic Alignment for Anomaly Detection


30. Take That for Me: Multimodal Exophora Resolution with Interactive Questioning for Ambiguous Out-of-View Instructions


31. CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing


32. The Fools are Certain; the Wise are Doubtful: Exploring LLM Confidence in Code Completion


33. CYCLE-INSTRUCT: Fully Seed-Free Instruction Tuning via Dual Self-Training and Cycle Consistency


34. Cooperative Design Optimization through Natural Language Interaction


35. OpenWHO: A Document-Level Parallel Corpus for Health Translation in Low-Resource Languages


36. ASIC-Agent: An Autonomous Multi-Agent System for ASIC Design with Benchmark Evaluation


37. Noise, Adaptation, and Strategy: Assessing LLM Fidelity in Decision-Making


38. HyperFlexis: Joint Design of Algorithms and Systems for Multi-SLO Serving and Fast Scaling


39. Evaluating Structured Decoding for Text-to-Table Generation: Evidence from Three Datasets


40. Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search


41. Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs


42. Annif at the GermEval-2025 LLMs4Subjects Task: Traditional XMTC Augmented by Efficient LLMs


43. DeepMEL: A Multi-Agent Collaboration Framework for Multimodal Entity Linking


44. NEAT: Concept driven Neuron Attribution in LLMs


45. CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning


46. Building and Measuring Trust between Large Language Models


47. Coarse-to-Fine Personalized LLM Impressions for Streamlined Radiology Reports


48. CIA+TA Risk Assessment for AI Reasoning Vulnerabilities


49. Alvorada-Bench: Can Language Models Solve Brazilian University Entrance Exams?


50. Who’s Asking? Investigating Bias Through the Lens of Disability Framed Queries in LLMs


51. DAIQ: Auditing Demographic Attribute Inference from Question in LLMs


52. An Auditable Pipeline for Fuzzy Full-Text Screening in Systematic Reviews: Integrating Contrastive Semantic Highlighting and LLM Judgment


53. Research on intelligent generation of structural demolition suggestions based on multi-model collaboration


54. User-Assistant Bias in LLMs


55. SCOPE: A Generative Approach for LLM Prompt Compression


56. From Clicks to Preference: A Multi-stage Alignment Framework for Generative Query Suggestion in Conversational System


57. Detecting Hope, Hate, and Emotion in Arabic Textual Speech and Multi-modal Memes Using Large Language Models


58. Chain-of-Query: Unleashing the Power of LLMs in SQL-Aided Table Understanding via Multi-Agent Collaboration


59. KL-based self-distillation for large language models


60. SurfaceLogicKV: Surface and Logic Attention Behaviors are All You Need for Robust KV Cache Compression


61. ALAS: Autonomous Learning Agent for Self-Updating Language Models


62. ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks


63. MAC: A Live Benchmark for Multimodal Large Language Models in Scientific Understanding


64. LingVarBench: Benchmarking LLM for Automated Named Entity Recognition in Structured Synthetic Spoken Transcriptions


65. Persuasiveness and Bias in LLM: Investigating the Impact of Persuasiveness and Reinforcement of Bias in Language Models


66. Benchmarking the Medical Understanding and Reasoning of Large Language Models in Arabic Healthcare Tasks



68. InteChar: A Unified Oracle Bone Character List for Ancient Chinese Language Modeling


69. KG-o1: Enhancing Multi-hop Question Answering in Large Language Models via Knowledge Graph Integration