Key LLM Papers - 2025-12-11

1. Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs


2. A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows


3. Towards Foundation Models with Native Multi-Agent Intelligence


4. Multi-Agent Intelligence for Multidisciplinary Decision-Making in Gastrointestinal Oncology


5. See-Control: A Multimodal Agent Framework for Smartphone Interaction with a Robotic Arm


6. CogMCTS: A Novel Cognitive-Guided Monte Carlo Tree Search Framework for Iterative Heuristic Evolution with Large Language Models


7. Principles2Plan: LLM-Guided System for Operationalising Ethical Principles into Plans


8. Autonomous Issue Resolver: Towards Zero-Touch Code Maintenance


9. DeepFeature: Iterative Context-aware Feature Generation for Wearable Biosignals


10. Reflecting with Two Voices: A Co-Adaptive Dual-Strategy Framework for LLM-Based Agent Decision Making


11. The High Cost of Incivility: Quantifying Interaction Inefficiency via Multi-Agent Monte Carlo Simulations


12. rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection


13. Towards a Science of Scaling Agent Systems


14. AgentEval: Generative Agents as Reliable Proxies for Human Evaluation of AI-Generated Content


15. Reasoning Models Ace the CFA Exams


16. Beyond Traditional Diagnostics: Transforming Patient-Side Information into Predictive Insights with Knowledge Graphs and Prototypes


17. Large Language Models for Education and Research: An Empirical and User Survey-based Analysis


18. Toward an AI Reasoning-Enabled System for Patient-Clinical Trial Matching


19. Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training


20. Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders


21. No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers


22. When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation


23. Fed-SE: Federated Self-Evolution for Privacy-Constrained Multi-Environment LLM Agents


24. InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models


25. Training-Free Dual Hyperbolic Adapters for Better Cross-Modal Reasoning


26. Emovectors: assessing emotional content in jazz improvisations for creativity evaluation


27. Multicalibration for LLM-based Code Generation


28. PrivTune: Efficient and Privacy-Preserving Fine-Tuning of Large Language Models via Device-Cloud Collaboration


29. Can TabPFN Compete with GNNs for Node Classification via Graph Tabularization?


30. A Systematic Evaluation of Preference Aggregation in Federated RLHF for Pluralistic Alignment of LLMs


31. Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages


32. Mind to Hand: Purposeful Robotic Control via Embodied Reasoning


33. A Hybrid Model for Stock Market Forecasting: Integrating News Sentiment and Time Series Data with Graph Neural Networks


34. Bridging Scale Discrepancies in Robotic Control via Language-Based Action Representations


35. Curriculum Guided Massive Multi Agent System Solving For Robust Long Horizon Tasks


36. LLM-based Vulnerable Code Augmentation: Generate or Refactor?


37. Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models III: Implementing the Bacterial Biothreat Benchmark (B3) Dataset


38. Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models II: Benchmark Generation Process


39. Argus: A Multi-Agent Sensitive Information Leakage Detection Framework Based on Hierarchical Reference Relationships


40. Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem


41. Empowering smart app development with SolidGPT: an edge-cloud hybrid AI agent framework


42. HybridToken-VLM: Hybrid Token Compression for Vision-Language Models


43. SpeechQualityLLM: LLM-Based Multimodal Assessment of Speech Quality


44. MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models


45. ClinicalTrialsHub: Bridging Registries and Literature for Comprehensive Clinical Trial Access


46. A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties


47. Information-Dense Reasoning for Efficient and Auditable Security Alert Triage


48. Chat with UAV – Human-UAV Interaction Based on Large Language Models


49. Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models I: The Task-Query Architecture


50. Balanced Accuracy: The Right Metric for Evaluating LLM Judges - Explained through Youden’s J statistic


51. Training LLMs for Honesty via Confessions


52. Short-Context Dominance: How Much Local Context Natural Language Actually Needs?


53. FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models


54. DeepCode: Open Agentic Coding


55. CFD-copilot: leveraging domain-adapted large language model and model context protocol to enhance simulation automation


56. MARINE: Theoretical Optimization and Design for Multi-Agent Recursive IN-context Enhancement


57. LLM-Generated Counterfactual Stress Scenarios for Portfolio Risk Simulation via Hybrid Prompt-RAG Pipeline


58. SABER: Small Actions, Big Errors - Safeguarding Mutating Steps in LLM Agents


59. MixLM: High-Throughput and Effective LLM Ranking via Text-Embedding Mix-Interaction


60. AudioScene: Integrating Object-Event Audio into 3D Scenes


61. ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models


62. Automating High Energy Physics Data Analysis with LLM-Powered Agents


63. MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs