Key LLM-Related Papers - 2025-10-29

1. Advancing site-specific disease and pest management in precision agriculture: From reasoning-driven foundation models to adaptive, feedback-based learning


2. FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling


3. Generative AI for Healthcare: Fundamentals, Challenges, and Perspectives


4. From Cross-Task Examples to In-Task Prompts: A Graph-Based Pseudo-Labeling Framework for In-context Learning



6. Human-Level Reasoning: A Comparative Study of Large Language Models on Logical and Abstract Reasoning


7. OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows


8. APTBench: Benchmarking Agentic Potential of Base LLMs During Pre-Training


9. Improving LLM Reasoning via Dependency-Aware Query Decomposition and Logic-Parallel Content Expansion


10. A Unified Geometric Space Bridging AI Models and the Human Brain


11. VDSAgents: A PCS-Guided Multi-Agent System for Veridical Data Science Automation


12. Generative Large Language Models (gLLMs) in Content Analysis: A Practical Guide for Communication Research


13. Retrieval and Argumentation Enhanced Multi-Agent LLMs for Judgmental Forecasting


14. Verifying Large Language Models’ Reasoning Paths via Correlation Matrix Rank


15. MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools


16. MGA: Memory-Driven GUI Agent for Observation-Centric Interaction


17. BLM$_1$: A Boundless Large Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning


18. HistoLens: An Interactive XAI Toolkit for Verifying and Mitigating Flaws in Vision-Language Models for Histopathology


19. LLMLogAnalyzer: A Clustering-Based Log Analysis Chatbot using Large Language Models


20. Discovering Heuristics with Large Language Models (LLMs) for Mixed-Integer Programs: Single-Machine Scheduling


21. The Sign Estimator: LLM Alignment in the Face of Choice Heterogeneity


22. Latent Chain-of-Thought for Visual Reasoning


23. Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges


24. Hybrid Modeling, Sim-to-Real Reinforcement Learning, and Large Language Model Driven Control for Digital Twins


25. Decentralized Multi-Agent Goal Assignment for Path Planning using Large Language Models


26. ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents


27. Test-Time Tuned Language Models Enable End-to-end De Novo Molecular Structure Generation from MS/MS Spectra


28. ComboBench: Can LLMs Manipulate Physical Devices to Play Virtual Reality Games?


29. Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents


30. Tongyi DeepResearch Technical Report


31. Greedy Sampling Is Provably Efficient for RLHF


32. AgentFold: Long-Horizon Web Agents with Proactive Context Management


33. Repurposing Synthetic Data for Fine-grained Search Agent Supervision


34. Dissecting Role Cognition in Medical LLMs via Neuronal Ablation


35. Zero-Shot Cross-Lingual Transfer using Prefix-Based Adaptation


36. A word association network methodology for evaluating implicit biases in LLMs compared to humans


37. Mitigating Hallucination in Large Language Models (LLMs): An Application-Oriented Survey on RAG, Reasoning, and Agentic Systems


38. Iterative Critique-Refine Framework for Enhancing LLM Personalization


39. Charting the European LLM Benchmarking Landscape: A New Taxonomy and a Set of Best Practices


40. Rethinking Visual Intelligence: Insights from Video Pretraining


41. Can LLMs Write Faithfully? An Agent-Based Evaluation of LLM-generated Islamic Content


42. MiniOneRec: An Open-Source Framework for Scaling Generative Recommendation


43. Metadata-Driven Retrieval-Augmented Generation for Financial Question Answering


44. LongWeave: A Long-Form Generation Benchmark Bridging Real-World Relevance and Verifiability


45. Beyond MCQ: An Open-Ended Arabic Cultural QA Benchmark with Dialect Variants


46. Few-Shot Remote Sensing Image Scene Classification with CLIP and Prompt Learning


47. Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning


48. ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model


49. Enabling Near-realtime Remote Sensing via Satellite-Ground Collaboration of Large Vision-Language Models


50. PaTaRM: Bridging Pairwise and Pointwise Signals via Preference-Aware Task-Adaptive Reward Modeling


51. MuSaG: A Multimodal German Sarcasm Dataset with Full-Modal Annotations


52. Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning


53. Ko-MuSR: A Multistep Soft Reasoning Benchmark for LLMs Capable of Understanding Korean


54. Beyond Line-Level Filtering for the Pretraining Corpora of LLMs


55. Compositional Image Synthesis with Inference-Time Scaling


56. FALQON: Accelerating LoRA Fine-tuning with Low-Bit Floating-Point Arithmetic


57. SpecKD: Speculative Decoding for Effective Knowledge Distillation of LLMs


58. Teaching LLMs to Abstain via Fine-Grained Semantic Confidence Reward


59. Lifecycle-Aware code generation: Leveraging Software Engineering Phases in LLMs


60. Mars-Bench: A Benchmark for Evaluating Foundation Models for Mars Science Tasks


61. Uncovering the Potential Risks in Unlearning: Danger of English-only Unlearning in Multilingual LLMs


62. ChessQA: Evaluating Large Language Models for Chess Understanding


63. Auto prompting without training labels: An LLM cascade for product quality assessment in e-commerce catalogs


64. Agent-based Automated Claim Matching with Instruction-following LLMs


65. Key and Value Weights Are Probably All You Need: On the Necessity of the Query, Key, Value weight Triplet in Decoder-Only Transformers


66. Evaluating the effectiveness of LLM-based interoperability


67. PRO: Enabling Precise and Robust Text Watermark for Open-Source LLMs


68. OraPlan-SQL: A Planning-Centric Framework for Complex Bilingual NL2SQL Reasoning


69. Can LLMs Narrate Tabular Data? An Evaluation Framework for Natural Language Representations of Text-to-SQL System Outputs


70. CRADLE Bench: A Clinician-Annotated Benchmark for Multi-Faceted Mental Health Crisis and Safety Risk Detection


71. Explainable Detection of AI-Generated Images with Artifact Localization Using Faster-Than-Lies and Vision-Language Models for Edge Devices


72. TDFlow: Agentic Workflows for Test Driven Software Engineering


73. Debiasing Reward Models by Representation Learning with Guarantees


74. Beyond Prompt Engineering: Neuro-Symbolic-Causal Architecture for Robust Multi-Objective AI Agents


75. QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents


76. RefleXGen: The unexamined code is not worth using


77. MCPGuard: Automatically Detecting Vulnerabilities in MCP Servers


78. Sparsity and Superposition in Mixture of Experts


79. Aligning Diffusion Language Models via Unpaired Preference Optimization


80. The Structural Scalpel: Automated Contiguous Layer Pruning for Large Language Models


81. Efficient Low Rank Attention for Long-Context Inference in Large Language Models


82. VisCoder2: Building Multi-Language Visualization Coding Agents


83. Flight Delay Prediction via Cross-Modality Adaptation of Large Language Models and Aircraft Trajectory Representation


84. LLMComp: A Language Modeling Paradigm for Error-Bounded Scientific Data Compression


85. Beyond Pairwise: Empowering LLM Alignment With Ranked Choice Modeling


86. NUM2EVENT: Interpretable Event Reasoning from Numerical time-series


87. Chain of Execution Supervision Promotes General Reasoning in Large Language Models


88. From Detection to Discovery: A Closed-Loop Approach for Simultaneous and Continuous Medical Knowledge Expansion and Depression Detection on Social Media


89. Fine-tuning Large Language Models with Limited Data: A Survey and Practical Guide