Key LLM-Related Papers - 2025-09-18

1. CrowdAgent: Multi-Agent Managed Multi-Source Annotation System


2. MIRA: Empowering One-Touch AI Services on Smartphones with MLLM-based Instruction Recommendation


3. THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning


4. InfraMind: A Novel Exploration-based GUI Agentic Framework for Mission-critical Industrial Management


5. Programmable Cognitive Bias in Social Agents


6. AI Agents with Human-Like Collaborative Tools: Adaptive Strategies for Enhanced Problem-Solving


7. The Art of Saying “Maybe”: A Conformal Lens for Uncertainty Benchmarking in VLMs


8. $Agent^2$: An Agent-Generates-Agent Framework for Reinforcement Learning Automation


9. Semantic Fusion with Fuzzy-Membership Features for Controllable Language Modelling


10. Agentic UAVs: LLM-Driven Autonomy with Integrated Tool-Calling and Cognitive Reasoning


11. Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning for Symbolic Planning


12. FRIT: Using Causal Importance to Improve Chain-of-Thought Faithfulness


13. Evaluation Awareness Scales Predictably in Open-Weights Large Language Models


14. Explicit Reasoning Makes Better Judges: A Systematic Study on Accuracy, Efficiency, and Robustness


15. Apertus: Democratizing Open and Compliant LLMs for Global Language Environments


16. Language models’ activations linearly encode training-order recency


17. A Universal Banach–Bregman Framework for Stochastic Iterations: Unifying Stochastic Mirror Descent, Learning and LLM Training


18. Dense Video Understanding with Gated Residual Tokenization


19. Synthesizing Behaviorally-Grounded Reasoning Chains: A Data-Generation Framework for Personal Finance LLMs


20. TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning


21. Reasoning Efficiently Through Adaptive Chain-of-Thought Compression: A Self-Optimizing Framework


22. Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency


23. LLM Agents for Interactive Workflow Provenance: Reference Architecture and Evaluation Methodology


24. An Empirical Study on Failures in Automated Issue Solving


25. DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models


26. Do Large Language Models Understand Word Senses?


27. Synthetic Data Generation for Screen Time and App Usage


28. Combating Biomedical Misinformation through Multi-modal Claim Detection and Evidence-based Verification


29. Combining Evidence and Reasoning for Biomedical Fact-Checking


30. Teaching According to Talents! Instruction Tuning LLMs with Competence-Aware Curriculum Learning


31. Who is Introducing the Failure? Automatically Attributing Failures of Multi-Agent Systems via Spectrum Analysis


32. Exploring Data and Parameter Efficient Strategies for Arabic Dialect Identifications


33. Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning


34. Automated Triaging and Transfer Learning of Incident Learning Safety Reports Using Large Language Representational Models


35. DSCC-HS: A Dynamic Self-Reinforcing Framework for Hallucination Suppression in Large Language Models


36. Improving Context Fidelity via Native Retrieval-Augmented Reasoning


37. Re-purposing SAM into Efficient Visual Projectors for MLLM-Based Referring Image Segmentation


38. CL$^2$GEC: A Multi-Discipline Benchmark for Continual Learning in Chinese Literature Grammatical Error Correction


39. DREAM: Domain-aware Reasoning for Efficient Autonomous Underwater Monitoring


40. Sparse Neurons Carry Strong Signals of Question Ambiguity in LLMs


41. Modernizing Facebook Scoped Search: Keyword and Embedding Hybrid Retrieval with LLM Evaluation


42. Agentic JWT: A Secure Delegation Protocol for Autonomous AI Agents


43. Intelligent Healthcare Imaging Platform: An VLM-Based Framework for Automated Medical Image Analysis and Clinical Report Generation


44. Prompt2DAG: A Modular Methodology for LLM-Based Data Enrichment Pipeline Generation



46. Justice in Judgment: Unveiling (Hidden) Bias in LLM-assisted Peer Reviews


47. EdiVal-Agent: An Object-Centric Framework for Automated, Scalable, Fine-Grained Evaluation of Multi-Turn Editing


48. The threat of analytic flexibility in using large language models to simulate human data: A call to attention


49. ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy


50. An Empirical Analysis of VLM-based OOD Detection: Mechanisms, Advantages, and Sensitivity


51. Accuracy Paradox in Large Language Models: Regulating Hallucination Risks in Generative AI