Major LLM-Related Papers - 2025-10-23

1. Beyond Reactivity: Measuring Proactive Problem Solving in LLM Agents


2. RLIE: Rule Generation with Logistic Regression, Iterative Refinement, and Evaluation for Large Language Models


3. AgentSense: LLMs Empower Generalizable and Explainable Web-Based Participatory Urban Sensing


4. NeSyPr: Neurosymbolic Proceduralization For Efficient Embodied Reasoning


5. MSC-Bench: A Rigorous Benchmark for Multi-Server Tool Orchestration


6. Learning to Make Friends: Coaching LLM Agents toward Emergent Social Ties


7. A Multi-faceted Analysis of Cognitive Abilities: Evaluating Prompt Methods with Large Language Models on the CONSORT Checklist


8. The MUSE Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMs


9. Rectifying Shortcut Behaviors in Preference-based Reward Learning


10. Timely Clinical Diagnosis through Active Test Selection


11. Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality


12. Semantic World Models


13. Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning


14. Integrating Transparent Models, LLMs, and Practitioner-in-the-Loop: A Case of Nonprofit Program Evaluation


15. On Controlled Change: Generative AI’s Impact on Professional Authority in Journalism


16. AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders


17. SmartSwitch: Advancing LLM Reasoning by Overcoming Underthinking via Promoting Deeper Thought Exploration


18. Are Large Language Models Sensitive to the Motives Behind Communication?


19. I Spy With My Model’s Eye: Visual Search as a Behavioural Test for MLLMs


20. Unraveling Emotions with Pre-Trained Models


21. XBench: A Comprehensive Benchmark for Visual-Language Explanations in Chest Radiography


22. Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark


23. A Matter of Time: Revealing the Structure of Time in Vision-Language Models


24. Modeling realistic human behavior using generative agents in a multimodal transport system: Software architecture and Application to Toulouse


25. CARES: Context-Aware Resolution Selector for VLMs


26. KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge


27. Monitoring LLM-based Multi-Agent Systems Against Corruptions via Node Evaluation


28. ToMMeR – Efficient Entity Mention Detection from Large Language Models


29. ColorAgent: Building A Robust, Personalized, and Interactive OS Agent


30. AgenticMath: Enhancing LLM Reasoning via Agentic-based Math Data Generation


31. M3-SLU: Evaluating Speaker-Attributed Reasoning in Multimodal Large Language Models


32. Metadata Extraction Leveraging Large Language Models


33. SORA-ATMAS: Adaptive Trust Management and Multi-LLM Aligned Governance for Future Smart Cities


34. Balancing Rewards in Text Summarization: Multi-Objective Reinforcement Learning via HyperVolume Optimization


35. LAPRAD: LLM-Assisted PRotocol Attack Discovery


36. See, Think, Act: Online Shopper Behavior Simulation with VLM Agents


37. PruneHal: Reducing Hallucinations in Multi-modal Large Language Models through Adaptive KV Cache Pruning


38. Interpretable Question Answering with Knowledge Graphs


39. Imbalanced Gradients in RL Post-Training of Multi-Task LLMs


40. News-Aware Direct Reinforcement Trading for Financial Markets


41. When Facts Change: Probing LLMs on Evolving Knowledge with evolveQA


42. That’s Deprecated! Understanding, Detecting, and Steering Knowledge Conflicts in Language Models for Code Generation


43. What Makes a Good Curriculum? Disentangling the Effects of Data Ordering on LLM Mathematical Reasoning


44. PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions


45. CLiVR: Conversational Learning System in Virtual Reality with AI-Powered Patients


46. FlexiDataGen: An Adaptive LLM Framework for Dynamic Semantic Dataset Generation in Sensitive Domains


47. Prior-informed optimization of treatment recommendation via bandit algorithms trained on large language model-processed historical records


48. Robust Driving QA through Metadata-Grounded Context and Task-Specific Prompts


49. ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge


50. A Justice Lens on Fairness and Ethics Courses in Computing Education: LLM-Assisted Multi-Perspective and Thematic Evaluation


51. BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping


52. Benchmarking On-Device Machine Learning on Apple Silicon with MLX


53. Misinformation Detection using Large Language Models with Explainability


54. Context-aware Fairness Evaluation and Mitigation in LLMs


55. Learning from the Best, Differently: A Diversity-Driven Rethinking on Data Selection



57. 3D Optimization for AI Inference Scaling: Balancing Accuracy, Cost, and Latency


58. DuoLens: A Framework for Robust Detection of Machine-Generated Multilingual Text and Code


59. Evaluating LLMs for Career Guidance: Comparative Analysis of Computing Competency Recommendations Across Ten African Countries


60. AI for Distributed Systems Design: Scalable Cloud Optimization Through Repeated LLMs Sampling And Simulators


61. CosmoCore Affective Dream-Replay Reinforcement Learning for Code Generation


62. CodeCRDT: Observation-Driven Coordination for Multi-Agent LLM Code Generation


63. Small Language Models Offer Significant Potential for Science Community


64. Contextual Augmentation for Entity Linking using Large Language Models


65. LLM Bazaar: A Service Design for Supporting Collaborative Learning with an LLM-Powered Multi-Party Collaboration Infrastructure