Major LLM-Related Papers - 2025-09-24

1. Cross-Cultural Transfer of Commonsense Reasoning in LLMs: Evidence from the Arab World


2. AgentInit: Initializing LLM-based Multi-Agent Systems via Diversity and Expertise Orchestration for Effective and Efficient Collaboration


3. Code Driven Planning with Domain-Adaptive Critic


4. From latent factors to language: a user study on LLM-generated explanations for an inherently interpretable matrix-based recommender system


5. LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions


6. Data Efficient Adaptation in Large Language Models via Continuous Low-Rank Fine-Tuning


7. How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective


8. Memory in Large Language Models: Mechanisms, Evaluation and Evolution


9. Conf-Profile: A Confidence-Driven Reasoning Paradigm for Label-Free User Profiling


10. Model selection meets clinical semantics: Optimizing ICD-10-CM prediction via LLM-as-Judge evaluation, redundancy-aware sampling, and section-aware fine-tuning


11. Bounded PCTL Model Checking of Large Language Model Outputs


12. The AGNTCY Agent Directory Service: Architecture and Implementation


13. Experience Scaling: Post-Deployment Evolution For Large Language Models


14. Autonomous Data Agents: A New Opportunity for Smart Data


15. Advances in Large Language Models for Medicine


16. TERAG: Token-Efficient Graph-Based Retrieval-Augmented Generation


17. Solving Math Word Problems Using Estimation Verification and Equation Generation


18. LLMZ+: Contextual Prompt Whitelist Principles for Agentic LLMs


19. FERA: Foil Fencing Referee Assistant Using Pose-Based Multi-Label Move Recognition and Rule Reasoning


20. Instruction-Following Evaluation in Function Calling for Large Language Models


21. Gödel Test: Can Large Language Models Solve Easy Conjectures?


22. Evaluating the Safety and Skill Reasoning of Large Reasoning Models Under Compute Constraints


23. Towards General Computer Control with Hierarchical Agents and Multi-Level Action Spaces


24. From “What to Eat?” to Perfect Recipe: ChefMind’s Chain-of-Exploration for Ambiguous User Intent in Recipe Recommendation


25. Multimodal Health Risk Prediction System for Chronic Diseases via Vision-Language Fusion and Large Language Models


26. Similarity Field Theory: A Mathematical Framework for Intelligence


27. Synthesizing Attitudes, Predicting Actions (SAPA): Behavioral Theory-Guided LLMs for Ridesourcing Mode Choice Modeling


28. Large Language Models and Operations Research: A Structured Survey


29. SPADE: A Large Language Model Framework for Soil Moisture Pattern Recognition and Anomaly Detection in Precision Agriculture


30. A Cost-Benefit Analysis of On-Premise Large Language Model Deployment: Breaking Even with Commercial LLM Services


31. Reinforcement Learning on Pre-Training Data


32. Systematic Comparative Analysis of Large Pretrained Language Models on Contextualized Medication Event Extraction


33. Steering Multimodal Large Language Models Decoding for Context-Aware Safety


34. Anecdoctoring: Automated Red-Teaming Across Language and Place


35. On the Soundness and Consistency of LLM Agents for Executing Test Cases Written in Natural Language


36. Analysis on distribution and clustering of weight


37. FUNCanon: Learning Pose-Aware Action Primitives via Functional Object Canonicalization for Generalizable Robotic Manipulation


38. Algorithms for Adversarially Robust Deep Learning


39. Pathways of Thoughts: Multi-Directional Thinking for Long-form Personalized Question Answering


40. A Mega-Study of Digital Twins Reveals Strengths, Weaknesses and Opportunities for Further Improvement


41. Pure Vision Language Action (VLA) Models: A Comprehensive Survey


42. VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction


43. No Labels Needed: Zero-Shot Image Classification with Collaborative Self-Learning


44. Diversity Boosts AI-Generated Text Detection


45. When Ads Become Profiles: Large-Scale Audit of Algorithmic Biases and LLM Profiling Risks


46. NGRPO: Negative-enhanced Group Relative Policy Optimization


47. Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions


48. Detection of security smells in IaC scripts through semantics-aware code and language processing


49. AECBench: A Hierarchical Benchmark for Knowledge Evaluation of Large Language Models in the AEC Field


50. When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models


51. Security smells in infrastructure as code: a taxonomy update beyond the seven sins


52. COLT: Enhancing Video Large Language Models with Continual Tool Usage


53. MemOrb: A Plug-and-Play Verbal-Reinforcement Memory Layer for E-Commerce Customer Service


54. RSVG-ZeroOV: Exploring a Training-Free Framework for Zero-Shot Open-Vocabulary Visual Grounding in Remote Sensing Images


55. NaviSense: A Multimodal Assistive Mobile application for Object Retrieval by Persons with Visual Impairment


56. Learning neuroimaging models from health system-scale data


57. FlexSED: Towards Open-Vocabulary Sound Event Detection


58. VLN-Zero: Rapid Exploration and Cache-Enabled Neurosymbolic Vision-Language Planning for Zero-Shot Transfer in Robot Navigation


59. LCMF: Lightweight Cross-Modality Mambaformer for Embodied Robotics VQA


60. The Ranking Blind Spot: Decision Hijacking in LLM-based Text Ranking


61. Explore the Reinforcement Learning for the LLM based ASR and TTS system


62. CCQA: Generating Question from Solution Can Improve Inference-Time Reasoning in SLMs


63. Automatic coherence-driven inference on arguments


64. APRIL: Active Partial Rollouts in Reinforcement Learning to tame long-tail generation


65. Coherence-driven inference for cybersecurity


66. CogniLoad: A Synthetic Natural Language Reasoning Benchmark With Tunable Length, Intrinsic Difficulty, and Distractor Density


67. Check Field Detection Agent (CFD-Agent) using Multimodal Large Language and Vision Language Models


68. An Artificial Intelligence Value at Risk Approach: Metrics and Models


69. Align Where the Words Look: Cross-Attention-Guided Patch Alignment with Contrastive and Transport Regularization for Bengali Captioning


70. FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction


71. Brittleness and Promise: Knowledge Graph Based Reward Modeling for Diagnostic Reasoning


72. Evaluating Large Language Models for Detecting Antisemitism


73. PEEK: Guiding and Minimal Image Representations for Zero-Shot Generalization of Robot Manipulation Policies


74. Conversational Orientation Reasoning: Egocentric-to-Allocentric Navigation with Multimodal Chain-of-Thought


75. Qianfan-VL: Domain-Enhanced Universal Vision-Language Models


76. Sparse Training Scheme for Multimodal LLM


77. From Parameters to Performance: A Data-Driven Study on LLM Structure and Development


78. Self-Evolving LLMs via Continual Instruction Tuning


79. Safe-SAIL: Towards a Fine-grained Safety Landscape of Large Language Models via Sparse Autoencoder Interpretation Framework


80. MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents


81. Prompt Optimization Meets Subspace Representation Learning for Few-shot Out-of-Distribution Detection


82. Solve it with EASE