Major LLM-Related Papers - 2025-08-28

1. The Ramon Llull’s Thinking Machine for Automated Ideation


2. MATRIX: Multi-Agent simulaTion fRamework for safe Interactions and conteXtual clinical conversational evaluation


3. Reasoning LLMs in the Medical Domain: A Literature Survey


4. Trustworthy Agents for Electronic Health Records through Confidence Estimation


5. Can Structured Templates Facilitate LLMs in Tackling Harder Tasks? : An Exploration of Scaling Laws by Difficulty


6. A Concurrent Modular Agent: Framework for Autonomous LLM Agents


7. Investigating Advanced Reasoning of Large Language Models via Black-Box Interaction


8. Sense of Self and Time in Borderline Personality. A Comparative Robustness Study with Generative AI


9. AI Models Exceed Individual Human Accuracy in Predicting Everyday Social Norms


10. Enabling MoE on the Edge via Importance-Driven Expert Scheduling


11. Novel Approaches to Artificial Intelligence Development Based on the Nearest Neighbor Method


12. VISION: Robust and Interpretable Code Vulnerability Detection Leveraging Counterfactual Augmentation


13. FormaRL: Enhancing Autoformalization with no Labeled Data


14. Interactive Evaluation of Large Language Models for Multi-Requirement Software Engineering Tasks



16. STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning


17. CausalMACE: Causality Empowered Multi-Agents in Minecraft Cooperative Tasks


18. Dynamic Collaboration of Multi-Language Models based on Minimal Complete Semantic Units


19. Reflection-Enhanced Meta-Optimization Integrating TextGrad-style Prompt Optimization with Memory-Driven Self-Evolution


20. CAC-CoT: Connector-Aware Compact Chain-of-Thought for Efficient Reasoning Data Synthesis Across Dual-System Cognitive Tasks


21. Bias Mitigation Agent: Optimizing Source Selection for Fair and Balanced Knowledge Retrieval


22. VistaWise: Building Cost-Effective Agent with Cross-Modal Knowledge Graph for Minecraft


23. AppAgent-Pro: A Proactive GUI Agent System for Multidomain Information Integration and User Assistance


24. MUA-RL: Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use


25. Beyond Benchmark: LLMs Evaluation with an Anthropomorphic and Value-oriented Roadmap


26. RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing


27. SchemaCoder: Automatic Log Schema Extraction Coder with Residual Q-Tree Boosting


28. A Database-Driven Framework for 3D Level Generation with LLMs


29. Language Models For Generalised PDDL Planning: Synthesising Sound and Programmatic Policies


30. The AI in the Mirror: LLM Self-Recognition in an Iterated Public Goods Game


31. PKG-DPO: Optimizing Domain-Specific AI systems with Physics Knowledge Graphs and Direct Preference Optimization


32. AI LLM Proof of Self-Consciousness and User-Specific Attractors


33. Generative Interfaces for Language Models


34. Understanding Tool-Integrated Reasoning


35. ZeST: an LLM-based Zero-Shot Traversability Navigation for Unknown Environments


36. APT-LLM: Exploiting Arbitrary-Precision Tensor Core Computing for LLM Acceleration


37. HiPlan: Hierarchical Planning for LLM-Based Agents with Adaptive Global-Local Guidance


38. An LLM-powered Natural-to-Robotic Language Translation Framework with Correctness Guarantees


39. Automatic Prompt Optimization with Prompt Distillation


40. PAX-TS: Model-agnostic multi-granular explanations for time series forecasting via localized perturbations


41. Diverse And Private Synthetic Datasets Generation for RAG evaluation: A multi-agent framework


42. Enhancing Model Privacy in Federated Learning with Random Masking and Quantization


43. pyFAST: A Modular PyTorch Framework for Time Series Modeling with Multi-source and Sparse Data


44. HAEPO: History-Aggregated Exploratory Policy Optimization



46. ReflectivePrompt: Reflective evolution in autoprompting algorithms


47. ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive


48. ConfTuner: Training Large Language Models to Express Their Confidence Verbally


49. A Survey on Cloud-Edge-Terminal Collaborative Intelligence in AIoT Networks


50. Insights into User Interface Innovations from a Design Thinking Workshop at deRSE25


51. FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation


52. Tailored Teaching with Balanced Difficulty: Elevating Reasoning in Multimodal Chain-of-Thought via Prompt Curriculum


53. Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks


54. Membership Inference Attacks on LLM-based Recommender Systems


55. Breaking the Trade-Off Between Faithfulness and Expressiveness for Large Language Models


56. PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality


57. LaQual: A Novel Framework for Automated Evaluation of LLM App Quality


58. Scaling Laws for Task-Stratified Knowledge in Post-Training Quantized Large Language Models


59. What do language models model? Transformers, automata, and the format of thought


60. A Case Study on the Effectiveness of LLMs in Verification with Proof Assistants


61. DrugReasoner: Interpretable Drug Approval Prediction with a Reasoning-augmented Language Model


62. Collaborative Intelligence: Topic Modelling of Large Language Model use in Live Cybersecurity Operations


63. Principled Detection of Hallucinations in Large Language Models via Multiple Testing


64. VERIRL: Boosting the LLM-based Verilog Code Generation via Reinforcement Learning


65. How Reliable are LLMs for Reasoning on the Re-ranking task?


66. A Systematic Approach to Predict the Impact of Cybersecurity Vulnerabilities Using LLMs


67. CLARIFY: A Specialist-Generalist Framework for Accurate and Lightweight Dermatological Visual Question Answering


68. Toward Generalized Autonomous Agents: A Neuro-Symbolic AI Framework for Integrating Social and Technical Support in Education


69. Latent Self-Consistency for Reliable Majority-Set Selection in Short- and Long-Answer Reasoning


70. Backprompting: Leveraging Synthetic Production Data for Health Advice Guardrails


71. EAI-Avatar: Emotion-Aware Interactive Talking Head Generation


72. LLMs Can’t Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions


73. ProtoEHR: Hierarchical Prototype Learning for EHR-based Healthcare Predictions


74. What Matters in Data for DPO?


75. SALMAN: Stability Analysis of Language Models Through the Maps Between Graph-based Manifolds


76. Can VLMs Recall Factual Associations From Visual References?


77. H-PRM: A Pluggable Hotword Pre-Retrieval Module for Various Speech Recognition Systems


78. Consensus Is All You Need: Gossip-Based Reasoning Among Large Language Models


79. Multi-Modal Drift Forecasting of Leeway Objects via Navier-Stokes-Guided CNN and Sequence-to-Sequence Attention-Based Models