Key LLM-Related Papers - 2025-09-25

1. Scan-do Attitude: Towards Autonomous CT Protocol Management using a Large Language Model Agent


2. PEPS: Quantum-Inspired Reinforcement Learning for Coherent Reasoning Traces in LLMs


3. MACD: Multi-Agent Clinical Diagnosis with Self-Learned Knowledge for LLM


4. Embodied AI: From LLMs to World Models


5. CON-QA: Privacy-Preserving QA using cloud LLMs in Contract Domain


6. LatentGuard: Controllable Latent Steering for Robust Refusal of Attacks and Reliable Response Generation


7. The Conductor and the Engine: A Path Towards Co-Designed Reasoning


8. SteinerSQL: Graph-Guided Mathematical Reasoning for Text-to-SQL Generation


9. Nano Bio-Agents (NBA): Small Language Model Agents for Genomics


10. Score the Steps, Not Just the Goal: VLM-Based Subgoal Evaluation for Robotic Manipulation


11. Cognitive Load Limits in Large Language Models: Benchmarking Multi-Hop Reasoning


12. Estimating the Self-Consistency of LLMs


13. The Indispensable Role of User Simulation in the Pursuit of AGI


14. EmbeddingGemma: Powerful and Lightweight Text Representations


15. Video models are zero-shot learners and reasoners


16. RAG Security and Privacy: Formalizing the Threat Model and Attack Surface


17. DRES: Benchmarking LLMs for Disfluency Removal


18. SIM-CoT: Supervised Implicit Chain-of-Thought


19. When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity


20. Investigating Security Implications of Automatically Generated Code on the Software Supply Chain


21. Beyond Sharp Minima: Robust LLM Unlearning via Feedback-Guided Multi-Point Optimization


22. Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment


23. Play by the Type Rules: Inferring Constraints for LLM Functions in Declarative Programs


24. STAF: Leveraging LLMs for Automated Attack Tree-Based Security Test Generation


25. CyberSOCEval: Benchmarking LLMs Capabilities for Malware Analysis and Threat Intelligence Reasoning


26. Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation


27. Affective Computing and Emotional Data: Challenges and Implications in Privacy Regulations, The AI Act, and Ethics in Large Language Models


28. EchoBench: Benchmarking Sycophancy in Medical Large Vision-Language Models


29. Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving


30. Integrated Framework for LLM Evaluation with Answer Generation


31. One Filters All: A Generalist Filter for State Estimation


32. The Knowledge-Behaviour Disconnect in LLM-based Chatbots


33. Choosing to Be Green: Advancing Green AI via Dynamic Model Selection


34. When Words Can’t Capture It All: Towards Video-Based User Complaint Text Generation with Multimodal Video Complaint Dataset


35. Do Before You Judge: Self-Reference as a Pathway to Better LLM Evaluation


36. Adaptive Guidance Semantically Enhanced via Multimodal LLM for Edge-Cloud Object Detection


37. CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks


38. Eliminating stability hallucinations in LLM-based TTS models via attention guidance


39. TianHui: A Domain-Specific Large Language Model for Diverse Traditional Chinese Medicine Scenarios


40. Polarity Detection of Sustainable Development Goals in News Text


41. bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs


42. PolicyPad: Collaborative Prototyping of LLM Policies


43. Thinking While Listening: Simple Test Time Scaling For Audio Classification


44. Large Language Models for Pedestrian Safety: An Application to Predicting Driver Yielding Behavior at Unsignalized Intersections


45. Are We Scaling the Right Thing? A System Perspective on Test-Time Scaling


46. Advancing Speech Summarization in Multi-modal LLMs with Reinforcement Learning


47. GuessingGame: Measuring the Informativeness of Open-Ended Questions in Large Language Models


48. Frame-Stacked Local Transformers For Efficient Multi-Codebook Speech Generation


49. Reverse Engineering User Stories from Code using Large Language Models


50. A Foundation Chemical Language Model for Comprehensive Fragment-Based Drug Discovery


51. Learning Dynamics of Deep Learning – Force Analysis of Deep Neural Networks


52. Semantic-Aware Fuzzing: An Empirical Framework for LLM-Guided, Reasoning-Driven Input Mutation


53. Uncertainty Quantification of Large Language Models using Approximate Bayesian Computation


54. How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models


55. SLM-Based Agentic AI with P-C-G: Optimized for Korean Tool Use


56. Pipeline Parallelism is All You Need for Optimized Early-Exit Based Self-Speculative Decoding


57. The Inadequacy of Offline LLM Evaluations: A Need to Account for Personalization in Model Behavior


58. Semantic Representation Attack against Aligned Large Language Models


59. Benchmarking and Improving LLM Robustness for Personalized Generation


60. RoadMind: Towards a Geospatial AI Expert for Disaster Response


61. Cognitive-Level Adaptive Generation via Capability-Aware Retrieval and Style Adaptation


62. Quantifying Compositionality of Classic and State-of-the-Art Embeddings


63. A systematic review of trial-matching pipelines using large language models


64. Unveiling the Merits and Defects of LLMs in Automatic Review Generation for Scientific Papers


65. Readme_AI: Dynamic Context Construction for Large Language Models


66. FHIR-AgentBench: Benchmarking LLM Agents for Realistic Interoperable EHR Question Answering


67. Automated Item Neutralization for Non-Cognitive Scales: A Large Language Model Approach to Reducing Social-Desirability Bias


68. LLMs as verification oracles for Solidity


69. GAUSS: Benchmarking Structured Mathematical Skills for Large Language Models