Key LLM Papers - 2025-10-02

1. Generalized Parallel Scaling with Interdependent Generations


2. Exploring Network-Knowledge Graph Duality: A Case Study in Agentic Supply Chain Risk Analysis


3. Safety Instincts: LLMs Learn to Trust Their Internal Compass for Self-Defense


4. Typed Chain-of-Thought: A Curry-Howard Framework for Verifying LLM Reasoning


5. Uncovering the Computational Ingredients of Human-Like Representations in LLMs


6. Shape Happens: Automatic Feature Manifold Discovery in LLMs via Supervised Multi-Dimensional Scaling


7. QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL


8. On Discovering Algorithms for Adversarial Imitation Learning


9. Learning Compact Representations of LLM Abilities via Item Response Theory


10. AI in data science education: experiences from the classroom


11. EvolProver: Advancing Automated Theorem Proving by Evolving Formalized Problems via Symmetry and Difficulty


12. ACPO: Adaptive Curriculum Policy Optimization for Aligning Vision-Language Models in Complex Reasoning


13. Expected Attention: KV Cache Compression by Estimating Attention from Future Queries Distribution


14. Is Model Editing Built on Sand? Revealing Its Illusory Success and Fragile Foundation


15. ACON: Optimizing Context Compression for Long-horizon LLM Agents


16. Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability


17. Data Quality Challenges in Retrieval-Augmented Generation


18. VIRTUE: Visual-Interactive Text-Image Universal Embedder


19. Rethinking Reward Models for Multi-Domain Test-Time Scaling


20. Towards Self-Evolving Benchmarks: Synthesizing Agent Trajectories via Test-Time Exploration under Validate-by-Reproduce Paradigm


21. BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models


22. ICL Optimized Fragility


23. DualTune: Decoupled Fine-Tuning for On-Device Agentic Systems


24. Drones that Think on their Feet: Sudden Landing Decisions with Embodied AI


25. Judging by Appearances? Auditing and Intervening Vision-Language Models for Bail Prediction


26. ARS: Adaptive Reasoning Suppression for Efficient Large Reasoning Language Models


27. ToolBrain: A Flexible Reinforcement Learning Framework for Agentic Tools


28. TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments


29. COM-BOM: Bayesian Exemplar Search for Efficiently Exploring the Accuracy-Calibration Pareto Frontier


30. Code2Video: A Code-centric Paradigm for Educational Video Generation


31. Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity


32. Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards


33. GRAD: Generative Retrieval-Aligned Demonstration Sampler for Efficient Few-Shot Reasoning


34. Social Welfare Function Leaderboard: When LLM Agents Allocate Social Welfare


35. Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?


36. mR3: Multilingual Rubric-Agnostic Reward Reasoning Models


37. A Practitioner’s Guide to Multi-turn Agentic Reinforcement Learning



39. Hybrid Dialogue State Tracking for Persian Chatbots: A Language Model-Based Approach


40. GEM: A Gym for Agentic LLMs


41. Interpreting Language Models Through Concept Descriptions: A Survey


42. CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs


43. TextCAM: Explaining Class Activation Map with Text


44. Benchmarking Foundation Models with Retrieval-Augmented Generation in Olympic-Level Physics Problem Solving


45. Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers


46. RiskPO: Risk-based Policy Optimization via Verifiable Reward for LLM Post-Training


47. Bridging Language Gaps: Advances in Cross-Lingual Information Retrieval with Multilingual LLMs


48. Span-level Detection of AI-generated Scientific Text via Contrastive Learning and Structural Calibration


49. Advancing Automated Ethical Profiling in SE: a Zero-Shot Evaluation of LLM Reasoning


50. Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs


51. Can World Models Benefit VLMs for World Dynamics?


52. Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning


53. Solar PV Installation Potential Assessment on Building Facades Based on Vision and Language Foundation Models


54. Multi-Objective Task-Aware Predictor for Image-Text Alignment



56. Inclusive Easy-to-Read Generation for Individuals with Cognitive Impairments


57. Facilitating Cognitive Accessibility with LLMs: A Multi-Task Approach to Easy-to-Read Text Generation


58. Hybrid Training for Vision-Language-Action Models


59. PromptPilot: Improving Human-AI Collaboration Through LLM-Enhanced Prompt Engineering


60. On Predictability of Reinforcement Learning Dynamics for Large Language Models


61. EMR-AGENT: Automating Cohort and Feature Extraction from EMR Databases


62. Copy-Paste to Mitigate Large Language Model Hallucinations


63. Graph2Eval: Automatic Multimodal Task Generation for Agents via Knowledge Graphs


64. MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance


65. Exploring System 1 and 2 communication for latent reasoning in LLMs


66. Make a Video Call with LLM: A Measurement Campaign over Five Mainstream Apps


67. Analyzing Latent Concepts in Code Language Models


68. Cloud Investigation Automation Framework (CIAF): An AI-Driven Approach to Cloud Forensics


69. A Call to Action for a Secure-by-Design Generative AI Paradigm


70. Plug-and-Play Prompt Refinement via Latent Feedback for Diffusion Model Alignment


71. Automated Structured Radiology Report Generation with Rich Clinical Context


72. David and Goliath in Medical Vision: Convolutional Networks vs Biomedical Vision Language Models


73. AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features


74. Combining Large Language Models and Gradient-Free Optimization for Automatic Control Policy Synthesis


75. In-Context Curiosity: Distilling Exploration for Decision-Pretrained Transformers on Bandit Tasks


76. Navigating the Synchrony-Stability Frontier in Adaptive Chatbots


77. Reasoning-Aware Prompt Orchestration: A Foundation Model for Multi-Agent Language Model Coordination


78. DecepChain: Inducing Deceptive Reasoning in Large Language Models


79. Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models


80. o-MEGA: Optimized Methods for Explanation Generation and Analysis


81. Data driven approaches in nanophotonics: A review of AI-enabled metadevices


82. Efficient Layer-wise LLM Fine-tuning for Revision Intention Prediction


83. Retrieval-Augmented Generation for Electrocardiogram-Language Models


84. Can AI agents understand spoken conversations about data visualizations in online meetings?


85. SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence


86. BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses


87. The Pitfalls of KV Cache Compression


88. LoRAFusion: Efficient LoRA Fine-Tuning for LLMs


89. GRPO-$λ$: Credit Assignment improves LLM Reasoning


90. PrunedLoRA: Robust Gradient-Based structured pruning for Low-rank Adaptation in Fine-tuning


91. Why Can’t Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls


92. A Systematic Study of Large Language Models for Task and Motion Planning With PDDLStream


93. CHAI: Command Hijacking against embodied AI


94. Personalized Reasoning: Just-In-Time Personalization and Why LLMs Fail At It


95. Direct Token Optimization: A Self-contained Approach to Large Language Model Unlearning


96. Geo-R1: Unlocking VLM Geospatial Reasoning with Cross-View Reinforcement Learning


97. Intelligent 5S Audit: Application of Artificial Intelligence for Continuous Improvement in the Automotive Industry


98. AstroMMBench: A Benchmark for Evaluating Multimodal Large Language Models Capabilities in Astronomy


99. Less is More: Lean yet Powerful Vision-Language Model for Autonomous Driving


100. HiDe: Rethinking The Zoom-IN method in High Resolution MLLMs via Hierarchical Decoupling


101. Explanation-Driven Counterfactual Testing for Faithfulness in Vision-Language Model Explanations


102. Reinforcement Learning-Based Prompt Template Stealing for Text-to-Image Models


103. Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness


104. Uncovering Intrinsic Capabilities: A Paradigm for Data Curation in Vision-Language Models


105. AutoPK: Leveraging LLMs and a Hybrid Similarity Metric for Advanced Retrieval of Pharmacokinetic Data from Complex Tables and Documents


106. DexBench: Benchmarking LLMs for Personalized Decision Making in Diabetes Management


107. WaveMind: Towards a Conversational EEG Foundation Model Aligned to Textual and Visual Modalities


108. Rethinking RoPE Scaling in Quantized LLM: Theory, Outlier, and Channel-Band Analysis with Weight Rescaling


109. EpidemIQs: Prompt-to-Paper LLM Agents for Epidemic Modeling and Analysis


110. Methodological Framework for Quantifying Semantic Test Coverage in RAG Systems