Key LLM Papers - 2025-10-14

1. LiveOIBench: Can Large Language Models Outperform Human Contestants in Informatics Olympiads?


2. GraphMERT: Efficient and Scalable Distillation of Reliable Knowledge Graphs from Unstructured Data


3. Agentic Systems in Radiology: Design, Applications, Evaluation, and Challenges


4. Toward Mechanistic Explanation of Deductive Reasoning in Language Models


5. Localist LLMs – A Mathematical Framework for Dynamic Locality Control


6. Fundamentals of Building Autonomous LLM Agents


7. RegexPSPACE: A Benchmark for Evaluating LLM Reasoning on PSPACE-complete Regex Problems


8. Dr. Bias: Social Disparities in AI-Powered Medical Guidance


9. Leading the Follower: Learning Persuasive Agents in Social Deduction Games


10. MEC$^3$O: Multi-Expert Consensus for Code Time Complexity Prediction


11. Humanoid Artificial Consciousness Designed with Large Language Model Based on Psychoanalysis and Personality Theory


12. Repairing Regex Vulnerabilities via Localization-Guided Instructions


13. TripScore: Benchmarking and rewarding real-world travel planning with fine-grained evaluation


14. Tiny-R1V: Lightweight Multimodal Unified Reasoning Model via Model Merging


15. Semantic-Condition Tuning: Fusing Graph Context with Large Language Models for Knowledge Graph Completion


16. DualResearch: Entropy-Gated Dual-Graph Retrieval for Answer Reconstruction


17. FATHOMS-RAG: A Framework for the Assessment of Thinking and Observation in Multimodal Systems that use Retrieval Augmented Generation


18. RADAR: Mechanistic Pathways for Detecting Data Contamination in LLM Evaluation


19. LM Fight Arena: Benchmarking Large Multimodal Models via Game Competition


20. GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare


21. ReviewerToo: Should AI Join The Program Committee? A Look At The Future of Peer Review


22. What Is Your Agent’s GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment


23. COMPASS: Enhancing Agent Long-Horizon Reasoning with Evolving Context


24. Robust Heuristic Algorithm Design with LLMs


25. Optimizing delivery for quick commerce factoring qualitative assessment of generated routes


26. Hypothesis Hunting with Evolving Networks of Autonomous Scientific Agents


27. StreamingVLM: Real-Time Understanding for Infinite Video Streams


28. Prompting Test-Time Scaling Is A Strong LLM Reasoning Data Augmentation


29. Dyna-Mind: Learning to Simulate from Experience for Better AI Agents


30. SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models


31. Multimodal Policy Internalization for Conversational Agents


32. Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols


33. The Speech-LLM Takes It All: A Truly Fully End-to-End Spoken Dialogue State Tracking Approach


34. On the Representations of Entities in Auto-regressive Large Language Models


35. ChoirRec: Semantic User Grouping via LLMs for Conversion Rate Prediction of Low-Activity Users


36. The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton


37. FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference


38. Verifying Chain-of-Thought Reasoning via Its Computational Graph


39. CapGeo: A Caption-Assisted Approach to Geometric Reasoning


40. CLARity: Reasoning Consistency Alone Can Teach Reinforced Experts


41. Inflated Excellence or True Performance? Rethinking Medical Diagnostic Benchmarks with Dynamic Evaluation


42. Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models


43. CrisiText: A dataset of warning messages for LLM training in emergency communication


44. Diagnosing Shoulder Disorders Using Multimodal Large Language Models and Consumer-Grade Cameras


45. Clear Roads, Clear Vision: Advancements in Multi-Weather Restoration for Smart Transportation


46. DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction


47. Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs


48. Alif: Advancing Urdu Large Language Models via Multilingual Synthetic Data Distillation


49. Cost-Efficient Long Code Translation using LLMs while Leveraging Identifier Replacements


50. DiTSinger: Scaling Singing Voice Synthesis with Diffusion Transformer and Implicit Alignment


51. On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models


52. SEER: Sustainability Enhanced Engineering of Software Requirements


53. SHERLOCK: Towards Dynamic Knowledge Adaptation in LLM-enhanced E-commerce Risk Management


54. RO-Bench: Large-scale robustness evaluation of MLLMs with text-driven counterfactual videos


55. A Unified Biomedical Named Entity Recognition Framework with Large Language Models


56. Exploring Multi-Temperature Strategies for Token- and Rollout-Level Control in RLVR


57. Designing and Evaluating an AI-driven Immersive Multidisciplinary Simulation (AIMS) for Interprofessional Education


58. Vector Graph-Based Repository Understanding for Issue-Driven File Retrieval


59. Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models


60. Time-Aware Feature Selection: Adaptive Temporal Masking for Stable Sparse Autoencoder Training


61. Repository-Aware File Path Retrieval via Fine-Tuned LLMs


62. CommandSans: Securing AI Agents with Surgical Precision Prompt Sanitization


63. McMining: Automated Discovery of Misconceptions in Student Code


64. D-CoDe: Scaling Image-Pretrained VLMs to Video via Dynamic Compression and Question Decomposition


65. Benchmarking Chinese Commonsense Reasoning with a Multi-hop Reasoning Perspective


66. MLLM as a UI Judge: Benchmarking Multimodal LLMs for Predicting Human Perception of User Interfaces


67. Guiding Exploration in Reinforcement Learning Through LLM-Augmented Observations


68. Measuring Moral LLM Responses in Multilingual Capacities


69. Struc-EMB: The Potential of Structure-Aware Encoding in Language Embeddings


70. Graph Diffusion Transformers are In-Context Molecular Designers


71. Coordinates from Context: Using LLMs to Ground Complex Location References


72. When to Reason: Semantic Router for vLLM


73. ConPoSe: LLM-Guided Contact Point Selection for Scalable Cooperative Object Pushing


74. BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution


75. RAG4Tickets: AI-Powered Ticket Resolution via Retrieval-Augmented Generation on JIRA and GitHub Data


76. dInfer: An Efficient Inference Framework for Diffusion Language Models


77. RA-Gen: A Controllable Code Generation Framework Using ReAct for Multi-Agent Task Execution


78. Faver: Boosting LLM-based RTL Generation with Function Abstracted Verifiable Middleware


79. A Novel Framework for Augmenting Rating Scale Tests with LLM-Scored Text Data


80. Formalizing Style in Personal Narratives


81. Inverse-Free Wilson Loops for Transformers: A Practical Diagnostic for Invariance and Order Sensitivity


82. Upfront Chain-of-Thought: A Cooperative Framework for Chain-of-Thought Compression


83. Energy-Driven Steering: Reducing False Refusals in Large Language Models


84. Automating Android Build Repair: Bridging the Reasoning-Execution Gap in LLM Agents with Domain-Specific Tools


85. From What to Why: Thought-Space Recommendation with Small Language Models


86. Impact of LLMs on Team Collaboration in Software Development


87. Relative Positioning Based Code Chunking Method For Rich Context Retrieval In Repository Level Code Completion Task With Code Language Model


88. MMA-ASIA: A Multilingual and Multimodal Alignment Framework for Culturally-Grounded Evaluation


89. Toward a Safer Web: Multilingual Multi-Agent LLMs for Mitigating Adversarial Misinformation Attacks


90. LatentBreak: Jailbreaking Large Language Models through Latent Space Feedback


91. Mnemosyne: An Unsupervised, Human-Inspired Long-Term Memory Architecture for Edge-Based LLMs


92. Recover-LoRA: Data-Free Accuracy Recovery of Degraded Language Models via Low-Rank Adaptation


93. Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models


94. Beyond CNNs: Efficient Fine-Tuning of Multi-Modal LLMs for Object Detection on Low-Data Regimes


95. Evaluating Hallucinations in Multimodal LLMs with Spoken Queries under Diverse Acoustic Conditions


96. AgenticAD: A Specialized Multiagent System Framework for Holistic Alzheimer Disease Management


97. Comparative Analysis of Large Language Models for the Machine-Assisted Resolution of User Intentions