Key LLM-Related Papers - 2025-11-13

1. A Matter of Interest: Understanding Interestingness of Math Problems in Humans and Language Models


2. Patching LLM Like Software: A Lightweight Method for Improving Safety Policy in Large Language Models


3. FaithAct: Faithfulness Planning and Acting in MLLMs


4. SOM Directions are Better than One: Multi-Directional Refusal Suppression in Language Models


5. DiagramIR: An Automatic Pipeline for Educational Math Diagram Evaluation


6. Multi-Agent GraphRAG: A Text-to-Cypher Framework for Labeled Property Graphs


7. MADD: Multi-Agent Drug Discovery Orchestra


8. EHRStruct: A Comprehensive Benchmark Framework for Evaluating Large Language Models on Structured Electronic Health Record Tasks


9. Towards Provably Unlearnable Examples via Bayes Error Optimisation


10. An Efficient Training Pipeline for Reasoning Graphical User Interface Agents


11. SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning


12. National Institute on Aging PREPARE Challenge: Early Detection of Cognitive Impairment Using Speech - The SpeechCARE Solution


13. Prudential Reliability of Large Language Models in Reinsurance: Governance, Assurance, and Capital Efficiency


14. Information Capacity: Evaluating the Efficiency of Large Language Models via Text Compression


15. MSCR: Exploring the Vulnerability of LLMs’ Mathematical Reasoning Abilities Using Multi-Source Candidate Replacement


16. Dual-Process Scaffold Reasoning for Enhancing LLM Code Debugging


17. Towards a Standard, Enterprise-Relevant Agentic AI Benchmark: Lessons from 5.5 billion tokens’ worth of agentic AI evaluations


18. Knowledge-Augmented Long-CoT Generation for Complex Biomolecular Reasoning


19. Numerical Sensitivity and Robustness: Exploring the Flaws of Mathematical Reasoning in Large Language Models


20. Combining LLM Semantic Reasoning with GNN Structural Modeling for Multi-view Multi-Label Feature Selection


21. VSPO: Validating Semantic Pitfalls in Ontology via LLM-Based CQ Generation



23. Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction


24. Computational Blueprints: Generating Isomorphic Mathematics Problems with Large Language Models


25. Neurophysiological Characteristics of Adaptive Reasoning for Creative Problem-Solving Strategy


26. Data Descriptions from Large Language Models with Influence Estimation


27. SparseRM: A Lightweight Preference Modeling with Sparse Autoencoder


28. WaterMod: Modular Token-Rank Partitioning for Probability-Balanced LLM Watermarking


29. Alignment-Aware Quantization for LLM Safety


30. Towards AI-Assisted Generation of Military Training Scenarios


31. ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents


32. AIA Forecaster: Technical Report


33. Making LLMs Reliable When It Matters Most: A Five-Layer Architecture for High-Stakes Decisions


34. AI-Driven Contribution Evaluation and Conflict Resolution: A Framework & Design for Group Workload Investigation


35. Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces


36. Think Before You Retrieve: Learning Test-Time Adaptive Search with Small Language Models


37. Procedural Knowledge Improves Agentic LLM Workflows


38. Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model Reasoning


39. Analysing Environmental Efficiency in AI for X-Ray Diagnosis


40. Training Language Models to Explain Their Own Computations


41. Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models


42. The Path Not Taken: RLVR Provably Learns Off the Principals


43. Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models


44. Large Sign Language Models: Toward 3D American Sign Language Translation


45. SPEAR-MM: Selective Parameter Evaluation and Restoration via Model Merging for Efficient Financial LLM Adaptation


46. Designing LLM-based Multi-Agent Systems for Software Engineering Tasks: Quality Attributes, Design Patterns and Rationale


47. Anatomy-VLM: A Fine-grained Vision-Language Model for Medical Interpretation


48. Interaction Dynamics as a Reward Signal for LLMs


49. DPRM: A Dual Implicit Process Reward Model in Multi-Hop Question Answering


50. Hybrid Quantum-Classical Selective State Space Artificial Intelligence


51. Adaptive Multi-Agent Response Refinement in Conversational Systems


52. Test-time Diverse Reasoning by Riemannian Activation Steering


53. Benchmarking Educational LLMs with Analytics: A Case Study on Gender Bias in Feedback


54. MARC: Multimodal and Multi-Task Agentic Retrieval-Augmented Generation for Cold-Start Recommender System


55. Relation as a Prior: A Novel Paradigm for LLM-based Document-level Relation Extraction


56. PerspAct: Enhancing LLM Situated Collaboration Skills through Perspective Taking and Active Vision


57. CLIP is All You Need for Human-like Semantic Representations in Stable Diffusion


58. DOA Estimation with Lightweight Network on LLM-Aided Simulated Acoustic Scenes


59. Sharp Eyes and Memory for VideoLLMs: Information-Aware Visual Token Pruning for Efficient and Reliable VideoLLM Reasoning


60. Self-Correction Distillation for Structured Data Question Answering


61. State of the Art in Text Classification for South Slavic Languages: Fine-Tuning or Prompting?


62. NOTAM-Evolve: A Knowledge-Guided Self-Evolving Optimization Framework with LLMs for NOTAM Interpretation


63. Libra-MIL: Multimodal Prototypes Stereoscopic Infused with Task-specific Language Priors for Few-shot Whole Slide Image Classification


64. Exploring the Underwater World Segmentation without Extra Training


65. Intelligence per Watt: Measuring Intelligence Efficiency of Local AI


66. LoopLLM: Transferable Energy-Latency Attacks in LLMs via Repetitive Generation


67. LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost


68. MURPHY: Multi-Turn GRPO for Self Correcting Code Generation


69. Sparse3DPR: Training-Free 3D Hierarchical Scene Parsing and Task-Adaptive Subgraph Reasoning from Sparse RGB Views


70. Judging by the Rules: Compliance-Aligned Framework for Modern Slavery Statement Monitoring


71. SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought


72. Auto-US: An Ultrasound Video Diagnosis Agent Using Video Classification Framework and LLMs


73. ViPRA: Video Prediction for Robot Actions


74. CAPO: Confidence Aware Preference Optimization Learning for Multilingual Preferences


75. Cortex AISQL: A Production SQL Engine for Unstructured Data


76. Revisiting NLI: Towards Cost-Effective and Human-Aligned Metrics for Evaluating LLMs in Question Answering


77. A Self-Improving Architecture for Dynamic Safety in Large Language Models


78. Private-RAG: Answering Multiple Queries with LLMs while Keeping Your Data Private


79. LLM Output Drift: Cross-Provider Validation & Mitigation for Financial Workflows


80. SemanticForge: Repository-Level Code Generation through Semantic Knowledge Graphs and Constraint Satisfaction


81. FedRW: Efficient Privacy-Preserving Data Reweighting for Enhancing Federated Learning of Language Models


82. Biologically-Informed Hybrid Membership Inference Attacks on Generative Genomic Models


83. Focusing on Language: Revealing and Exploiting Language Attention Heads in Multilingual Large Language Models


84. Enabling Automatic Self-Talk Detection via Earables


85. Alignment-Constrained Dynamic Pruning for LLMs: Identifying and Preserving Alignment-Critical Circuits


86. KG-DF: A Black-box Defense Framework against Jailbreak Attacks Based on Knowledge Graphs


87. The Polite Liar: Epistemic Pathology in Language Models


88. Motif 2 12.7B technical report


89. Dynamic Stability of LLM-Generated Code


90. It Takes Two: A Dual Stage Approach for Terminology-Aware Translation


91. REFLEX: Reference-Free Evaluation of Log Summarization via Large Language Model Judgment


92. GRIP: In-Parameter Graph Reasoning through Fine-Tuning Large Language Models


93. Exploring the Psychometric Validity of AI-Generated Student Responses: A Study on Virtual Personas’ Learning Motivation


94. Pinching Antennas Meet AI in Next-Generation Wireless Networks


95. AudAgent: Automated Auditing of Privacy Policy Compliance in AI Agents


96. Knowledge-Guided Textual Reasoning for Explainable Video Anomaly Detection via LLMs


97. DynaKV: Enabling Accurate and Efficient Long-Sequence LLM Decoding on Smartphones


98. Network and Systems Performance Characterization of MCP-Enabled LLM Agents



100. Synera: Synergistic LLM Serving across Device and Cloud at Scale


101. Advancing mathematics research with large language models