Key LLM-Related Papers - 2025-12-03

1. LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess


2. Chain-of-Ground: Improving GUI Grounding via Iterative Reasoning and Reference Feedback


3. Learned-Rule-Augmented Large Language Model Evaluators


4. Predicting Human Chess Moves: An AI Assisted Analysis of Chess Games Using Skill-group Specific n-gram Language Models


5. Testing Transformer Learnability on the Arithmetic Sequence of Rooted Trees


6. H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons


7. Who Judges the Judge? LLM Jury-on-Demand: Building Trustworthy LLM Evaluation Systems


8. LEC: Linear Expectation Constraints for False-Discovery Control in Selective Prediction and Routing Systems


9. SynthStrategy: Extracting and Formalizing Latent Strategic Insights from LLMs in Organic Chemistry


10. Multi-Path Collaborative Reasoning via Reinforcement Learning


11. Automated Risk-of-Bias Assessment of Randomized Controlled Trials: A First Look at a GEPA-trained Programmatic Prompting Framework


12. A Flexible Multi-Agent LLM-Human Framework for Fast Human Validated Tool Building


13. The Necessity of Imperfection: Reversing Model Collapse via Simulating Cognitive Boundedness


14. CuES: A Curiosity-driven and Environment-grounded Synthesis Framework for Agentic RL


15. RoboDriveVLM: A Novel Benchmark and Baseline towards Robust Vision-Language Models for Autonomous Driving


16. OntoMetric: An Ontology-Guided Framework for Automated ESG Knowledge Graph Construction


17. Unsupervised decoding of encoded reasoning using language model interpretability


18. Knowledge Graph Augmented Large Language Models for Next-Visit Disease Prediction


19. Foundation Priors


20. Energy-Aware Data-Driven Model Selection in LLM-Orchestrated AI Systems


21. SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds


22. Med-CRAFT: Automated Construction of Interpretable and Multi-Hop Video Workloads via Knowledge Graph Traversal



24. ChartAnchor: Chart Grounding with Structural-Semantic Fidelity


25. Minimal neuron ablation triggers catastrophic collapse in the language core of Large Vision-Language Models


26. Hybrid-DMKG: A Hybrid Reasoning Framework over Dynamic Multimodal Knowledge Graphs for Multimodal Multihop QA with Knowledge Editing


27. BioPro: On Difference-Aware Gender Fairness for Vision-Language Models


28. MPR-GUI: Benchmarking and Enhancing Multilingual Perception and Reasoning in GUI Agents


29. Probing the “Psyche” of Large Reasoning Models: Understanding Through a Human Lens


30. SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs


31. When Human Preferences Flip: An Instance-Dependent Robust Loss for RLHF


32. Model of human cognition


33. EDIT: Early Diffusion Inference Termination for dLLMs Based on Dynamics of Training Gradients


34. Clinical-R1: Empowering Large Language Models for Faithful and Comprehensive Reasoning with Clinical Objective Relative Policy Optimization


35. Mind the data gap: Missingness Still Shapes Large Language Model Prognoses


36. Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models


37. Echo-N1: Affective RL Frontier


38. CogEvo-Edu: Cognitive Evolution Educational Multi-Agent Collaborative System


39. RL-Struct: A Lightweight Reinforcement Learning Framework for Reliable Structured Output in LLMs


40. ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning


41. Trification: A Comprehensive Tree-based Strategy Planner and Structural Verification for Fact-Checking


42. GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment


43. Agentic Policy Optimization via Instruction-Policy Co-Evolution


44. An Empirical Study of Agent Developer Practices in AI Agent Frameworks


45. SVRG and Beyond via Posterior Correction


46. Rectifying LLM Thought from Lens of Optimization


47. Cross-Lingual Interleaving for Speech Language Models


48. BHRAM-IL: A Benchmark for Hallucination Recognition and Assessment in Multiple Indian Languages


49. ICAD-LLM: One-for-All Anomaly Detection via In-Context Learning with Large Language Models


50. Learning the Boundary of Solvability: Aligning LLMs to Detect Unsolvable Problems


51. LPCD: Unified Framework from Layer-Wise to Submodule Quantization


52. ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation


53. PromptBridge: Cross-Model Prompt Transfer for Large Language Models


54. Rice-VL: Evaluating Vision-Language Models for Cultural Understanding Across ASEAN Countries


55. Stabilizing Reinforcement Learning with LLMs: Formulation and Practices


56. Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity


57. Intrinsic Structure as a Proxy for Saliency: SVD-Based Weight Preservation for Mixed-Precision Quantization in Large Language Models


58. EmoRAG: Evaluating RAG Robustness to Symbolic Perturbations


59. Kardia-R1: Unleashing LLMs to Reason toward Understanding and Empathy for Emotional Support via Rubric-as-Judge Reinforcement Learning


60. Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding


61. SUPERChem: A Multimodal Reasoning Benchmark in Chemistry


62. Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation


63. First, do NOHARM: towards clinically safe large language models


64. LLM-as-a-Judge for Scalable Test Coverage Evaluation: Accuracy, Operational Reliability, and Cost


65. S$^2$-MLLM: Boosting Spatial Reasoning Capability of MLLMs for 3D Visual Grounding with Structural Guidance


66. M4-BLIP: Advancing Multi-Modal Media Manipulation Detection through Face-Enhanced Local Analysis


67. TempPerturb-Eval: On the Joint Effects of Internal Temperature and External Perturbations in RAG Robustness


68. Toward a benchmark for CTR prediction in online advertising: datasets, evaluation protocols and perspectives


69. Beyond Greenfield: AI-Driven Productivity in Documentation and Brownfield Engineering


70. SocialFusion: Addressing Social Degradation in Pre-trained Vision-Language Models


71. Efficiently Learning Branching Networks for Multitask Algorithmic Reasoning


72. CycliST: A Video Language Model Benchmark for Reasoning on Cyclical State Transitions


73. When Safety Blocks Sense: Measuring Semantic Confusion in LLM Refusals


74. Chain of Unit-Physics: A Primitive-Centric Approach to Scientific Code Synthesis


75. Table as a Modality for Large Language Models


76. Fine-tuning of lightweight large language models for sentiment classification on heterogeneous financial textual data


77. Mitigating Hallucinations in Zero-Shot Scientific Summarisation: A Pilot Study


78. Beyond High-Entropy Exploration: Correctness-Aware Low-Entropy Segment-Based Advantage Shaping for Reasoning LLMs


79. Look, Recite, Then Answer: Enhancing VLM Performance via Self-Generated Knowledge Hints


80. Less is More: Resource-Efficient Low-Rank Adaptation


81. HBLLM: Wavelet-Enhanced High-Fidelity 1-Bit Quantization for LLMs


82. Bias Injection Attacks on RAG Databases and Sanitization Defenses


83. SHRAG: A Framework for Combining Human-Inspired Search with RAG


84. Provable Benefit of Sign Descent: A Minimal Model Under Heavy-Tailed Class Imbalance


85. REM: Evaluating LLM Embodied Spatial Reasoning through Multi-Frame Trajectories


86. Concept-Guided Backdoor Attack on Vision Language Models


87. Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation


88. Hierarchical Molecular Language Models (HMLMs)


89. ML-Tool-Bench: Tool-Augmented Planning for ML Tasks


90. ART: Adaptive Response Tuning Framework – A Multi-Agent Tournament-Based Approach to LLM Response Optimization


91. Hierarchical Decentralized Multi-Agent Coordination with Privacy-Preserving Knowledge Sharing: Extending AgentNet for Scalable Autonomous Systems


92. AgentODRL: A Large Language Model-based Multi-agent System for ODRL Generation


93. DLRREC: Denoising Latent Representations via Multi-Modal Knowledge Fusion in Deep Recommender Systems


94. Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models


95. G-KV: Decoding-Time KV Cache Eviction with Global Attention


96. ESPO: Entropy Importance Sampling Policy Optimization


97. RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards


98. SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling


99. Significant Other AI: Identity, Memory, and Emotional Regulation as Long-Term Relational Intelligence


100. Red Teaming Large Reasoning Models


101. SelfAI: Building a Self-Training AI System with LLM Agents


102. Layer Probing Improves Kinase Functional Prediction with Protein Language Models


103. Evaluating LLMs in Open-Source Games


104. Progressive Code Integration for Abstractive Bug Report Summarization


105. Tracing Mathematical Proficiency Through Problem-Solving Processes


106. VCWorld: A Biological World Model for Virtual Cell Simulation


107. Words into World: A Task-Adaptive Agent for Language-Guided Spatial Retrieval in AR


108. FiCoTS: Fine-to-Coarse LLM-Enhanced Hierarchical Cross-Modality Interaction for Time Series Forecasting


109. EduEval: A Hierarchical Cognitive Benchmark for Evaluating Large Language Models in Chinese Education


110. RealAppliance: Let High-fidelity Appliance Assets Controllable and Workable as Aligned Real Manuals


111. OmniFusion: Simultaneous Multilingual Multimodal Translations via Modular Fusion


112. CodeFlowLM: Incremental Just-In-Time Defect Prediction with Pretrained Language Models and Exploratory Insights into Defect Localization


113. DenseScan: Advancing 3D Scene Understanding with 2D Dense Annotation


114. Constructing Efficient Fact-Storing MLPs for Transformers


115. Asm2SrcEval: Evaluating Large Language Models for Assembly-to-Source Code Translation


116. Generating Verifiable CoT from Execution-Traces


117. RadDiff: Retrieval-Augmented Denoising Diffusion for Protein Inverse Folding


118. Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment


119. NetDeTox: Adversarial and Efficient Evasion of Hardware-Security GNNs via RL-LLM Orchestration


120. Enhancing Cognitive Robotics with Commonsense through LLM-Generated Preconditions and Subgoals


121. Causal Reinforcement Learning based Agent-Patient Interaction with Clinical Domain Knowledge


122. Emergent Convergence in Multi-Agent LLM Annotation


123. Text Annotation via Inductive Coding: Comparing Human Experts to LLMs in Qualitative Data Analysis


124. Assessing Large Language Models in Generating RTL Design Specifications


125. Closing the Gap: Data-Centric Fine-Tuning of Vision Language Models for the Standardized Exam Questions


126. Constrained Network Slice Assignment via Large Language Models


127. LM4Opt-RA: A Multi-Candidate LLM Framework with Structured Ranking for Automating Network Resource Allocation


128. Large Language Model for Verilog Code Generation: Literature Review and the Road Ahead


129. Architect in the Loop Agentic Hardware Design and Verification


130. Cultural Prompting Improves the Empathy and Cultural Responsiveness of GPT-Generated Therapy Responses


131. Leveraging LLMs for Design Ideation: An AI Tool to Assist Creativity


132. Development and Benchmarking of a Blended Human-AI Qualitative Research Assistant


133. Use of Retrieval-Augmented Large Language Model Agent for Long-Form COVID-19 Fact-Checking


134. Enhancing Talent Search Ranking with Role-Aware Expert Mixtures and LLM-based Fine-Grained Job Descriptions