Key LLM-Related Papers - 2025-11-12

1. DeepPersona: A Generative Engine for Scaling Deep Synthetic Personas


2. Evaluating Online Moderation Via LLM-Powered Counterfactual Simulations


3. Saliency Map-Guided Knowledge Discovery for Subclass Identification with LLM-Based Symbolic Approximations


4. Two Heads are Better than One: Distilling Large Language Model Features Into Small Models with Feature Decomposition and Mixture


5. MENTOR: A Metacognition-Driven Self-Evolution Framework for Uncovering and Mitigating Implicit Risks in LLMs on Domain Tasks


6. LLM Driven Processes to Foster Explainable AI


7. Increasing AI Explainability by LLM Driven Standard Processes


8. RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services


9. Improving Region Representation Learning from Urban Imagery with Noisy Long-Caption Supervision


10. Do LLMs Feel? Teaching Emotion Recognition with Prompts, Retrieval, and Curriculum Learning


11. MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning


12. GRAPH-GRPO-LEX: Contract Graph Modeling and Reinforcement Learning with Group Relative Policy Optimization


13. Optimizing Chain-of-Thought Confidence via Topological and Dirichlet Risk Analysis


14. AUTO-Explorer: Automated Data Collection for GUI Agent


15. SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization


16. Efficient LLM Safety Evaluation through Multi-Agent Debate


17. What Makes Reasoning Invalid: Echo Reflection Mitigation for Large Language Models


18. LPFQA: A Long-Tail Professional Forum-based Benchmark for LLM Evaluation


19. ALIGN: A Vision-Language Framework for High-Accuracy Accident Location Inference through Geo-Spatial Neural Reasoning


20. Secu-Table: a Comprehensive security table dataset for evaluating semantic table interpretation systems


21. Synthetic Data-Driven Prompt Tuning for Financial QA over Tables and Documents


22. GAIA: A General Agency Interaction Architecture for LLM-Human B2B Negotiation & Screening


23. Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads


24. Dataforge: A Data Agent Platform for Autonomous Data Engineering


25. CSP4SDG: Constraint and Information-Theory Based Role Identification in Social Deduction Games with LLM-Enhanced Inference


26. Chasing Consistency: Quantifying and Optimizing Human-Model Alignment in Chain-of-Thought Reasoning


27. Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles


28. Maestro: Learning to Collaborate via Conditional Listwise Policy Optimization for Multi-Agent LLMs


29. ScRPO: From Errors to Insights


30. Self-Abstraction from Grounded Experience for Plan-Guided Policy Refinement


31. An Empirical Study of Reasoning Steps in Thinking Code LLMs


32. Can a Small Model Learn to Look Before It Leaps? Dynamic Learning and Proactive Correction for Hallucination Detection


33. DiagnoLLM: A Hybrid Bayesian Neural Language Framework for Interpretable Disease Diagnosis


34. Anchors in the Machine: Behavioral and Attributional Evidence of Anchoring Bias in LLMs


35. CoT-X: An Adaptive Framework for Cross-Model Chain-of-Thought Transfer and Optimization


36. From Prompts to Power: Measuring the Energy Footprint of LLM Inference


37. Evidence-Bound Autonomous Research (EviBound): A Governance Framework for Eliminating False Claims


38. Using Vision Language Models as Closed-Loop Symbolic Planners for Robotic Applications: A Control-Theoretic Perspective


39. SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards


40. Surgical Agent Orchestration Platform for Voice-directed Patient Data Interaction


41. Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence


42. Self-Evaluating LLMs for Multi-Step Tasks: Stepwise Confidence Estimation for Failure Detection


43. FinRpt: Dataset, Evaluation System and LLM-based Multi-agent Framework for Equity Research Report Generation


44. When Bias Pretends to Be Truth: How Spurious Correlations Undermine Hallucination Detection in LLMs


45. LMM-IQA: Image Quality Assessment for Low-Dose CT Imaging


46. Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models


47. MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs


48. Leveraging Text-Driven Semantic Variation for Robust OOD Segmentation


49. Discourse Graph Guided Document Translation with Large Language Models


50. LLMServingSim2.0: A Unified Simulator for Heterogeneous Hardware and Serving Techniques in LLM Infrastructure


51. NoteEx: Interactive Visual Context Manipulation for LLM-Assisted Exploratory Data Analysis in Computational Notebooks


52. Federated Learning for Video Violence Detection: Complementary Roles of Lightweight CNNs and Vision-Language Models for Energy-Efficient Use


53. AdaRec: Adaptive Recommendation with LLMs via Narrative Profiling and Dual-Channel Reasoning


54. Think Consistently, Reason Efficiently: Energy-Based Calibration for Implicit Chain-of-Thought


55. More Agents Helps but Adversarial Robustness Gap Persists


56. E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis


57. Achieving Effective Virtual Reality Interactions via Acoustic Gesture Recognition based on Large Language Models


58. Wasm: A Pipeline for Constructing Structured Arabic Interleaved Multimodal Corpora


59. Benchmarking LLMs for Fine-Grained Code Review with Enriched Context in Practice


60. RPTS: Tree-Structured Reasoning Process Scoring for Faithful Multimodal Evaluation


61. Differentiated Directional Intervention: A Framework for Evading LLM Safety Alignment


62. Beyond Plain Demos: A Demo-centric Anchoring Paradigm for In-Context Learning in Alzheimer’s Disease Detection


63. AgentSUMO: An Agentic Framework for Interactive Simulation Scenario Generation in SUMO via Large Language Models


64. Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models


65. Pedagogical Reflections on the Holistic Cognitive Development (HCD) Framework and AI-Augmented Learning in Creative Computing


66. Data Trajectory Alignment for LLM Domain Adaptation: A Two-Phase Synthesis Framework for Telecommunications Mathematics


67. Sensitivity of Small Language Models to Fine-tuning Data Contamination


68. Implicit Federated In-context Learning For Task-Specific LLM Fine-Tuning


69. Rank-1 LoRAs Encode Interpretable Reasoning Signals


70. S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning


71. Revisiting the Data Sampling in Multimodal Post-training from a Difficulty-Distinguish View


72. Structural Enforcement of Statistical Rigor in AI-Driven Discovery: A Functional Architecture



74. Textual Self-attention Network: Test-Time Preference Optimization through Textual Gradient-based Attention


75. How Do VLAs Effectively Inherit from VLMs?


76. SPUR: A Plug-and-Play Framework for Integrating Spatial Audio Understanding and Reasoning into Large Audio-Language Models


77. CoFineLLM: Conformal Finetuning of LLMs for Language-Instructed Robot Planning


78. Rep2Text: Decoding Full Text from a Single LLM Token Representation


79. LLM For Loop Invariant Generation and Fixing: How Far Are We?


80. On the Analogy between Human Brain and LLMs: Spotting Key Neurons in Grammar Perception


81. Rethinking what Matters: Effective and Robust Multilingual Realignment for Low-Resource Languages


82. A Low-Rank Method for Vision Language Model Hallucination Mitigation in Autonomous Driving


83. Route Experts by Sequence, not by Token


84. Zooming into Comics: Region-Aware RL Improves Fine-Grained Comic Understanding in Vision-Language Models


85. A Multi-Agent System for Semantic Mapping of Relational Data to Knowledge Graphs


86. FLEX: Continuous Agent Evolution via Forward Learning from Experience


87. When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Platforms


88. SR-KI: Scalable and Real-Time Knowledge Integration into LLMs via Supervised Attention


89. Walking the Tightrope of LLMs for Software Development: A Practitioners’ Perspective


90. HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate Speech Detection


91. Ghost in the Transformer: Tracing LLM Lineage with SVD-Fingerprint


92. GazeVLM: A Vision-Language Model for Multi-Task Gaze Understanding


93. PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization


94. TimeSense: Making Large Language Models Proficient in Time-Series Analysis


95. Decomate: Leveraging Generative Models for Co-Creative SVG Animation


96. LLM-Guided Reinforcement Learning with Representative Agents for Traffic Modeling


97. Breaking the Modality Barrier: Generative Modeling for Accurate Molecule Retrieval from Mass Spectra


98. WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation


99. Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation


100. Mixtures of SubExperts for Large Language Continual Learning


101. Scaling Laws and In-Context Learning: A Unified Theoretical Framework


102. Overview of CHIP 2025 Shared Task 2: Discharge Medication Recommendation for Metabolic Diseases Based on Chinese Electronic Health Records


103. Assertion-Aware Test Code Summarization with Large Language Models


104. Explicit Knowledge-Guided In-Context Learning for Early Detection of Alzheimer’s Disease


105. RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework


106. AI as intermediary in modern-day ritual: An immersive, interactive production of the roller disco musical Xanadu at UCLA


107. LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs


108. LLM Attention Transplant for Transfer Learning of Tabular Data Across Disparate Domains


109. Large Language Models Develop Novel Social Biases Through Adaptive Exploration


110. Referring Expressions as a Lens into Spatial Language Grounding in Vision-Language Models


111. Evaluation of retrieval-based QA on QUEST-LOFT


112. SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?


113. Simulating Students with Large Language Models: A Review of Architecture, Mechanisms, and Role Modelling in Education with Generative AI


114. Stemming Hallucination in Language Models Using a Licensing Oracle


115. MoSKA: Mixture of Shared KV Attention for Efficient Long-Sequence LLM Inference


116. Revisiting Entropy in Reinforcement Learning for Large Reasoning Models


117. Ontology Learning and Knowledge Graph Construction: A Comparison of Approaches and Their Impact on RAG Performance


118. Kunlun Anomaly Troubleshooter: Enabling Kernel-Level Anomaly Detection and Causal Reasoning for Large Model Distributed Inference


119. DiA-gnostic VLVAE: Disentangled Alignment-Constrained Vision Language Variational AutoEncoder for Robust Radiology Reporting with Missing Modalities


120. Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs


121. Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs


122. NILC: Discovering New Intents with LLM-assisted Clustering


123. The Imperfect Learner: Incorporating Developmental Trajectories in Memory-based Student Simulation


124. Retrieval-Augmented Generation in Medicine: A Scoping Review of Technical Implementations, Clinical Applications, and Ethical Considerations


125. A Remarkably Efficient Paradigm to Multimodal Large Language Models for Sequential Recommendation


126. Quantifying Edits Decay in Fine-tuned LLMs


127. Retrieval Quality at Context Limit


128. EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph


129. Understanding Cross Task Generalization in Handwriting-Based Alzheimer’s Screening via Vision Language Adaptation


130. WAR-Re: Web API Recommendation with Semantic Reasoning


131. MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling


132. When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins


133. VLAD-Grasp: Zero-shot Grasp Detection via Vision-Language Models


134. DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning


135. Lived Experience in Dialogue: Co-designing Personalization in Large Language Models to Support Youth Mental Well-being


136. Language Generation: Complexity Barriers and Implications for Learning


137. Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder


138. OckBench: Measuring the Efficiency of LLM Reasoning


139. AdvisingWise: Supporting Academic Advising in Higher Educations Through a Human-in-the-Loop Multi-Agent Framework


140. Optimizing Diversity and Quality through Base-Aligned Model Collaboration



142. LLMs as Packagers of HPC Software


143. CoPRIS: Efficient and Stable Reinforcement Learning via Concurrency-Controlled Partial Rollout with Importance Sampling


144. Fine-Tuning Vision-Language Models for Multimodal Polymer Property Prediction


145. In-Context Adaptation of VLMs for Few-Shot Cell Detection in Optical Microscopy


146. Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models


147. AGRAG: Advanced Graph-based Retrieval-Augmented Generation for LLMs


148. Automated Invoice Data Extraction: Using LLM and OCR


149. ConnectomeBench: Can LLMs Proofread the Connectome?


150. Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability


151. Gravity-Awareness: Deep Learning Models and LLM Simulation of Human Awareness in Altered Gravity


152. Retracing the Past: LLMs Emit Training Data When They Get Lost


153. Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation


154. Personalized Chain-of-Thought Summarization of Financial News for Investor Decision Support


155. Production-Grade Local LLM Inference on Apple Silicon: A Comparative Study of MLX, MLC-LLM, Ollama, llama.cpp, and PyTorch MPS


156. Towards Ecologically Valid LLM Benchmarks: Understanding and Designing Domain-Centered Evaluations for Journalism Practitioners


157. Predicting Oscar-Nominated Screenplays with Sentence Embeddings


158. Biomedical Hypothesis Explainability with Graph-Based Context Retrieval


159. DOCUEVAL: An LLM-based AI Engineering Tool for Building Customisable Document Evaluation Workflows


160. Customized Retrieval-Augmented Generation with LLM for Debiasing Recommendation Unlearning


161. AI Brown and AI Koditex: LLM-Generated Corpora Comparable to Traditional Corpora of English and Czech Texts