Key LLM-Related Papers - 2025-11-19

1. Beyond Mimicry: Preference Coherence in LLMs


2. CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product


3. Automated Construction of Medical Indicator Knowledge Graphs Using Retrieval Augmented Large Language Models


4. FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI


5. Multi-Agent Multimodal Large Language Model Framework for Automated Interpretation of Fuel Efficiency Analytics in Public Transportation


6. Cognitive Maps in Language Models: A Mechanistic Analysis of Spatial Planning


7. Grounded by Experience: Generative Healthcare Prediction Augmented with Hierarchical Agentic Retrieval


8. Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment


9. Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO


10. Cost-Effective Communication: An Auction-based Method for Language Agent Interaction


11. MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications


12. MEGA-GUI: Multi-stage Enhanced Grounding Agents for GUI Elements


13. Scaling Generative Verifiers For Natural Language Mathematical Proof Verification And Selection


14. PragWorld: A Benchmark Evaluating LLMs’ Local World Model under Minimal Linguistic Alterations and Conversational Dynamics


15. GEM: Generative Entropy-Guided Preference Modeling for Few-shot Alignment of LLMs


16. WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance


17. MedRule-KG: A Knowledge-Graph–Steered Scaffold for Reliable Mathematical and Biomedical Reasoning


18. Yanyun-3: Enabling Cross-Platform Strategy Game Operation with Vision-Language Models


19. Fault2Flow: An AlphaEvolve-Optimized Human-in-the-Loop Multi-Agent System for Fault-to-Workflow Automation


20. CoS: Towards Optimal Event Scheduling via Chain-of-Scheduling


21. Online Learning of HTN Methods for integrated LLM-HTN Planning


22. Think, Speak, Decide: Language-Augmented Multi-Agent Reinforcement Learning for Economic Decision-Making


23. Bootstrapping LLMs via Preference-Based Policy Optimization


24. Event-CausNet: Unlocking Causal Knowledge from Text with Large Language Models for Reliable Spatio-Temporal Forecasting


25. Enhancing Conversational Recommender Systems with Tree-Structured Knowledge and Pretrained Language Models


26. ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction


27. Multi-agent Self-triage System with Medical Flowcharts


28. Reward and Guidance through Rubrics: Promoting Exploration to Improve Multi-Domain Reasoning


29. UpBench: A Dynamically Evolving Real-World Labor-Market Agentic Benchmark Framework Built for Human-Centric AI


30. MoralReason: Generalizable Moral Decision Alignment For LLM Agents Using Reasoning-Level Reinforcement Learning


31. MetaGDPO: Alleviating Catastrophic Forgetting with Metacognitive Knowledge through Group Direct Preference Optimization


32. Bayesian Optimization in Language Space: An Eval-Efficient AI Self-Improvement Framework


33. Adaptive Diagnostic Reasoning Framework for Pathology with Multimodal Large Language Models


34. Look As You Think: Unifying Reasoning and Visual Evidence Attribution for Verifiable Document RAG via Reinforcement Learning


35. LLM-Assisted Formalization Enables Deterministic Detection of Statutory Inconsistency in the Internal Revenue Code


36. An Analysis of Architectural Impact on LLM-based Abstract Visual Reasoning: A Systematic Benchmark on RAVEN-FAIR


37. Forgetting-MarI: LLM Unlearning via Marginal Information Regularization


38. TopoPerception: A Shortcut-Free Evaluation of Global Visual Perception in Large Vision-Language Models


39. Do LLMs Really Struggle at NL-FOL Translation? Revealing their Strengths via a Novel Benchmarking Strategy


40. On the Measure of a Model: From Intelligence to Generality


41. Learning to Refine: An Agentic RL Approach for Iterative SPARQL Query Construction


42. Towards autonomous quantum physics research using LLM agents with access to intelligent tools


43. Value-Aligned Prompt Moderation via Zero-Shot Agentic Rewriting for Safe Image Generation


44. CausalGuard: A Smart System for Detecting and Preventing False Information in Large Language Models


45. SynBullying: A Multi LLM Synthetic Conversational Dataset for Cyberbullying Detection


46. CLINB: A Climate Intelligence Benchmark for Foundational Models


47. LLM-Generated Negative News Headlines Dataset: Creation and Benchmarking Against Real Journalism


48. Protein Secondary Structure Prediction Using 3D Graphs and Relation-Aware Message Passing Transformers


49. Person-AI Bidirectional Fit - A Proof-Of-Concept Case Study Of Augmented Human-AI Symbiosis In Management Decision-Making Process


50. Weight-sparse transformers have interpretable circuits


51. Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?


52. Data Value in the Age of Scaling: Understanding LLM Scaling Dynamics Under Real-Synthetic Data Mixtures


53. P1: Mastering Physics Olympiads with Reinforcement Learning


54. Beyond SELECT: A Comprehensive Taxonomy-Guided Benchmark for Real-World Text-to-SQL Translation


55. ForgeDAN: An Evolutionary Framework for Jailbreaking Aligned Large Language Models


56. Semantic Document Derendering: SVG Reconstruction via Vision-Language Modeling


57. Trust in Vision-Language Models: Insights from a Participatory User Workshop


58. Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline


59. Descriptor: Distance-Annotated Traffic Perception Question Answering (DTPQA)


60. Moving Pictures of Thought: Extracting Visual Knowledge in Charles S. Peirce’s Manuscripts with Vision-Language Models


61. A Novel Hierarchical Integration Method for Efficient Model Merging in Medical LLMs


62. Donors and Recipients: On Asymmetric Transfer Across Tasks and Languages with Parameter-Efficient Fine-Tuning


63. An LLM-based Quantitative Framework for Evaluating High-Stealthy Backdoor Risks in OSS Supply Chains


64. AutoMalDesc: Large-Scale Script Analysis for Cyber Threat Research


65. Whistledown: Combining User-Level Privacy with Conversational Coherence in LLMs


66. KForge: Program Synthesis for Diverse AI Hardware Accelerators


67. Spatial Blind Spot: Auditory Motion Perception Deficits in Audio LLMs


68. Computational Measurement of Political Positions: A Review of Text-Based Ideal Point Estimation Algorithms


69. ParaDySe: A Parallel-Strategy Switching Framework for Dynamic Sequence Lengths in Transformer


70. Extracting Events Like Code: A Multi-Agent Programming Framework for Zero-Shot Event Extraction


71. MACKO: Sparse Matrix-Vector Multiplication for Low Sparsity


72. Learning from the Undesirable: Robust Adaptation of Language Models without Forgetting


73. AA-Omniscience: Evaluating Cross-Domain Knowledge Reliability in Large Language Models


74. SLMQuant: Benchmarking Small Language Model Quantization for Practical Deployment


75. SAGE: Spuriousness-Aware Guided Prompt Exploration for Mitigating Multimodal Bias


76. Tokenize Once, Recommend Anywhere: Unified Item Tokenization for Multi-domain LLM-based Recommendation


77. DeepSport: A Multimodal Large Language Model for Comprehensive Sports Video Reasoning via Agentic Reinforcement Learning


78. On the Fundamental Limits of LLMs at Scale


79. Video Finetuning Improves Reasoning Between Frames


80. NeuroLex: A Lightweight Domain Language Model for EEG Report Understanding and Generation


81. From Passive to Persuasive: Steering Emotional Nuance in Human-AI Negotiation


82. Catastrophic Forgetting in Kolmogorov-Arnold Networks


83. Genomic Next-Token Predictors are In-Context Learners


84. Scalable Multi-Objective and Meta Reinforcement Learning via Gradient Estimation


85. Evidence of Phase Transitions in Small Transformer-Based Language Models


86. Whose Narrative is it Anyway? A KV Cache Manipulation Attack


87. Are LLMs The Way Forward? A Case Study on LLM-Guided Reinforcement Learning for Decentralized Autonomous Driving


88. Adaptive Focus Memory for Language Models


89. HEDGE: Hallucination Estimation via Dense Geometric Entropy for VQA with Vision-Language Models


90. R$^{2}$Seg: Training-Free OOD Medical Tumor Segmentation via Anatomical Reasoning and Statistical Rejection


91. Improving Direct Persian-English Speech-to-Speech Translation with Discrete Units and Synthetic Parallel Data


92. BridgeEQA: Virtual Embodied Agents for Real Bridge Inspections


93. LLM4SCREENLIT: Recommendations on Assessing the Performance of Large Language Models for Screening Literature in Systematic Reviews


94. Knots: A Large-Scale Multi-Agent Enhanced Expert-Annotated Dataset and LLM Prompt Optimization for NOTAM Semantic Parsing


95. Group-Aware Reinforcement Learning for Output Diversity in Large Language Models


96. Mitigating Length Bias in RLHF through a Causal Lens


97. Accepted with Minor Revisions: Value of AI-Assisted Scientific Writing


98. SGuard-v1: Safety Guardrail for Large Language Models


99. Evolving Prompts for Toxicity Search in Large Language Models


100. One Request, Multiple Experts: LLM Orchestrates Domain Specific Models via Adaptive Task Routing


101. Assessing LLMs for Serendipity Discovery in Knowledge Graphs: A Case for Drug Repurposing


102. MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding


103. SeedAIchemy: LLM-Driven Seed Corpus Generation for Fuzzing


104. SynthGuard: An Open Platform for Detecting AI-Generated Multimedia with Multimodal LLMs


105. From Phonemes to Meaning: Evaluating Large Language Models on Tamil


106. Don’t Think of the White Bear: Ironic Negation in Transformer Models Under Cognitive Load


107. Decision and Gender Biases in Large Language Models: A Behavioral Economic Perspective


108. Optimal Self-Consistency for Efficient Reasoning with Large Language Models


109. Sangam: Chiplet-Based DRAM-PIM Accelerator with CXL Integration for LLM Inferencing


110. CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models


111. Consistency Is the Key: Detecting Hallucinations in LLM Generated Text By Checking Inconsistencies About Key Facts


112. MME-RAG: Multi-Manager-Expert Retrieval-Augmented Generation for Fine-Grained Entity Recognition in Task-Oriented Dialogues


113. AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models


114. OAD-Promoter: Enhancing Zero-shot VQA using Large Language Models with Object Attribute Description


115. LLMLagBench: Identifying Temporal Training Boundaries in Large Language Models


116. Explainable Transformer-Based Email Phishing Classification with Adversarial Robustness


117. Preference Learning from Physics-Based Feedback: Tuning Language Models to Design BCC/B2 Superalloys


118. EARL: Entropy-Aware RL Alignment of LLMs for Reliable RTL Code Generation


119. Striking the Right Balance between Compute and Copy: Improving LLM Inferencing Under Speculative Decoding


120. GCAgent: Long-Video Understanding via Schematic and Narrative Episodic Memory


121. On the Entropy Calibration of Language Models


122. KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference


123. Prompt Triage: Structured Optimization Enhances Vision-Language Model Performance on Medical Imaging Benchmarks


124. VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization


125. Flash-Fusion: Enabling Expressive, Low-Latency Queries on IoT Sensor Streams with LLMs


126. Better LLM Reasoning via Dual-Play


127. A Multimodal Manufacturing Safety Chatbot: Knowledge Base Design, Benchmark Development, and Evaluation of Multiple RAG Approaches


128. Towards Autoformalization of LLM-generated Outputs for Requirement Verification


129. Conformal Constrained Policy Optimization for Cost-Effective LLM Agents


130. Scaling Open-Weight Large Language Models for Hydropower Regulatory Information Extraction: A Systematic Analysis


131. On the Notion that Language Models Reason


132. Differences in the Moral Foundations of Large Language Models


133. From Single to Societal: Analyzing Persona-Induced Bias in Multi-Agent Interactions


134. MALBO: Optimizing LLM-Based Multi-Agent Teams via Multi-Objective Bayesian Optimization


135. Image-POSER: Reflective RL for Multi-Expert Image Generation and Editing


136. Scaling Equitable Reflection Assessment in Education via Large Language Models and Role-Based Feedback Agents


137. Demystify, Use, Reflect: Preparing students to be informed LLM-users


138. Concept-RuleNet: Grounded Multi-Agent Neurosymbolic Reasoning in Vision Language Models


139. Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput


140. Reasoning: From Reflection to Solution


141. Doubly Debiased Test-Time Prompt Tuning for Vision-Language Models


142. A Structure-Agnostic Co-Tuning Framework for LLMs and SLMs in Cloud-Edge Systems


143. Beyond Superficial Forgetting: Thorough Unlearning through Knowledge Density Estimation and Block Re-insertion


144. SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization


145. GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning


146. Characterizing and Understanding Energy Footprint and Efficiency of Small Language Model on Edges


147. AIvailable: A Software-Defined Architecture for LLM-as-a-Service on Heterogeneous and Legacy GPUs


148. Evaluating Large Language Models for Workload Mapping and Scheduling in Heterogeneous HPC Systems


149. Why Should the Server Do It All?: A Scalable, Versatile, and Model-Agnostic Framework for Server-Light DNN Inference over Massively Distributed Clients via Training-Free Intermediate Feature Compression


150. TimeStampEval: A Simple LLM Eval and a Little Fuzzy Matching Trick to Improve Search Accuracy


151. MedBuild AI: An Agent-Based Hybrid Intelligence Framework for Reshaping Agency in Healthcare Infrastructure Planning through Generative Design for Medical Architecture


152. Parallel and Multi-Stage Knowledge Graph Retrieval for Behaviorally Aligned Financial Asset Recommendations


153. The Anatomy of a Triton Attention Kernel


154. DAOpt: Modeling and Evaluation of Data-Driven Optimization under Uncertainty with LLMs


155. HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization


156. MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection