Key LLM-Related Papers - 2025-12-19

1. Explaining the Reasoning of Large Language Models Using Attribution Graphs


2. Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning


3. Evaluating Large Language Models in Scientific Discovery


4. Bilateral Spatial Reasoning about Street Networks: Graph-based RAG with Qualitative Spatial Representations


5. SCOPE: Prompt Evolution for Enhancing Agent Effectiveness


6. ChatGPT and Gemini participated in the Korean College Scholastic Ability Test – Earth Science I


7. CangLing-KnowFlow: A Unified Knowledge-and-Flow-fused Agent for Comprehensive Remote Sensing Applications


8. Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models


9. Beyond Accuracy: A Geometric Stability Analysis of Large Language Models in Chess Evaluation


10. IaC Generation with LLMs: An Error Taxonomy and A Study on Configuration Knowledge Injection


11. GR-Agent: Adaptive Graph Reasoning Agent under Incomplete Knowledge


12. Attention as Binding: A Vector-Symbolic Perspective on Transformer Reasoning


13. Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning


14. Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers


15. VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?


16. How Much is Too Much? Exploring LoRA Rank Trade-offs for Retaining Knowledge and Domain Robustness


17. Evaluating Metrics for Safety with LLM-as-Judges


18. How Do Semantically Equivalent Code Transformations Impact Membership Inference on LLMs for Code?


19. On Assessing the Relevance of Code Reviews Authored by Generative Models


20. Image Complexity-Aware Adaptive Retrieval for Efficient Vision-Language Models


21. Adversarial versification in Portuguese as a jailbreak operator in LLMs


22. Exploring User Acceptance and Concerns toward LLM-powered Conversational Agents in Immersive Extended Reality


23. Evaluating LLMs for Zeolite Synthesis Event Extraction (ZSEE): A Systematic Analysis of Prompting Strategies


24. Well Begun, Half Done: Reinforcement Learning with Prefix Optimization for LLM Reasoning


25. Intersectional Fairness in Vision-Language Models for Medical Image Disease Classification


26. Yes-MT’s Submission to the Low-Resource Indic Language Translation Shared Task in WMT 2024


27. RFKG-CoT: Relation-Driven Adaptive Hop-count Selection and Few-Shot Path Guidance for Knowledge-Aware QA


28. DEER: Draft with Diffusion, Verify with Autoregressive Models


29. MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers


30. Offline Multi-Task Multi-Objective Data-Driven Evolutionary Algorithm with Language Surrogate Model and Implicit Q-Learning


31. HD-Prot: A Protein Language Model for Joint Sequence-Structure Modeling with Continuous Structure Tokens


32. Quantifying Return on Security Controls in LLM Systems


33. The Semantic Illusion: Certified Limits of Embedding-Based Hallucination Detection in RAG Systems


34. The Meta-Prompting Protocol: Orchestrating LLMs via Adversarial Feedback Loops


35. SGM: Safety Glasses for Multimodal Large Language Models via Neuron-Level Detoxification


36. Epistemic diversity across language models mitigates knowledge collapse


37. DreamPRM-Code: Function-as-Step Process Reward Model with Label Correction for LLM Coding


38. Imitation Game: Reproducing Deep Learning Bugs Leveraging an Intelligent Agent


39. Evaluating Large Language Models on Multimodal Chemistry Olympiad Exams


40. EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving


41. Parameter Efficient Multimodal Instruction Tuning for Romanian Vision Language Models


42. DrugRAG: Enhancing Pharmacy LLM Performance Through A Novel Retrieval-Augmented Generation Pipeline


43. Imitation Learning for Multi-turn LM Agents via On-policy Expert Corrections


44. Integrating Large Language Models and Knowledge Graphs to Capture Political Viewpoints in News Media


45. Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse


46. Penetration Testing of Agentic AI: A Comparative Security Analysis Across Models and Frameworks


47. MALCDF: A Distributed Multi-Agent LLM Framework for Real-Time Cyber


48. Let the Barbarians In: How AI Can Accelerate Systems Performance Research


49. Sharing State Between Prompts and Programs


50. Incentives or Ontology? A Structural Rebuttal to OpenAI’s Hallucination Thesis


51. Improving VQA Reliability: A Dual-Assessment Approach with Self-Reflection and Cross-Model Verification


52. Workflows vs Agents for Code Translation


53. Revisiting the Reliability of Language Models in Instruction-Following


54. CODE ACROSTIC: Robust Watermarking for Code Generation


55. One Leak Away: How Pretrained Model Exposure Amplifies Jailbreak Risks in Finetuned LLMs


56. VERAFI: Verified Agentic Financial Intelligence through Neurosymbolic Policy Generation


57. Persistent Backdoor Attacks under Continual Fine-Tuning of LLMs


58. PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents


59. INFORM-CT: INtegrating LLMs and VLMs FOR Incidental Findings Management in Abdominal CT


60. SoMe: A Realistic Benchmark for LLM-based Social Media Agents


61. Hybrid Attribution Priors for Explainable and Robust Model Training


62. How a Bit Becomes a Story: Semantic Steering via Differentiable Fault Injection


63. LLM as a Neural Architect: Controlled Generation of Image Captioning Models Under Strict API Contracts