Key LLM-Related Papers - 2025-12-10

1. Large Causal Models from Large Language Models


2. ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning


3. RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models


4. The Agent Capability Problem: Predicting Solvability Through Information-Theoretic Bounds


5. Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement


6. How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations


7. ContextualSHAP: Enhancing SHAP Explanations Through Contextual Language Generation


8. VIGIL: A Reflective Runtime for Self-Healing Agents


9. ClinNoteAgents: An LLM Multi-Agent System for Predicting and Interpreting Heart Failure 30-Day Readmission from Clinical Notes


10. Do Persona-Infused LLMs Affect Performance in a Strategic Reasoning Game?


11. JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models


12. Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning


13. DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems


14. ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems


15. Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents


16. Stochasticity in Agentic Evaluations: Quantifying Inconsistency with Intraclass Correlation


17. The Effect of Belief Boxes and Open-mindedness on Persuasion


18. UncertaintyZoo: A Unified Toolkit for Quantifying Predictive Uncertainty in Deep Learning Systems


19. GENIUS: An Agentic AI Framework for Autonomous Design and Execution of Simulation Protocols


20. Less Is More for Multi-Step Logical Reasoning of LLM Generalisation Under Rule Removal, Paraphrasing, and Compression


21. DaGRPO: Rectifying Gradient Conflict in Reasoning via Distinctiveness-Aware Group Relative Policy Optimization


22. On measuring grounding and generalizing grounding problems


23. ARCANE: A Multi-Agent Framework for Interpretable and Configurable Alignment


24. Deep learning for autism detection using clinical notes: A comparison of transfer learning for a transparent and black-box approach


25. Going All-In on LLM Accuracy: Fake Prediction Markets, Real Confidence Signals


26. Relational Visual Similarity


27. Provable Long-Range Benefits of Next-Token Prediction


28. Understanding Privacy Risks in Code Models Through Training Dynamics: A Causal Approach


29. Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support


30. SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination


31. In-Context and Few-Shots Learning for Forecasting Time Series Data based on Large Language Models


32. When Large Language Models Do Not Work: Online Incivility Prediction through Graph Neural Networks


33. An AI-Powered Autonomous Underwater System for Sea Exploration and Scientific Research


34. PCMind-2.1-Kaiyuan-2B Technical Report


35. Metric-Fair Prompting: Treating Similar Samples Similarly


36. Complementary Learning Approach for Text Classification using Large Language Models


37. Toward More Reliable Artificial Intelligence: Reducing Hallucinations in Vision-Language Models


38. MoCoRP: Modeling Consistent Relations between Persona and Response for Persona-based Dialogue


39. VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vulnerability Detection


40. LIME: Making LLM Data More Efficient with Linguistic Metadata Embeddings


41. AutoICE: Automatically Synthesizing Verifiable C Code via LLM-driven Evolution


42. Understanding LLM Agent Behaviours via Game Theory: Strategy Recognition, Biases and Multi-Agent Dynamics


43. Persian-Phi: Efficient Cross-Lingual Adaptation of Compact LLMs via Curriculum Learning


44. Do LLMs Trust the Code They Write?


45. ESPADA: Execution Speedup via Semantics Aware Demonstration Data Downsampling for Imitation Learning


46. Structure-Aware Feature Rectification with Region Adjacency Graphs for Training-Free Open-Vocabulary Semantic Segmentation


47. Venus: An Efficient Edge Memory-and-Retrieval System for VLM-based Online Video Understanding


48. DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management


49. Radiance-Field Reinforced Pretraining: Scaling Localization Models with Unlabeled Wireless Signals


50. Exact Synthetic Populations for Scalable Societal and Market Modeling


51. Towards Accurate UAV Image Perception: Guiding Vision-Language Models with Stronger Task Prompts


52. SIT-Graph: State Integrated Tool Graph for Multi-Turn Agents


53. Dropout Prompt Learning: Towards Robust and Adaptive Vision-Language Models


54. NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models


55. VFM-VLM: Vision Foundation Model and Vision Language Model based Visual Comparison for 3D Pose Estimation


56. START: Spatial and Textual Learning for Chart Understanding


57. A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning


58. DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning


59. RisConFix: LLM-based Automated Repair of Risk-Prone Drone Configurations


60. FOAM: Blocked State Folding for Memory-Efficient LLM Training


61. The Geometry of Persona: Disentangling Personality from Reasoning in Large Language Models


62. Leveraging KV Similarity for Online Structured Pruning in LLMs


63. ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking


64. Reformulate, Retrieve, Localize: Agents for Repository-Level Bug Localization


65. Latency-Response Theory Model: Evaluating Large Language Models via Response Accuracy and Chain-of-Thought Length


66. FVA-RAG: Falsification-Verification Alignment for Mitigating Sycophantic Hallucinations


67. Singing Timbre Popularity Assessment Based on Multimodal Large Foundation Model


68. Prompting-in-a-Series: Psychology-Informed Contents and Embeddings for Personality Recognition With Decoder-Only Models


69. NeuroABench: A Multimodal Evaluation Benchmark for Neurosurgical Anatomy Identification


70. SoK: Trust-Authorization Mismatch in LLM Agent Interactions


71. BabelCoder: Agentic Code Translation with Specification Alignment


72. Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior


73. Formal that “Floats” High: Formal Verification of Floating Point Arithmetic


74. Leveraging LLMs to support co-evolution between definitions and instances of textual DSLs


75. RMAdapter: Reconstruction-based Multi-Modal Adapter for Vision-Language Models


76. Optimal and Diffusion Transports in Machine Learning


77. From Description to Score: Can LLMs Quantify Vulnerabilities?


78. From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs


79. Stitch and Tell: A Structured Multimodal Data Augmentation Method for Spatial Understanding


80. VisChainBench: A Benchmark for Multi-Turn, Multi-Image Visual Reasoning Beyond Language Priors


81. Becoming Experienced Judges: Selective Test-Time Learning for Evaluators


82. PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance


83. Task-Model Alignment: A Simple Path to Generalizable AI-Generated Image Detection


84. A Patient-Doctor-NLP-System to contest inequality for less privileged


85. “The Dentist is an involved parent, the bartender is not”: Revealing Implicit Biases in QA with Implicit BBQ


86. The Role of Entropy in Visual Grounding: Analysis and Optimization


87. Mechanistic Interpretability of GPT-2: Lexical and Contextual Layers in Sentiment Analysis


88. GradientSpace: Unsupervised Data Clustering for Improved Instruction Tuning


89. Towards Small Language Models for Security Query Generation in SOC Workflows


90. GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering


91. Towards Efficient Hypergraph and Multi-LLM Agent Recommender Systems


92. Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks


93. BEACON: A Unified Behavioral-Tactical Framework for Explainable Cybercrime Analysis with Large Language Models


94. A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation


95. Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning


96. Classifying German Language Proficiency Levels Using Large Language Models


97. Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices


98. AgenticCyber: A GenAI-Powered Multi-Agent System for Multimodal Threat Detection and Adaptive Response in Cybersecurity


99. RLAX: Large-Scale, Distributed Reinforcement Learning for Large Language Models on TPUs


100. Protecting Bystander Privacy via Selective Hearing in LALMs


101. When Distance Distracts: Representation Distance Bias in BT-Loss for Reward Models


102. Chemistry Integrated Language Model using Hierarchical Molecular Representation for Polymer Informatics


103. Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models


104. RefBench-PRO: Perceptual and Reasoning Oriented Benchmark for Referring Expression Comprehension


105. Who Will Top the Charts? Multimodal Music Popularity Prediction via Adaptive Fusion of Modality Experts and Temporal Engagement Modeling


106. Convergence of Outputs When Two Large Language Models Interact in a Multi-Agentic Setup


107. DUET: Agentic Design Understanding via Experimentation and Testing


108. Do You Feel Comfortable? Detecting Hidden Conversational Escalation in AI Chatbots


109. Future You: Designing and Evaluating Multimodal AI-generated Digital Twins for Strengthening Future Self-Continuity


110. Explainable Melanoma Diagnosis with Contrastive Learning and LLM-based Report Generation


111. Empathy by Design: Aligning Large Language Models for Healthcare Dialogue


112. Reinforcement Learning Integrated Agentic RAG for Software Test Cases Authoring


113. Auto-SPT: Automating Semantic Preserving Transformations for Code


114. PrefGen: Multimodal Preference Learning for Preference-Conditioned Image Generation


115. KidSpeak: A General Multi-purpose LLM for Kids’ Speech Recognition and Screening


116. FlockVote: LLM-Empowered Agent-Based Modeling for Simulating U.S. Presidential Elections