Key LLM Papers - 2026-01-07

1. MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents


2. InfiAgent: An Infinite-Horizon Framework for General-Purpose Autonomous Agents


3. Rationale-Grounded In-Context Learning for Time Series Reasoning with Multimodal Large Language Models


4. Batch-of-Thought: Cross-Instance Learning for Enhanced LLM Reasoning


5. Logical Phase Transitions: Understanding Collapse in LLM Logical Reasoning


6. ReTreVal: Reasoning Tree with Validation - A Hybrid Framework for Enhanced LLM Multi-Step Reasoning


7. HAL: Inducing Human-likeness in LLMs with Alignment


8. LLM Agent Framework for Intelligent Change Analysis in Urban Environment using Remote Sensing Imagery


9. The Path Ahead for Agentic AI: Challenges and Opportunities


10. Time-Scaling Is What Agents Need Now


11. Learning from Prompt itself: the Hierarchical Attribution Prompt Optimization


12. AWARE-US: Benchmark for Preference-Aware Resolution in Tool-Calling Agents


13. Orchestral AI: A Framework for Agent Orchestration


14. SimpleMem: Efficient Lifelong Memory for LLM Agents


15. Textual Explanations and Their Evaluations for Reinforcement Learning Policy


16. Multi-RADS Synthetic Radiology Report Dataset and Head-to-Head Benchmarking of 41 Open-Weight and Proprietary Language Models


17. The Sonar Moment: Benchmarking Audio-Language Models in Audio Geo-Localization


18. Fine-tuning Small Language Models as Efficient Enterprise Search Relevance Labelers


19. UltraLogic: Enhancing LLM Reasoning through Large-Scale Data Synthesis and Bipolar Float Reward


20. DIP: Dynamic In-Context Planner For Diffusion Language Models


21. AnatomiX, an Anatomy-Aware Grounded Multimodal Large Language Model for Chest X-Ray Interpretation


22. Decentralized Autoregressive Generation


23. Prompt-Counterfactual Explanations for Generative AI System Behavior


24. Self-Verification is All You Need To Pass The Japanese Bar Examination


25. ToxiGAN: Toxic Data Augmentation via LLM-Guided Directional Adversarial Generation


26. Who Laughs with Whom? Disentangling Influential Factors in Humor Preferences across User Clusters and LLMs


27. Text-Guided Layer Fusion Mitigates Hallucination in Multimodal LLMs


28. Grad-ELLM: Gradient-based Explanations for Decoder-only LLMs


29. Joint Encoding of KV-Cache Blocks for Scalable LLM Serving


30. Do LLMs Encode Functional Importance of Reasoning Tokens?


31. Lil: Less is Less When Applying Post-Training Sparse-Attention Algorithms in Long-Decode Stage


32. Dementia-R1: Reinforced Pretraining and Reasoning from Unstructured Clinical Notes for Real-World Dementia Prognosis


33. SentGraph: Hierarchical Sentence Graph for Multi-hop Retrieval-Augmented Question Answering


34. JPU: Bridging Jailbreak Defense and Unlearning via On-Policy Path Rectification


35. Towards Faithful Reasoning in Comics for Small MLLMs


36. Interpretable All-Type Audio Deepfake Detection with Audio LLMs via Frequency-Time Reinforcement Learning


37. Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning


38. MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free


39. The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models


40. SastBench: A Benchmark for Testing Agentic SAST Triage


41. PrismVAU: Prompt-Refined Inference System for Multimodal Video Anomaly Understanding


42. RAL2M: Retrieval Augmented Learning-To-Match Against Hallucination in Compliance-Guaranteed Service Systems


43. TA-Prompting: Enhancing Video Large Language Models for Dense Video Captioning via Temporal Anchors


44. LongBench Pro: A More Realistic and Comprehensive Bilingual Long-Context Evaluation Benchmark


45. TiMem: Temporal-Hierarchical Memory Consolidation for Long-Horizon Conversational Agents


46. Netflix Artwork Personalization via LLM Post-training


47. Window-based Membership Inference Attacks Against Fine-tuned Large Language Models


48. Hypothesize-Then-Verify: Speculative Root Cause Analysis for Microservices with Pathwise Parallelism


49. Agentic Memory Enhanced Recursive Reasoning for Root Cause Localization in Microservices


50. Extracting books from production language models


51. When Do Tools and Planning Help LLMs Think? A Cost- and Latency-Aware Benchmark


52. Prioritized Replay for RL Post-training


53. TAAF: A Trace Abstraction and Analysis Framework Synergizing Knowledge Graphs and LLMs


54. Improved Evidence Extraction for Document Inconsistency Detection with LLMs


55. LAsset: An LLM-assisted Security Asset Identification Framework for System-on-Chip (SoC) Verification


56. Chronicals: A High-Performance Framework for LLM Fine-Tuning with 3.51x Speedup over Unsloth


57. LongDA: Benchmarking LLM Agents for Long-Document Data Analysis


58. FlowPlan-G2P: A Structured Generation Framework for Transforming Scientific Papers into Patent Descriptions


59. Reconstructing Item Characteristic Curves using Fine-Tuned Large Language Models


60. Fact-Checking with Large Language Models via Probabilistic Certainty and Consistency


61. LendNova: Towards Automated Credit Risk Assessment with Language Models


62. AI-exposed jobs deteriorated before ChatGPT


63. ModeX: Evaluator-Free Best-of-N Selection for Open-Ended Generation


64. Enhancing Debugging Skills with AI-Powered Assistance: A Real-Time Tool for Debugging Support


65. GEM-Style Constraints for PEFT with Dual Gradient Projection in LoRA


66. Evaluating the Diagnostic Classification Ability of Multimodal Large Language Models: Insights from the Osteoarthritis Initiative


67. Focus on What Matters: Fisher-Guided Adaptive Multimodal Fusion for Vulnerability Detection


68. WebCoderBench: Benchmarking Web Application Generation with Comprehensive and Interpretable Evaluation Metrics


69. A large-scale nanocrystal database with aligned synthesis and properties enabling generative inverse design


70. The Vibe-Check Protocol: Quantifying Cognitive Offloading in AI Programming


71. PCEval: A Benchmark for Evaluating Physical Computing Capabilities of Large Language Models


72. Tree of Preferences for Diversified Recommendation


73. How to Discover Knowledge for FutureG: Contextual RAG and LLM Prompting for O-RAN


74. The Refutability Gap: Challenges in Validating Reasoning by Large Language Models


75. LeafTutor: An AI Agent for Programming Assignment Tutoring


76. Permission Manifests for Web Agents


77. TextBridgeGNN: Pre-training Graph Neural Network for Cross-Domain Recommendation via Text-Guided Transfer


78. Towards Trustworthy LLM-Based Recommendation via Rationale Integration


79. The Impact of LLM-Generated Reviews on Recommender Systems: Textual Shifts, Performance Effects, and Strategic Platform Control


80. TWIST: Training-free and Label-free Short Text Clustering through Iterative Vector Updating with LLMs