Key LLM-Related Papers - 2026-01-22

1. Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning


2. How to Build AI Agents by Augmenting LLMs with Codified Human Expert Domain Knowledge? A Software Engineering Framework


3. The Plausibility Trap: Using Probabilistic Engines for Deterministic Tasks


4. The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution


5. Measuring and Aligning Abstraction in Vision-Language Models with Medical Taxonomies


6. CI4A: Semantic Component Interfaces for Agents Empowering Web Automation


7. DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs


8. AutoDriDM: An Explainable Benchmark for Decision-Making of Vision-Language Models in Autonomous Driving


9. Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation


10. IB-GRPO: Aligning LLM-based Learning Path Recommendation with Educational Objectives via Indicator-Based Group Relative Policy Optimization


11. Local Language Models for Context-Aware Adaptive Anonymization of Sensitive Text


12. Query-Efficient Agentic Graph Extraction Attacks on GraphRAG Systems


13. Large Language Model-Powered Evolutionary Code Optimization on a Phylogenetic Tree


14. On the Generalization Gap in LLM Planning: Tests and Verifier-Reward RL


15. VisTIRA: Closing the Image-Text Modality Gap in Visual Math Reasoning via Structured Tool Integration


16. Epistemic Constitutionalism Or: how to avoid coherence bias


17. Iterative Refinement Improves Compositional Image Generation


18. MolecularIQ: Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs



20. Deaf and Hard of Hearing Access to Intelligent Personal Assistants: Comparison of Voice-Based Options with an LLM-Powered Touch Interface


21. Benchmarking Large Language Models for ABAP Code Generation: An Empirical Study on Iterative Improvement by Compiler Feedback


22. The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models


23. Automated Rubrics for Reliable Evaluation of Medical Dialogue Systems


24. Outcome-Based RL Provably Leads Transformers to Reason, but Only With the Right Data


25. Auditing Language Model Unlearning via Information Decomposition


26. Multi-Agent Constraint Factorization Reveals Latent Invariant Solution Structure


27. Knowledge Restoration-driven Prompt Optimization: Unlocking LLM Potential for Open-Domain Relational Triplet Extraction


28. Visual and Cognitive Demands of a Large Language Model-Powered In-vehicle Conversational Agent


29. Obscuring Data Contamination Through Translation: Evidence from Arabic Corpora


30. InstructTime++: Time Series Classification with Multimodal Language Modeling via Implicit Feature Enhancement


31. A Comprehensive Benchmark of Language Models on Unicode and Romanized Sinhala


32. CorpusQA: A 10 Million Token Benchmark for Corpus-Level Analysis and Reasoning


33. Vision-Language Models on the Edge for Real-Time Robotic Perception


34. What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study


35. RECAP: Resistance Capture in Text-based Mental Health Counseling with Large Language Models


36. Mechanism Shift During Post-training from Autoregressive to Masked Diffusion Language Models


37. AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering


38. HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding


39. PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning


40. HCVR Scene Generation: High Compatibility Virtual Reality Environment Generation for Extended Redirected Walking


41. INFA-Guard: Mitigating Malicious Propagation via Infection-Aware Safeguarding in LLM-Based Multi-Agent Systems


42. NeuroFilter: Privacy Guardrails for Conversational LLM Agents


43. Say Anything but This: When Tokenizer Betrays Reasoning in LLMs


44. Forest-Chat: Adapting Vision-Language Agents for Interactive Forest Change Analysis


45. Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models


46. HELIOS: Hierarchical Graph Abstraction for Structure-Aware LLM Decompilation


47. IntelliSA: An Intelligent Static Analyzer for IaC Security Smell Detection Using Symbolic Rules and Neural Inference


48. Self-Blinding and Counterfactual Self-Simulation Mitigate Biases and Sycophancy in Large Language Models


49. Report for NSF Workshop on AI for Electronic Design Automation


50. Towards Execution-Grounded Automated AI Research


51. GutenOCR: A Grounded Vision-Language Front-End for Documents


52. Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering


53. Diffusion Large Language Models for Black-Box Optimization


54. Agentic AI Meets Edge Computing in Autonomous UAV Swarms


55. Measuring the State of Open Science in Transportation Using Large Language Models


56. CityCube: Benchmarking Cross-view Spatial Reasoning on Vision-Language Models in Urban Environments


57. Layer-adaptive Expert Pruning for Pre-Training of Mixture-of-Experts Large Language Models


58. Tracing the Data Trail: A Survey of Data Provenance, Transparency and Traceability in LLMs


59. CORVUS: Red-Teaming Hallucination Detectors via Internal Signal Camouflage in Large Language Models


60. Guardrails for trust, safety, and ethical development and deployment of Large Language Models (LLM)


61. RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension


62. DeepInflation: an AI agent for research and model discovery of inflation


63. Hallucination-Free Automatic Question & Answer Generation for Intuitive Learning


64. Opening the Black Box: A Survey on the Mechanisms of Multi-Step Reasoning in Large Language Models


65. The Slow Drift of Support: Boundary Failures in Multi-Turn Mental Health LLM Dialogues


66. Developmental trajectories of decision making and affective dynamics in large language models


67. From Textbook to Talkbot: A Case Study of a Greek-Language RAG-Based Chatbot in Higher Education


68. Call2Instruct: Automated Pipeline for Generating Q&A Datasets from Call Center Recordings for LLM Fine-Tuning