Notable LLM-related papers - 2026-01-02

1. Context-aware LLM-based AI Agents for Human-centered Energy Management Systems in Smart Buildings


2. AMAP Agentic Planning Technical Report


3. Iterative Deployment Improves Planning Skills in LLMs


4. GenZ: Foundational models as latent variable generators within traditional statistical models


5. BatteryAgent: Synergizing Physics-Informed Interpretation with LLM Reasoning for Intelligent Battery Fault Diagnosis


6. Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization


7. Group Deliberation Oriented Multi-Agent Conversational Model for Complex Reasoning


8. Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization


9. Recursive Language Models


10. MCPAgentBench: A Real-world Task Benchmark for Evaluating LLM Agent MCP Tool Use


11. From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning


12. Evaluating the Reasoning Abilities of LLMs on Underrepresented Mathematics Competition Problems


13. Align While Search: Belief-Guided Exploratory Inference for World-Grounded Embodied Agents


14. Constrained Language Model Policy Optimization via Risk-aware Stepwise Alignment


15. Graph-Based Exploration for ARC-AGI-3 Interactive Reasoning Tasks


16. CogRec: A Cognitive Recommender Agent Fusing Large Language Models and Soar for Explainable Recommendation


17. LoongFlow: Directed Evolutionary Search via a Cognitive Plan-Execute-Summarize Paradigm


18. ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment


19. SPARK: Search Personalization via Agent-Driven Retrieval and Knowledge-sharing


20. A Proof-of-Concept for Explainable Disease Diagnosis Using Large Language Models and Answer Set Programming


21. CASCADE: Cumulative Agentic Skill Creation through Autonomous Development and Evolution


22. The Drill-Down and Fabricate Test (DDFT): A Protocol for Measuring Epistemic Robustness in Language Models



24. Modeling Language as a Sequence of Thoughts


25. DarkEQA: Benchmarking Vision-Language Models for Embodied Question Answering in Low-Light Indoor Environments


26. The Impact of LLMs on Online News Consumption and Production


27. RAIR: A Rule-Aware Benchmark Uniting Challenging Long-Tail and Visual Salience Subset for E-commerce Relevance Assessment


28. Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements


29. Video and Language Alignment in 2D Systems for 3D Multi-object Scenes with Multi-Information Derivative-Free Control


30. LeanCat: A Benchmark Suite for Formal Category Theory in Lean (Part I: 1-Categories)


31. AstroReview: An LLM-driven Multi-Agent Framework for Telescope Proposal Peer Review and Refinement


32. LSRE: Latent Semantic Rule Encoding for Real-Time Semantic Risk Detection in Autonomous Driving


33. Nested Learning: The Illusion of Deep Learning Architectures


34. R-Debater: Retrieval-Augmented Debate Generation through Argumentative Memory


35. Do Large Language Models Know What They Are Capable Of?


36. DynaFix: Iterative Automated Program Repair Driven by Execution-Level Dynamic Information


37. Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space


38. Chat-Driven Optimal Management for Virtual Network Services


39. Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time


40. SynRAG: A Large Language Model Framework for Executable Query Generation in Heterogeneous SIEM System


41. Localized Calibrated Uncertainty in Code Language Models


42. More Than Bits: Multi-Envelope Double Binary Factorization for Extreme Quantization


43. Generative AI-enhanced Sector-based Investment Portfolio Construction


44. Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice


45. HOLOGRAPH: Active Causal Discovery via Sheaf-Theoretic Alignment of Large Language Model Priors


46. Foundation models on the bridge: Semantic hazard detection and safety maneuvers for maritime autonomy with vision-language models


47. PackKV: Reducing KV Cache Memory Footprint through LLM-Aware Lossy Compression


48. Comparing Approaches to Automatic Summarization in Less-Resourced Languages


49. Taming Hallucinations: Boosting MLLMs’ Video Understanding via Counterfactual Video Generation


50. Unified Embodied VLM Reasoning with Robotic Action via Autoregressive Discretized Pre-training


51. OptRot: Mitigating Weight Outliers via Data-Free Rotations for Post-Training Quantization


52. Enhancing LLM-Based Neural Network Generation: Few-Shot Prompting and Efficient Validation for Automated Architecture Design


53. Enhancing LLM Planning Capabilities through Intrinsic Self-Critique


54. Factorized Learning for Temporally Grounded Video-Language Models


55. Beyond Hallucinations: A Composite Score for Measuring Reliability in Open-Source Large Language Models


56. AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives


57. Jailbreaking Attacks vs. Content Safety Filters: How Far Are We in the LLM Safety Arms Race?


58. RSAgent: Learning to Reason and Act for Text-Guided Segmentation via Multi-Turn Tool Invocations


59. FUSE-RSVLM: Feature Fusion Vision-Language Model for Remote Sensing


60. iCLP: Large Language Model Reasoning with Implicit Cognition Latent Planning


61. Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process


62. Coding With AI: From a Reflection on Industrial Practices to Future Computer Science and Software Engineering Education


63. Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling


64. How Large Language Models Systematically Misrepresent American Climate Opinions


65. Breaking Audio Large Language Models by Attacking Only the Encoder: A Universal Targeted Latent-Space Audio Attack


66. Probing the Limits of Compressive Memory: A Study of Infini-Attention in Small-Scale Pretraining


67. From Correctness to Collaboration: Toward a Human-Centered Framework for Evaluating AI Agent Behavior in Software Engineering


68. Adversarial Lens: Exploiting Attention Layers to Generate Adversarial Examples for Evaluation


69. Retrieval Augmented Question Answering: When Should LLMs Admit Ignorance?


70. Improved Bounds for Private and Robust Alignment


71. StressRoBERTa: Cross-Condition Transfer Learning from Depression, Anxiety, and PTSD to Stress Detection


72. Prompt-Induced Over-Generation as Denial-of-Service: A Black-Box Attack-Side Benchmark


73. Entropy-Aware Speculative Decoding Toward Improved LLM Reasoning


74. Audited Skill-Graph Self-Improvement for Agentic LLMs via Verifiable Rewards, Experience Synthesis, and Continual Memory


75. Geometric Scaling of Bayesian Inference in LLMs


76. State-of-the-art Small Language Coder Model: Mify-Coder


77. Hybrid-Code: A Privacy-Preserving, Redundant Multi-Agent Framework for Reliable Local Clinical Coding


78. AgenticTCAD: A LLM-based Multi-Agent Framework for Automated TCAD Code Generation and Device Optimization


79. Break Out the Silverware – Semantic Understanding of Stored Household Items


80. Enforcing Temporal Constraints for LLM Agents


81. A Survey of AI Methods for Geometry Preparation and Mesh Generation in Engineering Simulation


82. HarmTransform: Transforming Explicit Harmful Queries into Stealthy via Multi-Agent Debate


83. PyBangla at BLP-2025 Task 2: Enhancing Bangla-to-Python Code Generation with Iterative Self-Correction and Multilingual Agents


84. STED and Consistency Scoring: A Framework for Evaluating LLM Structured Output Reliability


85. Enriching Historical Records: An OCR and AI-Driven Approach for Database Integration