Key LLM-Related Papers - 2026-01-28

1. Why Keep Your Doubts to Yourself? Trading Visual Uncertainties in Multi-Agent Bandit Systems


2. Health-SCORE: Towards Scalable Rubrics for Improving Health-LLMs


3. FadeMem: Biologically-Inspired Forgetting for Efficient Agent Memory


4. AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning


5. Assessing the Quality of Mental Health Support in LLM Responses through Multi-Attribute Human Evaluation


6. A Balanced Neuro-Symbolic Approach for Commonsense Abductive Logic


7. Stability as a Liability: Systematic Breakdown of Linguistic Structure in LLMs


8. Deconstructing Instruction-Following: A New Benchmark for Granular Evaluation of Large Language Model Instruction Compliance Abilities


9. AI Agent for Reverse-Engineering Legacy Finite-Difference Code and Translating to Devito


10. A Generative AI-Driven Reliability Layer for Action-Oriented Disaster Resilience


11. Think-Augmented Function Calling: Improving LLM Parameter Accuracy Through Embedded Reasoning


12. ShopSimulator: Evaluating and Exploring RL-Driven LLM Agent for Shopping Assistants


13. Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents


14. GAIA: A Data Flywheel System for Training GUI Test-Time Scaling Critic Models


15. DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints


16. RareAlert: Aligning heterogeneous large language model reasoning for early rare disease risk screening


17. RouteMoA: Dynamic Routing without Pre-Inference Boosts Efficient Mixture-of-Agents


18. EvolVE: Evolutionary Search for LLM-based Verilog Generation and Optimization


19. Expert Evaluation and the Limits of Human Feedback in Mental Health AI Safety Testing


20. Sentipolis: Emotion-Aware Agents for Social Simulations


21. LLM-Based SQL Generation: Prompting, Self-Refinement, and Adaptive Weighted Majority Voting


22. Think Locally, Explain Globally: Graph-Guided LLM Investigations via Local Reasoning and Belief Propagation


23. UniCog: Uncovering Cognitive Abilities of LLMs through Latent Mind Space Analysis


24. When Personalization Legitimizes Risks: Uncovering Safety Vulnerabilities in Personalized Dialogue Agents


25. MMR-Bench: A Comprehensive Benchmark for Multimodal LLM Routing


26. Neuro-Symbolic Verification on Instruction Following of LLMs


27. ReFuGe: Feature Generation for Prediction Tasks on Relational Databases with LLM Agents


28. EntWorld: A Holistic Environment and Benchmark for Verifiable Enterprise GUI Agents


29. The LLM Data Auditor: A Metric-oriented Survey on Quality and Trustworthiness in Evaluating Synthetic Data


30. SQL-Trail: Multi-Turn Reinforcement Learning with Interleaved Feedback for Text-to-SQL


31. Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context


32. Intelligence Requires Grounding But Not Embodiment


33. A Syllogistic Probe: Tracing the Evolution of Logic Reasoning in Large Language Models


34. Auditing Disability Representation in Vision-Language Models


35. Multi-Agent Learning Path Planning via LLMs


36. Are We Evaluating the Edit Locality of LLM Model Editing Properly?


37. Phase Transition for Budgeted Multi-Agent Synergy


38. Interpreting Agentic Systems: Beyond Model Explanations to System-Level Accountability


39. ctELM: Decoding and Manipulating Embeddings of Clinical Trials with Embedding Language Models


40. Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes


41. Design Techniques for LLM-Powered Interactive Storytelling: A Case Study of the Dramamancer System


42. POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration


43. PRECISE: Reducing the Bias of LLM Evaluations Using Prediction-Powered Ranking Estimation


44. Dep-Search: Learning Dependency-Aware Reasoning Traces with Persistent Memory


45. $α^3$-SecBench: A Large-Scale Evaluation Suite of Security, Resilience, and Trust for LLM-based UAV Agents over 6G Networks


46. HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs


47. Advances and Innovations in the Multi-Agent Robotic System (MARS) Challenge


48. One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment


49. From Fuzzy to Exact: The Halo Architecture for Infinite-Depth Reasoning via Rational Arithmetic


50. FastInsight: Fast and Insightful Retrieval via Fusion Operators for Graph RAG


51. Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates


52. Funny or Persuasive, but Not Both: Evaluating Fine-Grained Multi-Concept Control in LLMs


53. daVinci-Dev: Agent-native Mid-training for Software Engineering


54. When Domain Pretraining Interferes with Instruction Alignment: An Empirical Study of Adapter Merging in Medical LLMs


55. MultiVis-Agent: A Multi-Agent Framework with Logic Rules for Reliable and Comprehensive Cross-Modal Data Visualization


56. Calibrating Beyond English: Language Diversity for Better Quantized Multilingual LLM


57. TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment


58. Beyond Retention: Orchestrating Structural Safety and Plasticity in Continual Learning for LLMs


59. BoRP: Bootstrapped Regression Probing for Scalable and Human-Aligned LLM Evaluation


60. TAM-Eval: Evaluating LLMs for Automated Unit Test Maintenance


61. PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR


62. Typhoon-S: Minimal Open Post-Training for Sovereign Large Language Models


63. MalURLBench: A Benchmark Evaluating Agents’ Vulnerabilities When Processing Web URLs


64. Mitigating the OWASP Top 10 For Large Language Models Applications using Intelligent Agents


65. LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts


66. Addressing LLM Diversity by Infusing Random Concepts


67. A System for Name and Address Parsing with Large Language Models


68. Evaluating Semantic and Syntactic Understanding in Large Language Models for Payroll Systems


69. SD-E$^2$: Semantic Exploration for Reasoning Under Token Budgets


70. A Monosemantic Attribution Framework for Stable Interpretability in Clinical Neuroscience Large Language Models


71. Streaming-dLLM: Accelerating Diffusion LLMs via Suffix Pruning and Dynamic Decoding


72. VidLaDA: Bidirectional Diffusion Large Language Models for Efficient Video Understanding


73. MergeMix: Optimizing Mid-Training Data Mixtures via Learnable Model Merging


74. RAICL: Retrieval-Augmented In-Context Learning for Vision-Language-Model Based EEG Seizure Detection


75. DPI: Exploiting Parameter Heterogeneity for Interference-Free Fine-Tuning


76. Context-Aware Iterative Token Detection and Masked Transmission for Wireless Token Communication


77. LLM-42: Enabling Determinism in LLM Inference with Verified Speculation


78. Cross-Lingual Probing and Community-Grounded Analysis of Gender Bias in Low-Resource Bengali


79. Athanor: Authoring Action Modification-based Interactions on Static Visualizations via Natural Language


80. Segment Length Matters: A Study of Segment Lengths on Audio Fingerprinting Performance


81. Agentic reinforcement learning empowers next-generation chemical language models for molecular design and synthesis


82. A Model-Driven Lossless Compression Algorithm Resistant to Mismatch


83. Grammar-Aware Literate Generative Mathematical Programming with Compiler-in-the-Loop


84. UrduLM: A Resource-Efficient Monolingual Urdu Language Model


85. Human-Aligned Enhancement of Programming Answers with LLMs Guided by User Feedback


86. Prompt Driven Development with Claude Code: Building a Complete TUI Framework for the Ring Programming Language


87. Status Hierarchies in Language Models


88. Improving User Privacy in Personalized Generation: Client-Side Retrieval-Augmented Modification of Server-Side Generated Speculations


89. Real-Time Trend Prediction via Continually-Aligned LLM Query Generation


90. Breaking the Protocol: Security Analysis of the Model Context Protocol Specification and Prompt Injection Vulnerabilities in Tool-Integrated LLM Agents


91. Reconstructing Training Data from Adapter-based Federated Large Language Models


92. Less is More for RAG: Information Gain Pruning for Generator-Aligned Reranking and Evidence Selection


93. Bridging Expectation Signals: LLM-Based Experiments and a Behavioral Kalman Filter Framework


94. PEARL: Prototype-Enhanced Alignment for Label-Efficient Representation Learning with Deployment-Driven Insights from Digital Governance Communication Systems


95. Unintended Memorization of Sensitive Information in Fine-Tuned Language Models


96. Clustering-driven Memory Compression for On-device Large Language Models


97. Data-driven Clustering and Merging of Adapters for On-device Large Language Models


98. Towards a Declarative Agentic Layer for Intelligent Agents in MCP-Based Server Ecosystems


99. The 17% Gap: Quantifying Epistemic Decay in AI-Assisted Survey Papers


100. ReLE: A Scalable System and Structured Benchmark for Diagnosing Capability Anisotropy in Chinese LLMs


101. Physical Prompt Injection Attacks on Large Vision-Language Models


102. Prompt and Circumstances: Evaluating the Efficacy of Human Prompt Inference in AI-Generated Art


103. Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers



105. Spectral Geometry for Deep Learning: Compression and Hallucination Detection via Random Matrix Theory


106. Conformal Feedback Alignment: Quantifying Answer-Level Reliability for Robust LLM Alignment


107. Meta-Judging with Large Language Models: Concepts, Methods, and Challenges


108. Mind the Ambiguity: Aleatoric Uncertainty Quantification in LLMs for Safe Medical Question Answering


109. On the Insecurity of Keystroke-Based AI Authorship Detection: Timing-Forgery Attacks Against Motor-Signal Verification


110. Latent-Space Contrastive Reinforcement Learning for Stable and Efficient LLM Reasoning


111. Retell, Reward, Repeat: Reinforcement Learning for Narrative Theory-Informed Story Generation


112. Beyond Outcome Verification: Verifiable Process Reward Models for Structured Reasoning


113. High-Rate Quantized Matrix Multiplication: Theory and Practice


114. TrojanGYM: A Detector-in-the-Loop LLM for Adaptive RTL Hardware Trojan Insertion


115. Beyond Factual QA: Mentorship-Oriented Question Answering over Long-Form Multilingual Content


116. Who Gets Which Message? Auditing Demographic Bias in LLM-Generated Targeted Text


117. Dynamic Role Assignment for Multi-Agent Debate


118. Learning to Collaborate: An Orchestrated-Decentralized Framework for Peer-to-Peer LLM Federation


119. Authority Signals in AI Cited Health Sources: A Framework for Evaluating Source Credibility in ChatGPT Responses


120. Beyond Instrumental and Substitutive Paradigms: Introducing Machine Culture as an Emergent Phenomenon in Large Language Models


121. Boltzmann-GPT: Bridging Energy-Based World Models and Language Generation


122. The Triangle of Similarity: A Multi-Faceted Framework for Comparing Neural Network Representations


123. Lost in Simulation: LLM-Simulated Users are Unreliable Proxies for Human Users in Agentic Evaluations


124. SonoEdit: Null-Space Constrained Knowledge Editing for Pronunciation Correction in LLM-Based TTS


125. ChemNavigator: Agentic AI Discovery of Design Rules for Organic Photocatalysts


126. Do VLMs Have a Moral Backbone? A Study on the Fragile Morality of Vision-Language Models


127. ThinkTank-ME: A Multi-Expert Framework for Middle East Event Forecasting


128. FlashMoE: Reducing SSD I/O Bottlenecks via ML-Based Cache Replacement for Mixture-of-Experts Inference on Edge Devices


129. Initial results of the Digital Consciousness Model


130. Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs


131. Single-Pixel Vision-Language Model for Intrinsic Privacy-Preserving Behavioral Intelligence


132. AMVICC: A Novel Benchmark for Cross-Modal Failure Mode Profiling for VLMs and IGMs


133. Measuring Political Stance and Consistency in Large Language Models


134. MathMixup: Boosting LLM Mathematical Reasoning with Difficulty-Controllable Data Synthesis and Curriculum Learning


135. BibAgent: An Agentic Framework for Traceable Miscitation Detection in Scientific Literature


136. Sparsity-Aware Low-Rank Representation for Efficient Fine-Tuning of Large Language Models


137. Evaluating Reward Model Generalization via Pairwise Maximum Discrepancy Competitions


138. Crystal-KV: Efficient KV Cache Management for Chain-of-Thought LLMs via Answer-First Principle


139. TelcoAI: Advancing 3GPP Technical Specification Search through Agentic Multi-Modal Retrieval-Augmented Generation