Key LLM-Related Papers - 2026-01-23

1. Structured Hints for Sample-Efficient Lean Theorem Proving


2. LLM Prompt Evaluation for Educational Applications


3. Multimodal Climate Disinformation Detection: Integrating Vision-Language Models with External Knowledge Sources


4. Controlling Long-Horizon Behavior in Language Model Agents with Explicit State Dynamics


5. Grounding Large Language Models in Reaction Knowledge Graphs for Synthesis Retrieval


6. Deja Vu in Plots: Leveraging Cross-Session Evidence with Retrieval-Augmented LLMs for Live Streaming Risk Assessment


7. ErrorMap and ErrorAtlas: Charting the Failure Landscape of Large Language Models


8. Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification


9. VitalDiagnosis: AI-Driven Ecosystem for 24/7 Vital Monitoring and Chronic Disease Management


10. Agentic Confidence Calibration


11. Tabular Incremental Inference


12. AgentSM: Semantic Memory for Agentic Text-to-SQL


13. Improving Methodologies for LLM Evaluations Across Global Languages


14. From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models


15. Predictive Coding and Information Bottleneck for Hallucination Detection in Large Language Models


16. CogToM: A Comprehensive Theory of Mind Benchmark inspired by Human Cognition for Large Language Models


17. Autonomous Business System via Neuro-symbolic AI


18. TransportAgents: a multi-agents LLM framework for traffic accident severity prediction


19. Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge



21. Not Your Typical Sycophant: The Elusive Nature of Sycophancy in Large Language Models


22. Beyond Prompting: Efficient and Robust Contextual Biasing for Speech LLMs via Logit-Space Integration (LOGIC)


23. Prometheus Mind: Retrofitting Memory to Frozen Language Models


24. Replayable Financial Agents: A Determinism-Faithfulness Assurance Harness for Tool-Using LLM Agents


25. The Paradigm Shift: A Comprehensive Survey on Large Vision Language Models for Multimodal Fake News Detection


26. Aeon: High-Performance Neuro-Symbolic Memory Management for Long-Horizon LLM Agents


27. Uncovering Latent Bias in LLM-Based Emergency Department Triage Through Proxy Variables


28. Gated Sparse Attention: Combining Computational Efficiency with Training Stability for Long-Context Language Models


29. LLM-in-Sandbox Elicits General Agentic Intelligence


30. Learning to Discover at Test Time


31. Replicating Human Motivated Reasoning Studies with LLMs


32. Improving Training Efficiency and Reducing Maintenance Costs via Language Specific Model Merging


33. Sawtooth Wavefront Reordering: Enhanced CuTile FlashAttention on NVIDIA GB10


34. PhysicsMind: Sim and Real Mechanics Benchmarking for Physical Reasoning and Prediction in Foundational VLMs and World Models


35. MMGRid: Navigating Temporal-aware and Cross-domain Generative Recommendation via Model Merging


36. TeNet: Text-to-Network for Compact Policy Synthesis


37. Introducing the Generative Application Firewall (GAF)


38. Virtual Traffic Police: Large Language Model-Augmented Traffic Signal Control for Unforeseen Incidents


39. VideoThinker: Building Agentic VideoLLMs with LLM-Guided Tool Reasoning


40. CoNRec: Context-Discerning Negative Recommendation with LLMs


41. FlexLLM: Composable HLS Library for Flexible Hybrid LLM Accelerator Design


42. Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs


43. FARM: Field-Aware Resolution Model for Intelligent Trigger-Action Automation


44. Connect the Dots: Knowledge Graph-Guided Crawler Attack on Retrieval-Augmented Generation Systems


45. TempoNet: Learning Realistic Communication and Timing Patterns for Network Traffic Simulation


46. Event-VStream: Event-Driven Real-Time Understanding for Long Video Streams


47. Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors


48. DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice


49. Data-Free Privacy-Preserving for LLMs via Model Inversion and Selective Unlearning


50. Parallelism and Generation Order in Masked Diffusion Language Models: Limits Today, Potential Tomorrow


51. MapViT: A Two-Stage ViT-Based Framework for Real-Time Radio Quality Map Prediction in Dynamic Environments


52. VIOLA: Towards Video In-Context Learning with Minimal Annotations


53. Multi-Persona Thinking for Bias Mitigation in Large Language Models


54. The Rise of Large Language Models and the Direction and Impact of US Federal Research Funding


55. Martingale Foresight Sampling: A Principled Approach to Inference-Time LLM Decoding


56. Benchmarking LLMs for Pairwise Causal Discovery in Biomedical and Multi-Domain Contexts


57. Chunking, Retrieval, and Re-ranking: An Empirical Evaluation of RAG Architectures for Policy Document Question Answering


58. CURE: Curriculum-guided Multi-task Training for Reliable Anatomy Grounded Report Generation


59. Beyond Fixed Psychological Personas: State Beats Trait, but Language Models are State-Blind


60. Improving MoE Compute Efficiency by Composing Weight and Data Sparsity


61. Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing


62. Abusive music and song transformation using GenAI and LLMs


63. Lost in Transcription: How Speech-to-Text Errors Derail Code Understanding


64. ToolCaching: Towards Efficient Caching for LLM Tool-calling


65. No Reliable Evidence of Self-Reported Sentience in Small Large Language Models


66. Empowering LLMs for Structure-Based Drug Design via Exploration-Augmented Latent Inference


67. RECAP: A Resource-Efficient Method for Adversarial Prompting in Large Language Models


68. ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation


69. Large Language Models as Simulative Agents for Neurodivergent Adult Psychometric Profiles


70. Do people expect different behavior from large language models acting on their behalf? Evidence from norm elicitations in two canonical economic games


71. When Generative AI Meets Extended Reality: Enabling Scalable and Natural Interactions


72. Can We Trust LLM Detectors?


73. Entropy-Tree: Tree-Based Decoding with Entropy-Guided Exploration


74. Agentic Persona Control and Task State Tracking for Realistic User Simulation in Interactive Scenarios


75. LLM-based Multimodal Feedback Produces Equivalent Learning and Better Student Perceptions than Educator Feedback


76. Psychometric Comparability of LLM-Based Digital Twins