Notable LLM Papers - 2026-04-23

1. Automatic Ontology Construction Using LLMs as an External Layer of Memory, Verification, and Planning for Hybrid Intelligent Systems


2. V-tableR1: Process-Supervised Multimodal Table Reasoning with Critic-Guided Policy Optimization


3. Where and What: Reasoning Dynamic and Implicit Preferences in Situated Conversational Recommendation


4. Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure


5. CHORUS: An Agentic Framework for Generating Realistic Deliberation Data


6. Self-Guided Plan Extraction for Instruction-Following Tasks with Goal-Conditional Reinforcement Learning


7. Self-Awareness before Action: Mitigating Logical Inertia via Proactive Cognitive Awareness


8. FSFM: A Biologically-Inspired Framework for Selective Forgetting of Agent Memory


9. ActuBench: A Multi-Agent LLM Pipeline for Generation and Evaluation of Actuarial Reasoning Tasks


10. Memory-Augmented LLM-based Multi-Agent System for Automated Feature Generation on Tabular Data


11. Stateless Decision Memory for Enterprise AI Agents


12. HiPO: Hierarchical Preference Optimization for Adaptive Reasoning in LLMs


13. EvoAgent: An Evolvable Agent Framework with Skill Learning and Multi-Agent Delegation


14. From Fuzzy to Formal: Scaling Hospital Quality Improvement with AI


15. Separable Pathways for Causal Reasoning: How Architectural Scaffolding Enables Hypothesis-Space Restructuring in LLM Agents


16. CreativeGame: Toward Mechanic-Aware Creative Game Generation


17. JTPRO: A Joint Tool-Prompt Reflective Optimization Framework for Language Agents


18. Emergence Transformer: Dynamical Temporal Attention Matters


19. Large Language Models Meet Biomedical Knowledge Graphs for Mechanistically Grounded Therapeutic Prioritization


20. MIRROR: A Hierarchical Benchmark for Metacognitive Calibration in Large Language Models


21. The AI Telco Engineer: Toward Autonomous Discovery of Wireless Communications Algorithms


22. Prism: An Evolutionary Memory Substrate for Multi-Agent Open-Ended Discovery


23. SkillGraph: Graph Foundation Priors for LLM Agent Tool Sequence Recommendation


24. OpenCLAW-P2P v6.0: Resilient Multi-Layer Persistence, Live Reference Verification, and Production-Scale Evaluation of Decentralized AI Peer Review


25. Hidden Reliability Risks in Large Language Models: Systematic Identification of Precision-Induced Output Disagreements


26. From Data to Theory: Autonomous Large Language Model Agents for Materials Science


27. From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents


28. EvoForest: A Novel Machine-Learning Paradigm via Open-Ended Evolution of Computational Graphs


29. ThermoQA: A Three-Tier Benchmark for Evaluating Thermodynamic Reasoning in Large Language Models


30. Explainable AML Triage with LLMs: Evidence Retrieval and Counterfactual Checks


31. AI to Learn 2.0: A Deliverable-Oriented Governance Framework and Maturity Rubric for Opaque AI in Learning-Intensive Domains


32. The Tool-Overuse Illusion: Why Does LLM Prefer External Tools over Internal Knowledge?


33. SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation


34. AVISE: Framework for Evaluating the Security of AI Systems


35. Convergent Evolution: How Different Language Models Learn Similar Number Representations


36. OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model


37. Can “AI” Be a Doctor? A Study of Empathy, Readability, and Alignment in Clinical LLMs


38. Anchor-and-Resume Concession Under Dynamic Pricing for LLM-Augmented Freight Negotiation


39. Supplement Generation Training for Enhancing Agentic Task Performance



41. COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling


42. ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence


43. The Expense of Seeing: Attaining Trustworthy Multimodal Reasoning Within the Monolithic Paradigm


44. GRPO-VPS: Enhancing Group Relative Policy Optimization with Verifiable Process Supervision for Effective Reasoning


45. Trust, Lies, and Long Memories: Emergent Social Dynamics and Reputation in Multi-Round Avalon with LLM Agents


46. LayerTracer: A Joint Task-Particle and Vulnerable-Layer Analysis framework for Arbitrary Large Language Model Architectures


47. Toward Cross-Lingual Quality Classifiers for Multilingual Pretraining Data Selection


48. Enhancing Research Idea Generation through Combinatorial Innovation and Multi-Agent Iterative Search Strategies


49. Evian: Towards Explainable Visual Instruction-tuning Data Auditing


50. Early-Stage Product Line Validation Using LLMs: A Study on Semi-Formal Blueprint Analysis


51. CHASM: Unveiling Covert Advertisements on Chinese Social Media


52. Knowledge Capsules: Structured Nonparametric Memory Units for LLMs


53. MOMO: A framework for seamless physical, verbal, and graphical robot skill learning and adaptation


54. DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories


55. CyberCertBench: Evaluating LLMs in Cybersecurity Certification Knowledge


56. AI models of unstable flow exhibit hallucination


57. Bimanual Robot Manipulation via Multi-Agent In-Context Learning


58. Surrogate modeling for interpreting black-box LLMs in medical predictions


59. Image Generators are Generalist Vision Learners


60. LLM-guided phase diagram construction through high-throughput experimentation


61. Text Steganography with Dynamic Codebook and Multimodal Large Language Model


62. ATIR: Towards Audio-Text Interleaved Contextual Retrieval


63. Hybrid Policy Distillation for LLMs


64. From Scene to Object: Text-Guided Dual-Gaze Prediction


65. Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning


66. Meta-Tool: Efficient Few-Shot Tool Adaptation for Small Language Models


67. Information Aggregation with AI Agents


68. TriEx: A Game-based Tri-View Framework for Explaining Internal Reasoning in Multi-Agent LLMs


69. Statistics, Not Scale: Modular Medical Dialogue with Bayesian Belief Engine


70. EmbodiedMidtrain: Bridging the Gap between Vision-Language Models and Vision-Language-Action Models via Mid-training


71. Bias in the Tails: How Name-conditioned Evaluative Framing in Resume Summaries Destabilizes LLM-based Hiring


72. Semantic Prompting: Agentic Incremental Narrative Refinement through Spatial Semantic Interaction


73. DistortBench: Benchmarking Vision Language Models on Image Distortion Identification


74. Infection-Reasoner: A Compact Vision-Language Model for Wound Infection Classification with Evidence-Grounded Clinical Reasoning


75. Behavioral Transfer in AI Agents: Evidence and Privacy Implications


76. MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings


77. Depression Risk Assessment in Social Media via Large Language Models


78. From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization


79. DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data


80. ChipCraftBrain: Validation-First RTL Generation via Multi-Agent Orchestration


81. If you’re waiting for a sign… that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems


82. Environmental Understanding Vision-Language Model for Embodied Agent


83. Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts


84. SolidCoder: Bridging the Mental-Reality Gap in LLM Code Generation through Concrete Execution


85. Measuring Creativity in the Age of Generative AI: Distinguishing Human and AI-Generated Creative Performance in Hiring and Talent Systems


86. Enhancing ASR Performance in the Medical Domain for Dravidian Languages


87. LLM Agents Predict Social Media Reactions but Do Not Outperform Text Classifiers: Benchmarking Simulation Accuracy Using 120K+ Personas of 1511 Humans


88. Can LLMs Infer Conversational Agent Users’ Personality Traits from Chat History?


89. KoALa-Bench: Evaluating Large Audio Language Models on Korean Speech Understanding and Faithfulness


90. Do Small Language Models Know When They’re Wrong? Confidence-Based Cascade Scoring for Educational Assessment


91. Self-Describing Structured Data with Dual-Layer Guidance: A Lightweight Alternative to RAG for Precision Retrieval in Large-Scale LLM Knowledge Navigation


92. Phase 1 Implementation of LLM-generated Discharge Summaries showing high Adoption in a Dutch Academic Hospital


93. PR-CAD: Progressive Refinement for Unified Controllable and Faithful Text-to-CAD Generation with Large Language Models


94. CoAuthorAI: A Human in the Loop System For Scientific Book Writing


95. Cognis: Context-Aware Memory for Conversational AI Agents


96. TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference


97. Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models


98. Accelerating PayPal’s Commerce Agent with Speculative Decoding: An Empirical Study on EAGLE3 with Fine-Tuned Nemotron Models


99. OThink-SRR1: Search, Refine and Reasoning with Reinforced Learning for Large Language Models


100. Do Hallucination Neurons Generalize? Evidence from Cross-Domain Transfer in LLMs


101. Can We Locate and Prevent Stereotypes in LLMs?


102. Transparent Screening for LLM Inference and Training Impacts


103. WorkflowGen: an adaptive workflow generation mechanism driven by trajectory experience


104. Soft-Label Governance for Distributional Safety in Multi-Agent Systems


105. Coding with Eyes: Visual Feedback Unlocks Reliable GUI Code Generating and Debugging


106. AutoGraph-R1: End-to-End Reinforcement Learning for Knowledge Graph Construction