Key LLM-Related Papers - 2026-05-01

1. LLM as Clinical Graph Structure Refiner: Enhancing Representation Learning in EEG Seizure Diagnosis


2. What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design


3. Characterizing the Consistency of the Emergent Misalignment Persona


4. RHyVE: Competence-Aware Verification and Phase-Aware Deployment for LLM-Generated Reward Hypotheses


5. Collaborative Agent Reasoning Engineering (CARE): A Three-Party Design Methodology for Systematically Engineering AI Agents with Subject Matter Experts, Developers, and Helper Agents


6. SpecVQA: A Benchmark for Spectral Understanding and Visual Question Answering in Scientific Images


7. Exploring Interaction Paradigms for LLM Agents in Scientific Visualization


8. D3-Gym: Constructing Real-World Verifiable Environments for Data-Driven Discovery


9. From LLM-Driven Trading Card Generation to Procedural Relatedness: A Pokémon Case Study


10. Language Models Refine Mechanical Linkage Designs Through Symbolic Reflection and Modular Optimisation


11. LLMs as ASP Programmers: Self-Correction Enables Task-Agnostic Nonmonotonic Reasoning


12. The Effects of Visual Priming on Cooperative Behavior in Vision-Language Models


13. Taming the Centaur(s) with LAPITHS: a framework for a theoretically grounded interpretation of AI performances


14. In-Context Prompting Obsoletes Agent Orchestration for Procedural Tasks


15. Modeling Clinical Concern Trajectories in Language Model Agents


16. KellyBench: A Benchmark for Long-Horizon Sequential Decision Making


17. Rethinking Agentic Reinforcement Learning In Large Language Models


18. ObjectGraph: From Document Injection to Knowledge Traversal – A Native File Format for the Agentic Era


19. Intent2Tx: Benchmarking LLMs for Translating Natural Language Intents into Ethereum Transactions


20. Iterative Multimodal Retrieval-Augmented Generation for Medical Question Answering


21. Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation


22. Knowledge Graph Representations for LLM-Based Policy Compliance Reasoning


23. Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents


24. When Agents Evolve, Institutions Follow


25. The TEA Nets framework combines AI and cognitive network science to model targets, events and actors in text


26. From Context to Skills: Can Language Models Learn from Context Skillfully?


27. Optimization before Evaluation: Evaluation with Unoptimised Prompts Can be Misleading


28. Political Bias Audits of LLMs Capture Sycophancy to the Inferred Auditor


29. WaferSAGE: Large Language Model-Powered Wafer Defect Analysis via Synthetic Data Generation and Rubric-Guided Reinforcement Learning


30. Math Education Digital Shadows for facilitating learning with LLMs: Math performance, anxiety and confidence in simulated students and AIs


31. Trace-Level Analysis of Information Contamination in Multi-Agent Systems


32. SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation


33. In-Context Examples Suppress Scientific Knowledge Recall in LLMs


34. Belief-Guided Inference Control for Large Language Model Services via Verifiable Observations


35. InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?


36. Safe Bilevel Delegation (SBD): A Formal Framework for Runtime Delegation Safety in Multi-Agent Systems


37. Heterogeneous Scientific Foundation Model Collaboration


38. METASYMBO: Multi-Agent Language-Guided Metamaterial Discovery via Symbolic Latent Evolution


39. OptimusKG: Unifying biomedical knowledge in a modern multimodal graph


40. AutoSurfer – Teaching Web Agents through Comprehensive Surfing, Learning, and Modeling


41. Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents


42. When Roles Fail: Epistemic Constraints on Advocate Role Fidelity in LLM-Based Political Statement Analysis


43. Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction


44. TRUST: A Framework for Decentralized AI Service v.0.1


45. Think it, Run it: Autonomous ML pipeline generation via self-healing multi-agent AI


46. End-to-end autonomous scientific discovery on a real optical platform


47. When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems


48. PhyCo: Learning Controllable Physical Priors for Generative Motion


49. Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows


50. Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes


51. Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection



53. TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering


54. Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling


55. Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding


56. Design Structure Matrix Modularization with Large Language Models


57. TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions


58. From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation


59. Can AI Be a Good Peer Reviewer? A Survey of Peer Review Process, Evaluation, and the Future


60. Beyond Semantics: Measuring Fine-Grained Emotion Preservation in Small Language Model-Based Machine Translation


61. CastFlow: Learning Role-Specialized Agentic Workflows for Time Series Forecasting


62. Test Before You Deploy: Governing Updates in the LLM Supply Chain


63. RuC: HDL-Agnostic Rule Completion Benchmark Generation


64. Instruction-Guided Poetry Generation in Arabic and Its Dialects


65. Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation


66. AgentEconomist: An End-to-end Agentic System Translating Economic Intuitions into Executable Computational Experiments


67. ANCORA: Learning to Question via Manifold-Anchored Self-Play for Verifiable Reasoning


68. HAVEN: Hybrid Automated Verification ENgine for UVM Testbench Synthesis with LLMs


69. Mapping how LLMs debate societal issues when shadowing human personality traits, sociodemographics and social media behavior


70. APPSI-139: A Parallel Corpus of English Application Privacy Policy Summarization and Interpretation


71. Debiasing Reward Models via Causally Motivated Inference-Time Intervention


72. Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study


73. Secret Stealing Attacks on Local LLM Fine-Tuning through Supply-Chain Model Code Backdoors


74. Beyond the Mean: Within-Model Reliable Change Detection for LLM Evaluation


75. COHERENCE: Benchmarking Fine-Grained Image-Text Alignment in Interleaved Multimodal Contexts


76. Toward Autonomous SOC Operations: End-to-End LLM Framework for Threat Detection, Query Generation, and Resolution in Security Operations


77. Pragmos: A Process Agentic Modeling System


78. Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents


79. Evaluating Epistemic Guardrails in AI Reading Assistants: A Behavioral Audit of a Minimal Prototype


80. When 2D Tasks Meet 1D Serialization: On Serialization Friction in Structured Tasks


81. From Prompt to Physical Actuation: Holistic Threat Modeling of LLM-Enabled Robotic Systems


82. Self-Evolving Software Agents


83. Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models


84. Instruction Complexity Induces Positional Collapse in Adversarial LLM Evaluation


85. Theory Under Construction: Orchestrating Language Models for Research Software Where the Specification Evolves


86. Path-Lock Expert: Separating Reasoning Mode in Hybrid Thinking via Architecture-Level Separation


87. What Suppresses Nash Equilibrium Play in Large Language Models? Mechanistic Evidence and Causal Control


88. Enhancing Linux Privilege Escalation Attack Capabilities of Local LLM Agents


89. Useless but Safe? Benchmarking Utility Recovery with User Intent Clarification in Multi-Turn Conversations


90. Efficient Training on Multiple Consumer GPUs with RoundPipe


91. Detecting Clinical Discrepancies in Health Coaching Agents: A Dual-Stream Memory and Reconciliation Architecture


92. Automatic Causal Fairness Analysis with LLM-Generated Reporting


93. Beyond Accuracy: LLM Variability in Evidence Screening for Software Engineering SLRs


94. When Continual Learning Moves to Memory: A Study of Experience Reuse in LLM Agents


95. AgenticRecTune: Multi-Agent with Self-Evolving Skillhub for Recommendation System Optimization


96. DeepTutor: Towards Agentic Personalized Tutoring


97. Static Program Slicing Using Language Models With Dataflow-Aware Pretraining and Constrained Decoding


98. LLM Biases


99. CareGuardAI: Context-Aware Multi-Agent Guardrails for Clinical Safety & Hallucination Mitigation in Patient-Facing LLMs


100. Simulating Validity: Modal Decoupling in MLLM Generated Feedback on Science Drawings


101. Policy-Governed LLM Routing with Intent Matching for Instrument Laboratories


102. The Impact of LLM Self-Consistency and Reasoning Effort on Automated Scoring Accuracy and Cost


103. Agentic Compilation: Mitigating the LLM Rerun Crisis for Minimized-Inference-Cost Web Automation