Major LLM-Related Papers - 2026-04-09

1. How Much LLM Does a Self-Revising Agent Actually Need?


2. Reason in Chains, Learn in Trees: Self-Rectification and Grafting for Multi-turn Agent Policy Optimization


3. EVGeoQA: Benchmarking LLMs on Dynamic, Multi-Objective Geo-Spatial Exploration


4. EmoMAS: Emotion-Aware Multi-Agent System for High-Stakes Edge-Deployable Negotiation with Bayesian Orchestration


5. What’s Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning


6. Beyond Surface Judgments: Human-Grounded Risk Evaluation of LLM-Generated Disinformation


7. TurboAgent: An LLM-Driven Autonomous Multi-Agent Framework for Turbomachinery Aerodynamic Design


8. Steering the Verifiability of Multimodal AI Hallucinations


9. ATANT: An Evaluation Framework for AI Continuity


10. Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability


11. On Emotion-Sensitive Decision Making of Small Language Model Agents


12. ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning


13. Qualixar OS: A Universal Operating System for AI Agent Orchestration


14. SELFDOUBT: Uncertainty Quantification for Reasoning LLMs via the Hedge-to-Verify Ratio


15. SymptomWise: A Deterministic Reasoning Layer for Reliable and Efficient AI Systems


16. Weakly Supervised Distillation of Hallucination Signals into Transformer Representations


17. Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules


18. Syntax Is Easy, Semantics Is Hard: Evaluating LLMs for LTL Translation


19. Evaluating In-Context Translation with Synchronous Context-Free Grammar Transduction


20. Chatbot-Based Assessment of Code Understanding in Automated Programming Assessment Systems


21. A Systematic Study of Retrieval Pipeline Design for Retrieval-Augmented Medical Question Answering


22. Validated Intent Compilation for Constrained Routing in LEO Mega-Constellations


23. TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories


24. The ATOM Report: Measuring the Open Language Model Ecosystem


25. Dynamic Context Evolution for Scalable Synthetic Data Generation


26. The Impact of Steering Large Language Models with Persona Vectors in Educational Applications


27. SurFITR: A Dataset for Surveillance Image Forgery Detection and Localisation


28. STRIDE-ED: A Strategy-Grounded Stepwise Reasoning Framework for Empathetic Dialogue Systems


29. AV-SQL: Decomposing Complex Text-to-SQL Queries with Agentic Views


30. KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis



32. Self-Preference Bias in Rubric-Based Evaluation of Large Language Models


33. An empirical study of LoRA-based fine-tuning of large language models for automated test case generation


34. Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models


35. The AI Skills Shift: Mapping Skill Obsolescence, Emergence, and Transition Pathways in the LLM Era


36. XR-CareerAssist: An Immersive Platform for Personalised Career Guidance Leveraging Extended Reality and Multimodal AI


37. SentinelSphere: Integrating AI-Powered Real-Time Threat Detection with Cybersecurity Awareness Training


38. Do We Need Distinct Representations for Every Speech Token? Unveiling and Exploiting Redundancy in Large Speech Language Models


39. Digital Skin, Digital Bias: Uncovering Tone-Based Biases in LLMs and Emoji Embeddings


40. MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors


41. HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues


42. On the Step Length Confounding in LLM Reasoning Data Selection


43. Towards Privacy-Preserving Large Language Model: Text-free Inference Through Alignment and Adaptation


44. WRAP++: Web discoveRy Amplified Pretraining


45. Environmental, Social and Governance Sentiment Analysis on Slovene News: A Novel Dataset and Models


46. OmniTabBench: Mapping the Empirical Frontiers of GBDTs, Neural Networks, and Foundation Models for Tabular Data at Scale


47. MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization


48. Evaluating Repository-level Software Documentation via Question Answering and Feature-Driven Development


49. FlowExtract: Procedural Knowledge Extraction from Maintenance Flowcharts


50. TeamLLM: A Human-Like Team-Oriented Collaboration Framework for Multi-Step Contextualized Tasks


51. Evaluating LLM-Based 0-to-1 Software Generation in End-to-End CLI Tool Scenarios


52. Luwen Technical Report


53. ChemVLR: Prioritizing Reasoning in Perception for Chemical Vision-Language Understanding


54. A Graph-Enhanced Defense Framework for Explainable Fake News Detection with LLM


55. Restoring Heterogeneity in LLM-based Social Simulation: An Audience Segmentation Approach


56. SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning


57. Scientific Knowledge-driven Decoding Constraints Improving the Reliability of LLMs


58. LLM-based Schema-Guided Extraction and Validation of Missing-Person Intelligence from Heterogeneous Data Sources


59. AI-Driven Research for Databases


60. SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills


61. MedConclusion: A Benchmark for Biomedical Conclusion Generation from Structured Abstracts


62. Improving Robustness In Sparse Autoencoders via Masked Regularization


63. Inference-Time Code Selection via Symbolic Equivalence Partitioning


64. Distributed Interpretability and Control for Large Language Models


65. Multi-objective Evolutionary Merging Enables Efficient Reasoning Models


66. The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?


67. The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning


68. When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don’t


69. Attention Flows: Tracing LLM Conceptual Engagement via Story Summaries


70. Say Something Else: Rethinking Contextual Privacy as Information Sufficiency


71. WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks


72. In-Context Learning in Speech Language Models: Analyzing the Role of Acoustic Features, Linguistic Structure, and Induction Heads


73. Severity-Aware Weighted Loss for Arabic Medical Text Generation


74. A Novel Automatic Framework for Speaker Drift Detection in Synthesized Speech


75. Blockchain and AI: Securing Intelligent Networks for the Future


76. AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent


77. TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models


78. Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization


79. ClawLess: A Security Model of AI Agents


80. Plasma GraphRAG: Physics-Grounded Parameter Selection for Gyrokinetic Simulations


81. Towards the Development of an LLM-Based Methodology for Automated Security Profiling in Compliance with Ukrainian Cybersecurity Regulations


82. MAT-Cell: A Multi-Agent Tree-Structured Reasoning Framework for Batch-Level Single-Cell Annotation


83. Attribution-Driven Explainable Intrusion Detection with Encoder-Based Large Language Models


84. ToxReason: A Benchmark for Mechanistic Chemical Toxicity Reasoning via Adverse Outcome Pathway


85. Incentive-Aware Multi-Fidelity Optimization for Generative Advertising in Large Language Models


86. $S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models


87. FLeX: Fourier-based Low-rank EXpansion for multilingual transfer


88. DISSECT: Diagnosing Where Vision Ends and Language Priors Begin in Scientific VLMs


89. SALLIE: Safeguarding Against Latent Language & Image Exploits


90. Automating Database-Native Function Code Synthesis with LLMs


91. Probabilistic Language Tries: A Unified Framework for Compression, Decision Policies, and Execution Reuse


92. The End of the Foundation Model Era: Open-Weight Models, Sovereign AI, and Inference as Infrastructure


93. Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses


94. Unsupervised Neural Network for Automated Classification of Surgical Urgency Levels in Medical Transcriptions


95. Invisible Influences: Investigating Implicit Intersectional Biases through Persona Engineering in Large Language Models


96. Code Sharing In Prediction Model Research: A Scoping Review


97. Illocutionary Explanation Planning for Source-Faithful Explanations in Retrieval-Augmented Language Models


98. Distributional Open-Ended Evaluation of LLM Cultural Value Alignment Based on Value Codebook


99. Extracting Breast Cancer Phenotypes from Clinical Notes: Comparing LLMs with Classical Ontology Methods


100. A Comparative Study of Demonstration Selection for Practical Large Language Models-based Next POI Prediction


101. The Human Condition as Reflected in Contemporary Large Language Models


102. Tool-MCoT: Tool Augmented Multimodal Chain-of-Thought for Content Safety Moderation


103. SensorPersona: An LLM-Empowered System for Continual Persona Extraction from Longitudinal Mobile Sensor Streams


104. Front-End Ethics for Sensor-Fused Health Conversational Agents: An Ethical Design Space for Biometrics


105. Cross-Lingual Transfer and Parameter-Efficient Adaptation in the Turkic Language Family: A Theoretical Framework for Low-Resource Language Models


106. Beyond Facts: Benchmarking Distributional Reading Comprehension in Large Language Models


107. Concentrated siting of AI data centers drives regional power-system stress under rising global compute demand


108. Temporally Phenotyping GLP-1RA Case Reports with Large Language Models: A Textual Time Series Corpus and Risk Modeling


109. Consistency-Guided Decoding with Proof-Driven Disambiguation for Three-Way Logical Question Answering


110. Hallucination as output-boundary misclassification: a composite abstention architecture for language models


111. The Stepwise Informativeness Assumption: Why are Entropy Dynamics and Reasoning Correlated in LLMs?


112. LLM Spirals of Delusion: A Benchmarking Audit Study of AI Chatbot Interfaces


113. Benchmarking LLM Tool-Use in the Wild



115. EviSnap: Faithful Evidence-Cited Explanations for Cold-Start Cross-Domain Recommendation


116. LLM-Augmented Knowledge Base Construction For Root Cause Analysis


117. Fighting AI with AI: AI-Agent Augmented DNS Blocking of LLM Services during Student Evaluations


118. Knowledge Graphs Generation from Cultural Heritage Texts: Combining LLMs and Ontological Engineering for Scholarly Debates