Key LLM-Related Papers - 2026-04-24

1. From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation


2. Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models


3. Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows


4. Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems


5. Who Defines “Best”? Towards Interactive, User-Defined Evaluation of LLM Leaderboards


6. GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion


7. CoFEE: Reasoning Control for LLM-Based Feature Discovery


8. Separable Expert Architecture: Toward Privacy-Preserving LLM Personalization via Composable Adapters and Deletable User Proxies


9. Unbiased Prevalence Estimation with Multicalibrated LLMs


10. BioMiner: A Multi-modal System for Automated Mining of Protein-Ligand Bioactivity Data from Literature


11. How English Print Media Frames Human-Elephant Conflicts in India


12. Efficient Agent Evaluation via Diversity-Guided User Simulation


13. AI-Gram: When Visual Agents Interact in a Social Network


14. FairQE: Multi-Agent Framework for Mitigating Gender Bias in Translation Quality Estimation


15. ReaGeo: Reasoning-Enhanced End-to-End Geocoding with LLMs


16. Symbolic Grounding Reveals Representational Bottlenecks in Abstract Visual Reasoning


17. Ideological Bias in LLMs’ Economic Causal Reasoning


18. Spatial Metaphors for LLM Memory: A Critical Analysis of the MemPalace Architecture


19. Can MLLMs “Read” What is Missing?


20. Enhancing Online Recruitment with Category-Aware MoE and LLM-based Data Augmentation


21. ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures


22. Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management


23. Trust but Verify: Introducing DAVinCI – A Framework for Dual Attribution and Verification in Claim Inference for Language Models


24. Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction


25. Propensity Inference: Environmental Contributors to LLM Behaviour


26. Mind the Prompt: Self-adaptive Generation of Task Plan Explanations via LLMs


27. InVitroVision: a Multi-Modal AI Model for Automated Description of Embryo Development using Natural Language


28. Who Defines Fairness? Target-Based Prompting for Demographic Representation in Generative Models


29. HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering


30. Value-Conflict Diagnostics Reveal Widespread Alignment Faking in Language Models


31. Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks


32. Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI


33. When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs


34. TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale


35. A Multimodal Text- and Graph-Based Approach for Open-Domain Event Extraction from Documents


36. Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models


37. TraceScope: Interactive URL Triage via Decoupled Checklist Adjudication


38. Modulating Cross-Modal Convergence with Single-Stimulus, Intra-Modal Dispersion


39. Why are all LLMs Obsessed with Japanese Culture? On the Hidden Cultural and Regional Biases of LLMs


40. AEL: Agent Evolving Learning for Open-Ended Environments


41. Building a Precise Video Language with Human-AI Oversight


42. Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers


43. Process Supervision via Verbal Critique Improves Reasoning in Large Language Models


44. DryRUN: On the Role of Public Tests in LLM-Driven Code Generation


45. A Metamorphic Testing Approach to Diagnosing Memorization in LLM-Based Program Repair


46. Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation


47. Reasoning Primitives in Hybrid and Non-Hybrid LLMs


48. Differentially Private De-identification of Dutch Clinical Notes: A Comparative Evaluation


49. VG-CoT: Towards Trustworthy Visual Reasoning via Grounded Chain-of-Thought


50. VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation


51. mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code


52. Beyond Single Plots: A Benchmark for Question Answering on Multi-Charts


53. Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning


54. MiMIC: Mitigating Visual Modality Collapse in Universal Multimodal Retrieval While Avoiding Semantic Misalignment


55. Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition


56. CAP: Controllable Alignment Prompting for Unlearning in LLMs


57. SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference


58. EngramaBench: Evaluating Long-Term Conversational Memory with Structured Graph Retrieval


59. Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model


60. SQLyzr: A Comprehensive Benchmark and Evaluation Platform for Text-to-SQL


61. On Reasoning Behind Next Occupation Recommendation


62. Doubly Saturated Ramsey Graphs: A Case Study in Computer-Assisted Mathematical Discovery


63. Adaptive Instruction Composition for Automated LLM Red-Teaming


64. Dialect vs Demographics: Quantifying LLM Bias from Implicit Linguistic Signals vs. Explicit User Profiles


65. Enhancing Science Classroom Discourse Analysis through Joint Multi-Task Learning for Reasoning-Component Classification


66. Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms


67. Leveraging Multimodal LLMs for Built Environment and Housing Attribute Assessment from Street-View Imagery


68. Behavioral Consistency and Transparency Analysis on Large Language Model API Gateways


69. Serialisation Strategy Matters: How FHIR Data Format Affects LLM Medication Reconciliation


70. Strategic Polysemy in AI Discourse: A Philosophical Analysis of Language, Hype, and Power


71. Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models


72. Thinking Like a Botanist: Challenging Multimodal Language Models with Intent-Driven Chain-of-Inquiry


73. IRIS: Interpolative Rényi Iterative Self-play for Large Language Model Fine-Tuning


74. The Path Not Taken: Duality in Reasoning about Program Execution


75. Absorber LLM: Harnessing Causal Synchronization for Test-Time Training


76. Omission Constraints Decay While Commission Constraints Persist in Long-Context LLM Agents


77. Biomedical systems biology workflow orchestration and execution with PoSyMed


78. Reinforcing privacy reasoning in LLMs via normative simulacra from fiction


79. Predicting Scale-Up of Metal-Organic Framework Syntheses with Large Language Models


80. Deep Interest Mining with Cross-Modal Alignment for SemanticID Generation in Generative Recommendation


81. RealRoute: Dynamic Query Routing System via Retrieve-then-Verify Paradigm


82. KGiRAG: An Iterative GraphRAG Approach for Responding Sensemaking Queries


83. ERA: Evidence-based Reliability Alignment for Honest Retrieval-Augmented Generation


84. Association Is Not Similarity: Learning Corpus-Specific Associations for Multi-Hop Retrieval


85. MATRAG: Multi-Agent Transparent Retrieval-Augmented Generation for Explainable Recommendations


86. The Effect of Idea Elaboration on the Automatic Assessment of Idea Originality