Key LLM-Related Papers - 2026-04-22

1. A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding


2. SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models


3. Time Series Augmented Generation for Financial Applications


4. Multi-modal Reasoning with LLMs for Visual Semantic Arithmetic


5. Detecting Data Contamination in Large Language Models


6. DT2IT-MRM: Debiased Preference Construction and Iterative Training for Multimodal Reward Modeling


7. SimDiff: Depth Pruning via Similarity and Difference


8. CoDA: Towards Effective Cross-domain Knowledge Transfer via CoT-guided Domain Adaptation


9. GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models


10. Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture The Flag Challenges


11. Large Language Models Exhibit Normative Conformity


12. Explicit Trait Inference for Multi-Agent Coordination


13. UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction


14. Reasoning-Aware AIGC Detection via Alignment and Reinforcement


15. Towards Scalable Lifelong Knowledge Editing with Selective Knowledge Suppression


16. OLLM: Options-based Large Language Models


17. Reinforcement Learning Improves LLM Accuracy and Reasoning in Disease Classification from Radiology Reports


18. SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution


19. DW-Bench: Benchmarking LLMs on Data Warehouse Graph Topology Reasoning


20. Personalized Benchmarking: Evaluating LLMs by Individual Preferences


21. From Natural Language to Executable Narsese: A Neuro-Symbolic Benchmark and Pipeline for Reasoning with NARS


22. Human-Guided Harm Recovery for Computer Use Agents


23. AI scientists produce results without reasoning scientifically


24. ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System


25. Beyond One Output: Visualizing and Comparing Distributions of Language Model Generations


26. VLA Foundry: A Unified Framework for Training Vision-Language-Action Models


27. Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language


28. Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model


29. Cross-Model Consistency of AI-Generated Exercise Prescriptions: A Repeated Generation Study Across Three Large Language Models


30. Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI


31. Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps


32. Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment


33. Mesh Memory Protocol: Semantic Infrastructure for Multi-Agent LLM Systems


34. Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps


35. BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps


36. EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training



38. HP-Edit: A Human-Preference Post-Training Framework for Image Editing


39. Evaluation-driven Scaling for Scientific Discovery


40. PLaMo 2.1-VL Technical Report


41. RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models


42. Co-Refine: AI-Powered Tool Supporting Qualitative Analysis


43. HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models


44. Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms


45. IndiaFinBench: An Evaluation Benchmark for Large Language Model Performance on Indian Financial Regulatory Text


46. Location Not Found: Exposing Implicit Local and Global Biases in Multilingual LLMs


47. Beyond Semantic Similarity: A Component-Wise Evaluation Framework for Medical Question Answering Systems with Health Equity Implications


48. CulturALL: Benchmarking Multilingual and Multicultural Competence of LLMs on Grounded Tasks


49. ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning


50. Streamliners for Answer Set Programming


51. Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs


52. SCURank: Ranking Multiple Candidate Summaries with Summary Content Units for Enhanced Summarization


53. LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation


54. ST-Prune: Training-Free Spatio-Temporal Token Pruning for Vision-Language Models in Autonomous Driving


55. The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models


56. DP-FlogTinyLLM: Differentially private federated log anomaly detection using Tiny LLMs


57. Think Before Writing: Feature-Level Multi-Objective Optimization for Generative Citation Visibility


58. ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety


59. Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery


60. SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning


61. RARE: Redundancy-Aware Retrieval Evaluation Framework for High-Similarity Corpora


62. Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control


63. FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion


64. Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees


65. $R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction


66. Self-Improving Tabular Language Models via Iterative Group Alignment


67. Distillation Traps and Guards: A Calibration Knob for LLM Distillability


68. Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest


69. Fine-Tuning Small Reasoning Models for Quantum Field Theory


70. MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation


71. Harmful Intent as a Geometrically Recoverable Feature of LLM Residual Streams


72. Hierarchically Robust Zero-shot Vision-language Models


73. Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning


74. Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring


75. OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens


76. LLM-as-Judge Framework for Evaluating Tone-Induced Hallucination in Vision-Language Models


77. Experiments or Outcomes? Probing Scientific Feasibility in Large Language Models


78. REVEAL: Multimodal Vision-Language Alignment of Retinal Morphometry and Clinical Risks for Incident AD and Dementia Prediction


79. Towards Understanding the Robustness of Sparse Autoencoders


80. Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling


81. Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation


82. Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks


83. Owner-Harm: A Missing Threat Model for AI Agent Safety


84. Unlocking the Edge deployment and ondevice acceleration of multi-LoRA enabled one-for-all foundational LLM


85. From Craft to Kernel: A Governance-First Execution Architecture and Semantic ISA for Agentic Computers


86. ARGUS: Agentic GPU Optimization Guided by Data-Flow Invariants


87. Agent-GWO: Collaborative Agents for Dynamic Prompt Optimization in Large Language Models


88. SpikeMLLM: Spike-based Multimodal Large Language Models via Modality-Specific Temporal Scales and Temporal Compression


89. TurboEvolve: Towards Fast and Robust LLM-Driven Program Evolution


90. Two-dimensional early exit optimisation of LLM inference


91. SPRITE: From Static Mockups to Engine-Ready Game UI


92. CentaurTA Studio: A Self-Improving Human-Agent Collaboration System for Thematic Analysis


93. Compile to Compress: Boosting Formal Theorem Provers by Compiler Outputs