Key LLM-Related Papers - 2026-03-06

1. The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks


2. Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation


3. Distributed Partial Information Puzzles: Examining Common Ground Construction Under Epistemic Asymmetry


4. Judge Reliability Harness: Stress Testing the Reliability of LLM Judges



6. STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks


7. X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes


8. GCAgent: Enhancing Group Chat Communication through Dialogue Agents System


9. MedCoRAG: Interpretable Hepatology Diagnosis via Hybrid Evidence Retrieval and Multispecialty Consensus


10. Bidirectional Curriculum Generation: A Multi-Agent Framework for Data-Efficient Mathematical Reasoning


11. WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents


12. Enhancing Zero-shot Commonsense Reasoning by Integrating Visual Knowledge via Machine Imagination


13. Survive at All Costs: Exploring LLM’s Risky Behaviors under Survival Pressure


14. S5-SHB Agent: Society 5.0 enabled Multi-model Agentic Blockchain Framework for Smart Home


15. BioLLMAgent: A Hybrid Framework with Enhanced Structural Interpretability for Simulating Human Decision-Making in Computational Psychiatry


16. Alignment Backfire: Language-Dependent Reversal of Safety Interventions Across 16 Languages in LLM Multi-Agent Systems


17. EvoTool: Self-Evolving Tool-Use Policy Optimization in LLM Agents via Blame-Aware Mutation and Diversity-Aware Selection


18. Authorize-on-Demand: Dynamic Authorization with Legality-Aware Intellectual Property Protection for VLMs


19. Differentially Private Multimodal In-Context Learning


20. K-Gen: A Multimodal Language-Conditioned Approach for Interpretable Keypoint-Guided Trajectory Generation


21. On Multi-Step Theorem Prediction via Non-Parametric Structural Priors


22. Design Behaviour Codes (DBCs): A Taxonomy-Driven Layered Governance Benchmark for Large Language Models


23. VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment


24. LLM-Grounded Explainability for Port Congestion Prediction via Temporal Graph Attention Networks


25. EchoGuard: An Agentic Framework with Knowledge-Graph Memory for Detecting Manipulative Communication in Longitudinal Dialogue


26. MOOSEnger – a Domain-Specific AI Agent for the MOOSE Ecosystem


27. HiMAP-Travel: Hierarchical Multi-Agent Planning for Long-Horizon Constrained Travel


28. CONE: Embeddings for Complex Numerical Data Preserving Unit and Variable Semantics


29. Solving an Open Problem in Theoretical Physics using AI-Assisted Discovery


30. Using Vision + Language Models to Predict Item Difficulty


31. When Agents Persuade: Propaganda Generation and Mitigation in LLMs


32. Towards automated data analysis: A guided framework for LLM-based risk estimation


33. Self-Attribution Bias: When AI Monitors Go Easy on Themselves


34. Adaptive Memory Admission Control for LLM Agents


35. Progressive Refinement Regulation for Accelerating Diffusion Language Model Decoding


36. POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation


37. Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation


38. Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval


39. SAIL: Similarity-Aware Guidance and Inter-Caption Augmentation-based Learning for Weakly-Supervised Dense Video Captioning


40. Ensembling Language Models with Sequential Monte Carlo


41. PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration


42. Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution


43. WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation


44. Balancing Coverage and Draft Latency in Vocabulary Trimming for Faster Speculative Decoding


45. Escaping the Hydrolysis Trap: An Agentic Workflow for Inverse Design of Durable Photocatalytic Covalent Organic Frameworks


46. Logi-PAR: Logic-Infused Patient Activity Recognition via Differentiable Rule


47. C2-Faith: Benchmarking LLM Judges for Causal and Coverage Faithfulness in Chain-of-Thought Reasoning


48. LBM: Hierarchical Large Auto-Bidding Model via Reasoning and Acting


49. Measuring the Redundancy of Decoder Layers in SpeechLLMs



51. 3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding


52. When Weak LLMs Speak with Confidence, Preference Alignment Gets Stronger


53. Location-Aware Pretraining for Medical Difference Visual Question Answering


54. BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning


55. Free Lunch for Pass@$k$? Low Cost Diverse Sampling for Diffusion Language Models


56. Attention’s Gravitational Field: A Power-Law Interpretation of Positional Correlation


57. Guiding Diffusion-based Reconstruction with Contrastive Signals for Balanced Visual Representation


58. Beyond Linear LLM Invocation: An Efficient and Effective Semantic Filter Paradigm


59. TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings


60. Stacked from One: Multi-Scale Self-Injection for Context Window Extension


61. DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval


62. Are Multimodal LLMs Ready for Surveillance? A Reality Check on Zero-Shot Anomaly Detection in the Wild


63. Detection of Illicit Content on Online Marketplaces using Large Language Models


64. Hate Speech Detection using Large Language Models with Data Augmentation and Feature Enhancement


65. Why the Brain Consolidates: Predictive Forgetting for Optimal Generalisation


66. Optimizing Language Models for Crosslingual Knowledge Consistency


67. Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks


68. Neuro-Symbolic Financial Reasoning via Deterministic Fact Ledgers and Adversarial Low-Latency Hallucination Detector


69. Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning


70. From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration


71. Understanding the Dynamics of Demonstration Conflict in In-Context Learning


72. VSPrefill: Vertical-Slash Sparse Attention with Lightweight Indexing for Long-Context Prefilling


73. Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks


74. Large Language Models as Bidding Agents in Repeated HetNet Auction


75. Query Disambiguation via Answer-Free Context: Doubling Performance on Humanity’s Last Exam


76. Induced Numerical Instability: Hidden Costs in Multimodal Large Language Models


77. A unified foundational framework for knowledge injection and evaluation of Large Language Models in Combustion Science


78. vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models


79. AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems


80. ZorBA: Zeroth-order Federated Fine-tuning of LLMs with Heterogeneous Block Activation


81. What Is Missing: Interpretable Ratings for Large Language Model Outputs


82. Agent Memory Below the Prompt: Persistent Q4 KV Cache for Multi-Agent LLM Inference on Edge Devices


83. Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?


84. Context-Dependent Affordance Computation in Vision-Language Models


85. Simulating Meaning, Nevermore! Introducing ICR: A Semiotic-Hermeneutic Metric for Evaluating Meaning in LLM Text Summaries


86. One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache


87. SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models


88. Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework


89. Semantic Containment as a Fundamental Property of Emergent Misalignment


90. CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG Models