Key LLM-Related Papers - 2026-03-12

1. A Hybrid Knowledge-Grounded Framework for Safety and Traceability in Prescription Verification


2. Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization


3. Trajectory-Informed Memory Generation for Self-Improving Agent Systems


4. Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning


5. CUAAudit: Meta-Evaluation of Vision-Language Models as Auditors of Autonomous Computer-Use Agents


6. Adaptive RAN Slicing Control via Reward-Free Self-Finetuning Agents


7. Resource-constrained Amazons chess decision framework integrating large language models and graph attention


8. Verbalizing LLM’s Higher-order Uncertainty via Imprecise Probabilities


9. Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability


10. Hybrid Self-evolving Structured Memory for GUI Agents


11. COMIC: Agentic Sketch Comedy Generation


12. Does AI See like Art Historians? Interpreting How Vision Language Models Recognize Artistic Style


13. GroundCount: Grounding Vision-Language Models with Object Detection for Mitigating Counting Hallucinations


14. When Fine-Tuning Fails and when it Generalises: Role of Data Diversity and Mixed Training in LLM-based TTS


15. LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation


16. Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models


17. Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach with Application to NPU Kernel Synthesis


18. Speaker Verification with Speech-Aware LLMs: Evaluation and Augmentation


19. Risk-Adjusted Harm Scoring for Automated Red Teaming for LLMs in Financial Services


20. Taking Shortcuts for Categorical VQA Using Super Neurons


21. Towards Robust Speech Deepfake Detection via Human-Inspired Reasoning


22. EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution


23. Are Video Reasoning Models Ready to Go Outside?


24. Reinforcement Learning with Conditional Expectation Reward


25. Towards Cognitive Defect Analysis in Active Infrared Thermography with Vision-Text Cues


26. SCORE: Replacing Layer Stacking with Contractive Recurrent Depth


27. Learning to Negotiate: Multi-Agent Deliberation for Collective Value Alignment in LLMs


28. Aligning Large Language Models with Searcher Preferences


29. G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition


30. The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training


31. Designing Service Systems from Textual Evidence


32. Causal Concept Graphs in LLM Latent Space for Stepwise Reasoning


33. Utility Function is All You Need: LLM-based Congestion Control


34. Mitigating Translationese Bias in Multilingual LLM-as-a-Judge via Disentangled Information Bottleneck


35. Is this Idea Novel? An Automated Benchmark for Judgment of Research Ideas


36. Simulation-in-the-Reasoning (SiR): A Conceptual Framework for Empirically Grounded AI in Autonomous Transportation


37. Conversational AI-Enhanced Exploration System to Query Large-Scale Digitised Collections of Natural History Museums


38. DUCTILE: Agentic LLM Orchestration of Engineering Analysis in Product Development Practice


39. Rethinking the Harmonic Loss via Non-Euclidean Distance Layers


40. Delta-K: Boosting Multi-Instance Generation via Cross-Attention Augmentation


41. Adaptive Activation Cancellation for Hallucination Mitigation in Large Language Models


42. MCP-in-SoS: Risk assessment framework for open-source MCP servers


43. Compatibility at a Cost: Systematic Discovery and Exploitation of MCP Clause-Compliance Vulnerabilities


44. Mashup Learning: Faster Finetuning by Remixing Past Checkpoints


45. Social Knowledge for Cross-Domain User Preference Modeling


46. The Generation-Recognition Asymmetry: Six Dimensions of a Fundamental Divide in Formal Language Theory


47. CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR


48. Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models


49. Execution Is the New Attack Surface: Survivability-Aware Agentic Crypto Trading with OpenClaw-Style Local Executors


50. Multi-Stream Perturbation Attack: Breaking Safety Alignment of Thinking LLMs Through Concurrent Task Interference


51. ES-dLLM: Efficient Inference for Diffusion Large Language Models by Early-Skipping


52. KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization


53. Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models


54. Why LLMs Fail: A Failure Analysis and Partial Success Measurement for Automated Security Patch Generation


55. ADVERSA: Measuring Multi-Turn Guardrail Degradation and Judge Reliability in Large Language Models


56. HTMuon: Improving Muon via Heavy-Tailed Spectral Correction


57. Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead


58. Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents


59. Training Language Models via Neural Cellular Automata


60. Toward Epistemic Stability: Engineering Consistent Procedures for Industrial LLM Hallucination Reduction


61. Safety Under Scaffolding: How Evaluation Conditions Shape Measured Safety


62. Targeted Bit-Flip Attacks on LLM-Based Agents


63. Architecture-Aware LLM Inference Optimization on AMD Instinct GPUs: A Comprehensive Benchmark and Deployment Study


64. Prompts and Prayers: the Rise of GPTheology


65. DeliberationBench: A Normative Benchmark for the Influence of Large Language Models on Users’ Views


66. Assessing Cognitive Biases in LLMs for Judicial Decision Support: Virtuous Victim and Halo Effects


67. Measuring and Eliminating Refusals in Military Large Language Models


68. Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment


69. SENS-ASR: Semantic Embedding injection in Neural-transducer for Streaming Automatic Speech Recognition


70. SpreadsheetArena: Decomposing Preference in LLM Generation of Spreadsheet Workbooks


71. Leveraging Wikidata for Geographically Informed Sociocultural Bias Dataset Creation: Application to Latin America


72. Automated evaluation of LLMs for effective machine translation of Mandarin Chinese to English


73. There Are No Silly Questions: Evaluation of Offline LLM Capabilities from a Turkish Perspective


74. Context Over Compute: Human-in-the-Loop Outperforms Iterative Chain-of-Thought Prompting in Interview Answer Quality


75. Evaluating Adjective-Noun Compositionality in LLMs: Functional vs Representational Perspectives


76. CEI: A Benchmark for Evaluating Pragmatic Reasoning in Language Models


77. TAMUSA-Chat: A Domain-Adapted Large Language Model Conversational System for Research and Responsible Deployment


78. PoultryLeX-Net: Domain-Adaptive Dual-Stream Transformer Architecture for Large-Scale Poultry Stakeholder Modeling


79. A Two-Stage Architecture for NDA Analysis: LLM-based Segmentation and Transformer-based Clause Classification



81. Causally Grounded Mechanistic Interpretability for LLMs with Faithful Natural-Language Explanations


82. Evolving Demonstration Optimization for Chain-of-Thought Feature Transformation


83. Quantifying Hallucinations in Large Language Models on Medical Textbooks


84. The Dunning-Kruger Effect in Large Language Models: An Empirical Study of Confidence Calibration


85. Explainable LLM Unlearning Through Reasoning


86. One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis


87. Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards