Key LLM-Related Papers - 2026-04-27

1. Rethinking Math Reasoning Evaluation: A Robust LLM-as-a-Judge Framework Beyond Symbolic Rigidity


2. Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents


3. Introducing Background Temperature to Characterise Hidden Randomness in Large Language Models


4. When Does LLM Self-Correction Help? A Control-Theoretic Markov Diagnostic and Verify-First Intervention


5. Emergent Strategic Reasoning Risks in AI: A Taxonomy-Driven Evaluation Framework


6. Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents


7. Sound Agentic Science Requires Adversarial Experiments


8. Read the Paper, Write the Code: Agentic Reproduction of Social-Science Results


9. Math Takes Two: A test for emergent mathematical reasoning in communication


10. Aligning Dense Retrievers with LLM Utility via Distillation


11. From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification


12. Learning Evidence Highlighting for Frozen LLMs


13. SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning


14. Controllable Spoken Dialogue Generation: An LLM-Driven Grading System for K-12 Non-Native English Learners


15. FeatEHR-LLM: Leveraging Large Language Models for Feature Engineering in Electronic Health Records


16. CGC: Compositional Grounded Contrast for Fine-Grained Multi-Image Understanding


17. SSG: Logit-Balanced Vocabulary Partitioning for LLM Watermarking


18. CNSL-bench: Benchmarking the Sign Language Understanding Capabilities of MLLMs on Chinese National Sign Language


19. BLAST: Benchmarking LLMs with ASP-based Structured Testing


20. Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets



22. Semantic Error Correction and Decoding for Short Block Channel Codes


23. Tell Me Why: Designing an Explainable LLM-based Dialogue System for Student Problem Behavior Diagnosis


24. Evaluating LLM-Based Goal Extraction in Requirements Engineering: Prompting Strategies and Their Limitations


25. An LLM-Driven Closed-Loop Autonomous Learning Framework for Robots Facing Uncovered Tasks in Open Environments


26. ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression


27. Estimating Tail Risks in Language Model Output Distributions


28. Reliable Self-Harm Risk Screening via Adaptive Multi-Agent LLM Systems


29. When AI Speaks, Whose Values Does It Express? A Cross-Cultural Audit of Individualism-Collectivism Bias in Large Language Models


30. PermaFrost-Attack: Stealth Pretraining Seeding (SPS) for planting Logic Landmines During LLM Training


31. Spontaneous Persuasion: An Audit of Model Persuasiveness in Everyday Conversations


32. Ethics Testing: Proactive Identification of Generative AI System Harms


33. Optimal Question Selection from a Large Question Bank for Clinical Field Recovery in Conversational Psychiatric Intake


34. Reliability Auditing for Downstream LLM tasks in Psychiatry: LLM-Generated Hospitalization Risk Scores


35. Lightweight Retrieval-Augmented Generation and Large Language Model-Based Modeling for Scalable Patient-Trial Matching


36. Call-Chain-Aware LLM-Based Test Generation for Java Projects


37. Shared Lexical Task Representations Explain Behavioral Variability In LLMs


38. MambaCSP: Hybrid-Attention State Space Models for Hardware-Efficient Channel State Prediction


39. Focus Session: Hardware and Software Techniques for Accelerating Multimodal Foundation Models


40. Feedback Over Form: Why Execution Feedback Matters More Than Pipeline Topology in 1-3B Code Generation


41. Large Language Models Are Bad Dice Players: LLMs Struggle to Generate Random Numbers from Statistical Distributions