Key LLM-Related Papers - 2026-02-23

1. Neurosymbolic Language Reasoning as Satisfiability Modulo Theory


2. WorkflowPerturb: Calibrated Stress Tests for Evaluating Multi-Agent Workflow Metrics


3. Alignment in Time: Peak-Aware Orchestration for Long-Horizon Agentic Systems


4. El Agente Gráfico: Structured Execution Graphs for Scientific Agents


5. The Token Games: Evaluating Language Model Reasoning with Puzzle Duels


6. Ontology-Guided Neuro-Symbolic Inference: Grounding Language Models with Mathematical Domain Knowledge


7. Epistemic Traps: Rational Misalignment Driven by Model Misspecification


8. Zero-shot Interactive Perception


9. “How Do I …?”: Procedural Questions Predominate Student-LLM Chatbot Conversations


10. Validating Political Position Predictions of Arguments


11. Vichara: Appellate Judgment Prediction and Explanation for the Indian Judicial System


12. Analyzing and Improving Chain-of-Thought Monitorability Through Information Theory


13. Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers


14. Simplifying Outcomes of Language Model Component Analyses with ELIA


15. Thinking by Subtraction: Confidence-Driven Contrastive Decoding for LLM Reasoning


16. [Re] Benchmarking LLM Capabilities in Negotiation through Scoreable Games


17. Capabilities Ain’t All You Need: Measuring Propensities in AI


18. Click it or Leave it: Detecting and Spoiling Clickbait with Informativeness Measures and Large Language Models


19. FENCE: A Financial and Multimodal Jailbreak Detection Dataset


20. Agentic Adversarial QA for Improving Domain-Specific LLMs


21. OODBench: Out-of-Distribution Benchmark for Large Vision-Language Models


22. Perceived Political Bias in LLMs Reduces Persuasive Abilities


23. Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards


24. NIMMGen: Learning Neural-Integrated Mechanistic Digital Twins with LLMs


25. Aurora: Neuro-Symbolic AI Driven Advising Agent


26. CUICurate: A GraphRAG-based Framework for Automated Clinical Concept Curation for NLP applications


27. Memory-Based Advantage Shaping for LLM-Guided Reinforcement Learning


28. MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance


29. Improving Neural Topic Modeling with Semantically-Grounded Soft Label Distributions


30. Understanding Unreliability of Steering Vectors in Language Models: Geometric Predictors and the Limits of Linear Approximations


31. Understanding the Fine-Grained Knowledge Capabilities of Vision-Language Models


32. Five Fatal Assumptions: Why T-Shirt Sizing Systematically Fails for AI Projects


33. ScaleBITS: Scalable Bitwidth Search for Hardware-Aligned Mixed-Precision LLMs


34. Can LLM Safety Be Ensured by Constraining Parameter Regions?


35. EXACT: Explicit Attribute-Guided Decoding-Time Personalization


36. AsynDBT: Asynchronous Distributed Bilevel Tuning for efficient In-Context Learning with Large Language Models


37. Agentic Unlearning: When LLM Agent Meets Machine Unlearning


38. Robust Pre-Training of Medical Vision-and-Language Models with Domain-Invariant Multi-Modal Masked Reconstruction


39. Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO


40. CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models


41. Assessing LLM Response Quality in the Context of Technology-Facilitated Abuse


42. AI Hallucination from Students’ Perspective: A Thematic Analysis