Key LLM-Related Papers - 2026-02-19

1. Agent Skill Framework: Perspectives on the Potential of Small Language Models in Industrial Environments


2. Creating a digital poet


3. Framework of Thoughts: A Foundation Framework for Dynamic and Optimized Reasoning based on Chains, Trees, and Graphs


4. Leveraging Large Language Models for Causal Discovery: a Constraint-based, Argumentation-driven Approach


5. Verifiable Semantics for Agent-to-Agent Communication


6. Toward Scalable Verifiable Reward: Proxy State-Based Evaluation for Multi-turn Tool-Calling LLM Agents


7. GPSBench: Do Large Language Models Understand GPS Coordinates?


8. Improving Interactive In-Context Learning from Natural Language Feedback


9. Evidence-Grounded Subspecialty Reasoning: Evaluating a Curated Clinical Intelligence Layer on the 2025 Endocrinology Board-Style Examination


10. How Uncertain Is the Grade? A Benchmark of Uncertainty Metrics for LLM-Based Automatic Assessment


11. Policy Compiler for Secure Agentic Systems


12. Measuring Mid-2025 LLM-Assistance on Novice Performance in Biology


13. Calibrate-Then-Act: Cost-Aware Exploration in LLM Agents


14. SPARC: Scenario Planning and Reasoning for Automated C Unit Test Generation


15. Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment


16. Retrieval Augmented Generation of Literature-derived Polymer Knowledge: The Example of a Biodegradable Polymer Expert System


17. Who can we trust? LLM-as-a-jury for Comparative Assessment


18. FlowPrefill: Decoupling Preemption from Prefill Scheduling Granularity to Mitigate Head-of-Line Blocking in LLM Serving


19. A Contrastive Learning Framework Empowered by Attention-based Feature Adaptation for Street-View Image Classification


20. Recursive language models for jailbreak detection: a procedural defense for tool-augmented agents


21. Learning to Learn from Language Feedback with Social Meta-Learning


22. IndicEval: A Bilingual Indian Educational Evaluation Framework for Large Language Models


23. Intra-Fairness Dynamics: The Bias Spillover Effect in Targeted LLM Alignment


24. Designing Production-Scale OCR for India: Multilingual and Domain-Specific Systems


25. Spatial Audio Question Answering and Reasoning on Dynamic Source Movements


26. Are LLMs Ready to Replace Bangla Annotators?


27. Long-Tail Knowledge in Large Language Models: Taxonomy, Mechanisms, Interventions and Implications


28. Beyond Learning: A Training-Free Alternative to Model Adaptation


29. HiPER: Hierarchical Reinforcement Learning with Explicit Credit Assignment for Large Language Model Agents


30. Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution


31. Human-AI Collaboration in Large Language Model-Integrated Building Energy Management Systems: The Role of User Domain Knowledge and AI Literacy


32. Retrieval Collapses When AI Pollutes the Web


33. Surrogate-Based Prevalence Measurement for Large-Scale A/B Testing


34. OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis


35. Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs


36. Can Generative Artificial Intelligence Survive Data Contamination? Theoretical Guarantees under Contaminated Recursive Training


37. MAEB: Massive Audio Embedding Benchmark


38. Anatomy of Capability Emergence: Scale-Invariant Representation Collapse and Top-Down Reorganization in Neural Networks


39. ReLoop: Structured Modeling and Behavioral Verification for Reliable LLM-Based Optimization


40. DocSplit: A Comprehensive Benchmark Dataset and Evaluation Approach for Document Packet Recognition and Splitting


41. EarthSpatialBench: Benchmarking Spatial Reasoning Capabilities of Multimodal LLMs on Earth Imagery


42. Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis


43. Doc-to-LoRA: Learning to Instantly Internalize Contexts


44. Egocentric Bias in Vision-Language Models


45. Evidence for Daily and Weekly Periodic Variability in GPT-4o Performance


46. FUTURE-VLA: Forecasting Unified Trajectories Under Real-time Execution


47. Fly0: Decoupling Semantic Grounding from Geometric Planning for Zero-Shot Aerial Navigation


48. Test-Time Adaptation for Tactile-Vision-Language Models


49. Playing With AI: How Do State-Of-The-Art Large Language Models Perform in the 1977 Text-Based Adventure Game Zork?


50. Not the Example, but the Process: How Self-Generated Examples Enhance LLM Reasoning


51. Enhancing Action and Ingredient Modeling for Semantically Grounded Recipe Generation


52. CAST: Achieving Stable LLM-based Text Analysis for Data Analytics


53. State Design Matters: How Representations Shape Dynamic Reasoning in Large Language Models


54. Rethinking Soft Compression in Retrieval-Augmented Generation: A Query-Conditioned Selector Perspective


55. Decoupling Strategy and Execution in Task-Focused Dialogue via Goal-Oriented Preference Optimization


56. Narrative Theory-Driven LLM Methods for Automatic Story Generation and Understanding: A Survey


57. Preference Optimization for Review Question Generation Improves Writing Quality


58. Can LLMs Assess Personality? Validating Conversational AI for Trait Profiling


59. Do Personality Traits Interfere? Geometric Limitations of Steering in Large Language Models


60. Language Model Representations for Efficient Few-Shot Tabular Classification


61. The Perplexity Paradox: Why Code Compresses Better Than Math in LLM Prompts