Key LLM-Related Papers - 2026-02-17

1. BrowseComp-$V^3$: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents


2. X-SYS: A Reference Architecture for Interactive Explanation Systems


3. SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks


4. Evaluating Robustness of Reasoning Models on Parameterized Logical Problems


5. Think Fast and Slow: Step-Level Cognitive Depth Adaptation for LLM Agents


6. AI Agents for Inventory Control: Human-LLM-OR Complementarity


7. Can I Have Your Order? Monte-Carlo Tree Search for Slot Filling Ordering in Diffusion Language Models


8. To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models


9. Intent-Driven Smart Manufacturing Integrating Knowledge Graphs and Large Language Models


10. A Theoretical Framework for Adaptive Utility-Weighted Benchmarking


11. Semantic Chunking and the Entropy of Natural Language


12. CoPE-VideoLM: Codec Primitives For Efficient Video Language Models


13. Asynchronous Verified Semantic Caching for Tiered LLM Architectures


14. In-Context Autonomous Network Incident Response: An End-to-End Large Language Model Agent Approach


15. SCOPE: Selective Conformal Optimized Pairwise LLM Judging


16. Look Inward to Explore Outward: Learning Temperature Policy from LLM Internal States via Hierarchical RL


17. Buy versus Build an LLM: A Decision Framework for Governments


18. Know More, Know Clearer: A Meta-Cognitive Framework for Knowledge Augmentation in Large Language Models


19. RGAlign-Rec: Ranking-Guided Alignment for Latent Query Reasoning in Recommendation Systems


20. TriGen: NPU Architecture for End-to-End Acceleration of Large Language Models based on SW-HW Co-Design


21. Transporting Task Vectors across Different Architectures without Training


22. Never say never: Exploring the effects of available knowledge on agent persuasiveness in controlled physiotherapy motivation dialogues


23. EPRBench: A High-Quality Benchmark Dataset for Event Stream Based Visual Place Recognition


24. RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training


25. Knowledge-Based Design Requirements for Generative Social Robots in Higher Education


26. Amortized Reasoning Tree Search: Decoupling Proposal and Decision in Large Language Models


27. TRACE: Temporal Reasoning via Agentic Context Evolution for Streaming Electronic Health Records (EHRs)


28. GRAIL: Geometry-Aware Retrieval-Augmented Inference with LLMs over Hyperbolic Representations of Patient Trajectories


29. Left-right asymmetry in predicting brain activity from LLMs’ representations emerges with their formal linguistic competence


30. RAT-Bench: A Comprehensive Benchmark for Text Anonymization


31. “Not Human, Funnier”: How Machine Identity Shapes Humor Perception in Online AI Stand-up Comedy


32. IndicFairFace: Balanced Indian Face Dataset for Auditing and Mitigating Geographical Bias in Vision-Language Models


33. Artic: AI-oriented Real-time Communication for MLLM Video Assistant


34. Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats


35. TensorCommitments: A Lightweight Verifiable Inference for Language Models


36. Vision Token Reduction via Attention-Driven Self-Compression for Efficient Multimodal Large Language Models


37. Self-EvolveRec: Self-Evolving Recommender Systems with LLM-based Directional Feedback


38. VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction


39. Monte Carlo Tree Search with Reasoning Path Refinement for Small Language Models in Conversational Text-to-NoSQL


40. SD-MoE: Spectral Decomposition for Effective Expert Specialization


41. Decoder-only Conformer with Modality-aware Sparse Mixtures of Experts for ASR


42. Favia: Forensic Agent for Vulnerability-fix Identification and Analysis


43. Designing RNAs with Language Models


44. Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward


45. RankLLM: Weighted Ranking of LLMs by Quantifying Question Difficulty


46. CacheMind: From Miss Rates to Why – Natural-Language, Trace-Grounded Reasoning for Cache Replacement


47. Soft Contamination Means Benchmarks Test Shallow Generalization


48. What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis


49. Intrinsic Credit Assignment for Long Horizon Interaction


50. ForeAct: Steering Your VLA with Efficient Visual Foresight Planning


51. Perceptual Self-Reflection in Agentic Physics Simulation Code Generation


52. OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization


53. Retrieval-Augmented Self-Taught Reasoning Model with Adaptive Chain-of-Thought for ASR Named Entity Correction


54. From Biased Chatbots to Biased Agents: Examining Role Assignment Effects on LLM Agent Robustness


55. A Lightweight LLM Framework for Disaster Humanitarian Information Classification


56. Peak + Accumulation: A Proxy-Level Scoring Formula for Multi-Turn LLM Attack Detection


57. Language-Guided Invariance Probing of Vision-Language Models