Key LLM-Related Papers - 2026-02-02

1. High-quality generation of dynamic game content via small language models: A proof of concept


2. TSAQA: Time Series Analysis Question And Answering Benchmark


3. Make Anything Match Your Target: Universal Adversarial Perturbations against Closed-Source MLLMs via Multi-Crop Routed Meta Optimization


4. RAudit: A Blind Auditing Protocol for Large Language Model Reasoning


5. From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics


6. Guided by Trajectories: Repairing and Rewarding Tool-Use Trajectories for Tool-Integrated Reasoning


7. Quantifying Model Uniqueness in Heterogeneous AI Ecosystems


8. Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text


9. Alignment among Language, Vision and Action Representations


10. Game-Theoretic Co-Evolution for LLM-Based Heuristic Discovery


11. CVeDRL: An Efficient Code Verifier via Difficulty-aware Reinforcement Learning


12. Toward IIT-Inspired Consciousness in LLMs: A Reward-Based Learning Framework


13. TSPO: Breaking the Double Homogenization Dilemma in Multi-turn Search Policy Optimization


14. AutoRefine: From Trajectories to Reusable Expertise for Continual LLM Agent Refinement


15. A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization


16. Best-of-Q: Improving VLM agents with Q-function Action Ranking at Inference


17. Real-Time Aligned Reward Model beyond Semantics


18. Task-Aware LLM Council with Adaptive Decision Pathways for Decision Support


19. UCPO: Uncertainty-Aware Policy Optimization


20. Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments


21. Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling


22. SYMPHONY: Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly


23. Learn More with Less: Uncertainty Consistency Guided Query Selection for RLVR


24. PerfGuard: A Performance-Aware Agent for Visual Content Generation


25. Decoding in Geometry: Alleviating Embedding-Space Crowding for Complex Reasoning


26. Darwinian Memory: A Training-Free Self-Regulating Memory System for GUI Agent Evolution


27. Why Self-Rewarding Works: Theoretical Guarantees for Iterative Alignment of Language Models


28. When LLM meets Fuzzy-TOPSIS for Personnel Selection through Automated Profile Analysis


29. Sparks of Rationality: Do Reasoning LLMs Align with Human Judgment and Choice?


30. Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents


31. The Six Sigma Agent: Achieving Enterprise-Grade Reliability in LLM Systems Through Consensus-Driven Decomposed Execution


32. JAF: Judge Agent Forest


33. TEON: Tensorized Orthonormalization Beyond Layer-Wise Muon for Large Language Model Pre-Training


34. Now You Hear Me: Audio Narrative Attacks Against Large Audio-Language Models



36. Med-Scout: Curing MLLMs’ Geometric Blindness in Medical Perception via Geometry-Aware RL Post-Training


37. MonoScale: Scaling Multi-Agent System with Monotonic Improvement


38. Probing the Trajectories of Reasoning Traces in Large Language Models


39. SPICE: Submodular Penalized Information-Conflict Selection for Efficient Large Language Model Training


40. Secure Tool Manifest and Digital Signing Solution for Verifiable MCP and LLM Pipelines


41. WiFiPenTester: Advancing Wireless Ethical Hacking with Governed GenAI


42. From Similarity to Vulnerability: Key Collision Attack on LLM Semantic Caching


43. OrLog: Resolving Complex Queries with LLMs and Probabilistic Reasoning


44. Character as a Latent Variable in Large Language Models: A Mechanistic Account of Emergent Misalignment and Conditional Safety Failures


45. Towards Explicit Acoustic Evidence Perception in Audio LLMs for Speech Deepfake Detection


46. On the Impact of Code Comments for Automated Bug-Fixing: An Empirical Study


47. Bias Beyond Borders: Political Ideology Evaluation and Steering in Multilingual LLMs


48. Mano: Restriking Manifold Optimization for LLM Training


49. Residual Context Diffusion Language Models


50. Protecting Private Code in IDE Autocomplete using Differential Privacy


51. MTDrive: Multi-turn Interactive Reinforcement Learning for Autonomous Driving


52. BEAR: Towards Beam-Search-Aware Optimization for Recommendation with Large Language Models


53. Evaluating Large Language Models for Security Bug Report Prediction


54. DiffuSpeech: Silent Thought, Spoken Answer via Unified Speech-Text Diffusion


55. Should LLMs, $\textit{like}$, Generate How Users Talk? Building Dialect-Accurate Dialog[ue]s Beyond the American Default with MDial


56. EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis


57. MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering


58. Learning to Build Shapes by Extrusion


59. Just-in-Time Catching Test Generation at Meta


60. Hide and Seek in Embedding Space: Geometry-based Steganography and Detection in Large Language Models


61. How Far Can Pretrained LLMs Go in Symbolic Music? Controlled Comparisons of Supervised and Preference-based Adaptation


62. Qualitative Evaluation of LLM-Designed GUI


63. Procedural Knowledge Extraction from Industrial Troubleshooting Guides Using Vision Language Models


64. ImgCoT: Compressing Long Chain of Thought into Compact Visual Tokens for Efficient Reasoning of Large Language Model


65. AEGIS: White-Box Attack Path Generation using LLMs and Training Effectiveness Evaluation for Large-Scale Cyber Defence Exercises


66. Breaking the Blocks: Continuous Low-Rank Decomposed Scaling for Unified LLM Quantization and Adaptation


67. Vision-Language Models Unlock Task-Centric Latent Actions


68. Gated Relational Alignment via Confidence-based Distillation for Efficient VLMs


69. FNF: Functional Network Fingerprint for Large Language Models


70. Do Transformers Have the Ability for Periodicity Generalization?


71. NAG: A Unified Native Architecture for Encoder-free Text-Graph Modeling in Language Models


72. MCP-Diag: A Deterministic, Protocol-Driven Architecture for AI-Native Network Diagnostics


73. Time-Annealed Perturbation Sampling: Diverse Generation for Diffusion Language Models


74. TTCS: Test-Time Curriculum Synthesis for Self-Evolving


75. Language Model Circuits Are Sparse in the Neuron Basis


76. Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry


77. MC-GRPO: Median-Centered Group Relative Policy Optimization for Small-Rollout Reinforcement Learning


78. SpanNorm: Reconciling Training Stability and Performance in Deep Transformers


79. Mitigating Hallucinations in Video Large Language Models via Spatiotemporal-Semantic Contrastive Decoding


80. Whispers of Wealth: Red-Teaming Google’s Agent Payments Protocol via Prompt Injection


81. EUGens: Efficient, Unified, and General Dense Layers


82. Are LLM Evaluators Really Narcissists? Sanity Checking Self-Preference Evaluations


83. Towards the Holographic Characteristic of LLMs for Efficient Short-text Generation


84. Shattered Compositionality: Counterintuitive Learning Dynamics of Transformers for Arithmetic


85. FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks


86. Does My Chatbot Have an Agenda? Understanding Human and AI Agency in Human-Human-like Chatbot Interaction


87. Countering the Over-Reliance Trap: Mitigating Object Hallucination for LVLMs via a Self-Validation Framework


88. Tuning the Implicit Regularizer of Masked Diffusion Language Models: Enhancing Generalization via Insights from $k$-Parity


89. Automating Forecasting Question Generation and Resolution for AI Evaluation


90. Jailbreaks on Vision Language Model via Multimodal Reasoning


91. Culturally Grounded Personas in Large Language Models: Characterization and Alignment with Socio-Psychological Value Frameworks


92. SP^2DPO: An LLM-assisted Semantic Per-Pair DPO Generalization


93. Context Structure Reshapes the Representational Geometry of Language Models


94. MERMAID: Memory-Enhanced Retrieval and Reasoning with Multi-Agent Iterative Knowledge Grounding for Veracity Assessment


95. Recoverability Has a Law: The ERR Measure for Tool-Augmented Agents


96. From Retrieving Information to Reasoning with AI: Exploring Different Interaction Modalities to Support Human-AI Coordination in Clinical Decision-Making


97. PersonaCite: VoC-Grounded Interviewable Agentic Synthetic AI Personas for Verifiable User and Design Research


98. Predicting Intermittent Job Failure Categories for Diagnosis Using Few-Shot Fine-Tuned Language Models


99. MirrorMark: A Distortion-Free Multi-Bit Watermark for Large Language Models


100. A Systematic Literature Review on LLM Defenses Against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy


101. Lost in Space? Vision-Language Models Struggle with Relative Camera Pose Estimation


102. Neural Signals Generate Clinical Notes in the Wild


103. ShellForge: Adversarial Co-Evolution of Webshell Generation and Multi-View Detection for Robust Webshell Defense


104. In Vino Veritas and Vulnerabilities: Examining LLM Safety via Drunk Language Inducement


105. UniFinEval: Towards Unified Evaluation of Financial Multimodal Models across Text, Images and Videos