Key LLM-Related Papers - 2026-02-12

1. FormalJudge: A Neuro-Symbolic Paradigm for Agentic Oversight


2. Can LLMs Cook Jamaican Couscous? A Study of Cultural Novelty in Recipe Generation


3. Reinforcing Chain-of-Thought Reasoning with Self-Evolving Rubrics


4. See, Plan, Snap: Evaluating Multimodal GUI Agents in Scratch


5. To Think or Not To Think, That is The Question for Large Reasoning Models in Theory of Mind Tasks


6. Flow of Spans: Generalizing Language Models to Dynamic Span-Vocabulary via GFlowNets


7. Abstraction Generation for Generalized Planning with Pretrained Large Language Models


8. MERIT Feedback Elicits Better Bargaining in LLM Negotiators


9. Found-RL: foundation model-enhanced reinforcement learning for autonomous driving


10. LiveMedBench: A Contamination-Free Medical Benchmark for LLMs with Automated Rubric Evaluation


11. Discovering Differences in Strategic Behavior Between Humans and LLMs


12. Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling


13. Weight Decay Improves Language Model Plasticity


14. Learning to Compose for Cross-domain Agentic Workflow Generation


15. DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning


16. SteuerLLM: Local specialized large language model for German tax law analysis


17. In-the-Wild Model Organisms: Mitigating Undesirable Emergent Behaviors in Production LLM Post-Training via Data Attribution


18. Chatting with Images for Introspective Visual Thinking


19. GraphSeek: Next-Generation Graph Analytics with LLMs


20. Language Model Inversion through End-to-End Differentiation


21. Chain-of-Look Spatial Reasoning for Dense Surgical Instrument Counting


22. Fine-Tuning GPT-5 for GPU Kernel Generation


23. FeatureBench: Benchmarking Agentic Coding for Complex Feature Development


24. Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers


25. Search or Accelerate: Confidence-Switched Position Beam Search for Diffusion Language Models


26. Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System



28. The CLEF-2026 FinMMEval Lab: Multilingual and Multimodal Evaluation of Financial AI Systems


29. Diagnosing Structural Failures in LLM-Based Evidence Extraction for Meta-Analysis


30. Beyond Confidence: The Rhythms of Reasoning in Generative Models


31. PELLI: Framework to effectively integrate LLMs for quality software generation


32. RSHallu: Dual-Mode Hallucination Evaluation for Remote-Sensing Multimodal Large Language Models with Domain-Tailored Mitigation


33. VulReaD: Knowledge-Graph-guided Software Vulnerability Reasoning and Detection


34. Locomo-Plus: Beyond-Factual Cognitive Memory Evaluation Framework for LLM Agents


35. VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training


36. Mitigating Reward Hacking in RLHF via Bayesian Non-negative Reward Modeling


37. Online Causal Kalman Filtering for Stable and Effective Policy Optimization


38. LLM-Based Scientific Equation Discovery via Physics-Informed Token-Regularized Policy Optimization


39. MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning


40. When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning


41. LAP: Language-Action Pre-Training Enables Zero-shot Cross-Embodiment Transfer


42. Contrastive Learning for Multi Label ECG Classification with Jaccard Score Based Sigmoid Loss


43. C^2ROPE: Causal Continuous Rotary Positional Encoding for 3D Large Multimodal-Models Reasoning


44. Enhancing Weakly Supervised Multimodal Video Anomaly Detection through Text Guidance


45. LHAW: Controllable Underspecification for Long-Horizon Tasks


46. Protecting Context and Prompts: Deterministic Security for Non-Deterministic AI


47. Constructing Industrial-Scale Optimization Modeling Benchmark


48. AudioRouter: Data Efficient Audio Understanding via RL based Dual Reasoning


49. Control Reinforcement Learning: Token-Level Mechanistic Analysis via Learned SAE Feature Steering


50. AIvilization v0: Toward Large-Scale Artificial Social Simulation with a Unified Agent Architecture and Adaptive Agent Profiles


51. Modular Multi-Task Learning for Chemical Reaction Prediction


52. Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs


53. Making Databases Faster with LLM Evolutionary Sampling


54. Learning Self-Interpretation from Interpretability Artifacts: Training Lightweight Adapters on Vector-Label Pairs


55. Are More Tokens Rational? Inference-Time Scaling in Language Models as Adaptive Resource Rationality


56. KORAL: Knowledge Graph Guided LLM Reasoning for SSD Operational Analysis


57. ImprovEvolve: Ask AlphaEvolve to Improve the Input Solution and Then Improvise


58. Self-Evolving Recommendation System: End-To-End Autonomous Model Optimization With LLM Agents


59. Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models


60. EvoCodeBench: A Human-Performance Benchmark for Self-Evolving LLM-Driven Coding Systems


61. Beyond SMILES: Evaluating Agentic Systems for Drug Discovery


62. Omni-Safety under Cross-Modality Conflict: Vulnerabilities, Dynamics Mechanisms and Efficient Alignment


63. PRISM-XR: Empowering Privacy-Aware XR Collaboration with Multimodal Large Language Models


64. Exploring Semantic Labeling Strategies for Third-Party Cybersecurity Risk Assessment Questionnaires


65. Red-teaming the Multimodal Reasoning: Jailbreaking Vision-Language Models via Cross-modal Entanglement Attacks


66. On the Use of a Large Language Model to Support the Conduction of a Systematic Mapping Study: A Brief Report from a Practitioner’s View


67. When LLMs get significantly worse: A statistical approach to detect model degradations


68. Can Large Language Models Implement Agent-Based Models? An ODD-based Replication Study


69. Anonymization-Enhanced Privacy Protection for Mobile GUI Agents: Available but Invisible


70. Multimodal Information Fusion for Chart Understanding: A Survey of MLLMs – Evolution, Limitations, and Cognitive Enhancement


71. Reverse-Engineering Model Editing on Language Models


72. AgentTrace: A Structured Logging Framework for Agent System Observability


73. “Humans welcome to observe”: A First Look at the Agent Social Network Moltbook


74. Large Language Models Predict Functional Outcomes after Acute Ischemic Stroke