Key LLM-Related Papers - 2025-10-22

1. Seg the HAB: Language-Guided Geospatial Algae Bloom Reasoning and Segmentation


2. VAR: Visual Attention Reasoning via Structured Search and Backtracking


3. SOCIA-Nabla: Textual Gradient Meets Multi-Agent Orchestration for Automated Simulator Generation


4. Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models


5. Crucible: Quantifying the Potential of Control Algorithms through LLM Agents


6. StarBench: A Turn-Based RPG Benchmark for Agentic Multimodal Decision-Making and Information Seeking


7. LAFA: Agentic LLM-Driven Federated Analytics over Decentralized Data Sources


8. Probabilistic Modeling of Intentions in Socially Intelligent LLM Agents


9. CircuitSeer: Mining High-Quality Data by Probing Mathematical Reasoning Circuits in LLMs


10. PlanU: Large Language Model Decision Making through Planning under Uncertainty


11. AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library


12. Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents


13. Memory-Augmented State Machine Prompting: A Novel LLM Agent Framework for Real-Time Strategy Games


14. Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming


15. Illusions of reflection: open-ended task reveals systematic failures in Large Language Models’ reflective reasoning


16. ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning


17. Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains


18. AgentChangeBench: A Multi-Dimensional Evaluation Framework for Goal-Shift Robustness in Conversational AI


19. Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model


20. LLM-Based Multi-Agent System for Simulating and Analyzing Marketing and Consumer Behavior


21. Annotating the Chain-of-Thought: A Behavior-Labeled Dataset for AI Safety


22. Learning from Generalization Patterns: An Evaluation-Driven Approach to Enhanced Data Augmentation for Fine-Tuning Small Language Models


23. Measuring Reasoning in LLMs: a New Dialectical Angle


24. SMaRT: Select, Mix, and ReinvenT - A Strategy Fusion Framework for LLM-Driven Reasoning and Planning


25. Planned Diffusion


26. CompactPrompt: A Unified Pipeline for Prompt Data Compression in LLM Workflows


27. OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning


28. FABRIC: Framework for Agent-Based Realistic Intelligence Creation


29. Beyond More Context: Retrieval Diversity Boosts Multi-Turn Intent Understanding


30. Activation Manifold Projection: Liberating Task-Specific Behaviors from LLM Architectures


31. Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs


32. How Do LLMs Use Their Depth?


33. LightMem: Lightweight and Efficient Memory-Augmented Generation


34. Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning


35. Fine-Tuned Thoughts: Leveraging Chain-of-Thought Reasoning for Industrial Asset Health Monitoring


36. Online SFT for LLM Reasoning: Surprising Effectiveness of Self-Tuning without Rewards


37. Verifiable Accuracy and Abstention Rewards in Curriculum RL to Alleviate Lost-in-Conversation


38. HarmNet: A Framework for Adaptive Multi-Turn Jailbreak Attacks on Large Language Models


39. Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options


40. Fetch.ai: An Architecture for Modern Multi-Agent Systems


41. Exploring Membership Inference Vulnerabilities in Clinical Large Language Models


42. Reasoning Language Model Inference Serving Unveiled: An Empirical Study


43. Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views


44. Large language models for folktale type automation based on motifs: Cinderella case study


45. WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality


46. EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval


47. Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation


48. Zero-Shot Vehicle Model Recognition via Text-Based Retrieval-Augmented Generation


49. One Size Fits All? A Modular Adaptive Sanitization Kit (MASK) for Customizable Privacy-Preserving Phone Scam Detection


50. CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment


51. Simple and Efficient Heterogeneous Temporal Graph Neural Network


52. ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization


53. MENTOR: A Reinforcement Learning Framework for Model Enhancement via Teacher-Optimized Rewards in Small Models


54. From Retrieval to Generation: Unifying External and Parametric Knowledge for Medical Question Answering


55. Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs


56. StreamingTOM: Streaming Token Compression for Efficient Video Understanding


57. DelvePO: Direction-Guided Self-Evolving Framework for Flexible Prompt Optimization


58. Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs


59. Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge


60. RadDiagSeg-M: A Vision Language Model for Joint Diagnosis and Multi-Target Segmentation in Radiology


61. ActivationReasoning: Logical Reasoning in Latent Activation Spaces


62. Automatic Prompt Generation via Adaptive Selection of Prompting Techniques


63. From AutoRecSys to AutoRecLab: A Call to Build, Evaluate, and Govern Autonomous Recommender-Systems Research Labs


64. Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth


65. Adaptive Divergence Regularized Policy Optimization for Fine-tuning Generative Models


66. Language Models as Semantic Augmenters for Sequential Recommenders


67. SAVANT: Semantic Analysis with Vision-Augmented Anomaly deTection


68. From Local to Global: Revisiting Structured Pruning Paradigms for Large Language Models


69. DynaQuery: A Self-Adapting Framework for Querying Structured and Multimodal Data


70. Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution


71. BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers?


72. SimBA: Simplifying Benchmark Analysis Using Performance Matrices Alone


73. PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits


74. Believe It or Not: How Deeply do LLMs Believe Implanted Facts?


75. UniRL-Zero: Reinforcement Learning on Unified Models with Joint Language Model and Diffusion Model Experts


76. AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM


77. EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning


78. SpecAgent: A Speculative Retrieval and Forecasting Agent for Code Completion


79. Efficient Toxicity Detection in Gaming Chats: A Comparative Study of Embeddings, Fine-Tuned Transformers and LLMs


80. Rewarding the Journey, Not Just the Destination: A Composite Path and Answer Self-Scoring Reward Mechanism for Test-Time Reinforcement Learning


81. Select-Then-Decompose: From Empirical Analysis to Adaptive Selection Strategy for Task Decomposition in Large Language Models


82. CLAWS: Creativity detection for LLM-generated solutions using Attention Window of Sections


83. ParaVul: A Parallel Large Language Model and Retrieval-Augmented Framework for Smart Contract Vulnerability Detection


84. JT-Safe: Intrinsically Enhancing the Safety and Trustworthiness of LLMs


85. TACLA: An LLM-Based Multi-Agent Tool for Transactional Analysis Training in Education


86. Interpretability Framework for LLMs in Undergraduate Calculus


87. BreakFun: Jailbreaking LLMs via Schema Exploitation



89. Automated Algorithm Design for Auto-Tuning Optimizers


90. L-MoE: End-to-End Training of a Lightweight Mixture of Low-Rank Adaptation Experts


91. Long-Context Attention Benchmark: From Kernel Efficiency to Distributed Context Parallelism


92. Hierarchical Federated Unlearning for Large Language Models


93. When Intelligence Fails: An Empirical Study on Why LLMs Struggle with Password Cracking


94. From Flows to Words: Can Zero-/Few-Shot LLMs Detect Network Intrusions? A Grammar-Constrained, Calibrated Evaluation on UNSW-NB15


95. Does GenAI Rewrite How We Write? An Empirical Study on Two-Million Preprints


96. POPI: Personalizing LLMs via Optimized Natural Language Preference Inference


97. Outraged AI: Large language models prioritise emotion over cost in fairness enforcement


98. 3D Weakly Supervised Semantic Segmentation via Class-Aware and Geometry-Guided Pseudo-Label Refinement


99. Repairing Tool Calls Using Post-tool Execution Reflection and RAG


100. Modeling Layered Consciousness with Multi-Agent Large Language Models


101. GRETEL: A Goal-driven Retrieval and Execution-based Trial Framework for LLM Tool Selection Enhancing


102. Brain-Language Model Alignment: Insights into the Platonic Hypothesis and Intermediate-Layer Advantage


103. LLM Assisted Alpha Fairness for 6 GHz WiFi and NR_U Coexistence: An Agentic Orchestrator for Throughput, Energy, and SLA