Key LLM Papers - 2026-02-06

1. DyTopo: Dynamic Topology Routing for Multi-Agent Reasoning via Semantic Matching


2. AgenticPay: A Multi-Agent LLM Negotiation System for Buyer-Seller Transactions


3. A Guide to Large Language Models in Modeling and Simulation: From Core Techniques to Critical Challenges


4. Agent2Agent Threats in Safety-Critical LLM Assistants: A Human-Centric Taxonomy


5. BABE: Biology Arena BEnchmark


6. TKG-Thinker: Towards Dynamic Reasoning over Temporal Knowledge Graphs via Agentic Reinforcement Learning


7. NEX: Neuron Explore-Exploit Scoring for Label-Free Chain-of-Thought Selection and Model Ranking


8. FiMI: A Domain-Specific Language Model for Indian Finance Ecosystem


9. Determining Energy Efficiency Sweet Spots in Production LLM Inference


10. Graph-based Agent Memory: Taxonomy, Techniques, and Applications


11. Generative Ontology: When Structured Knowledge Learns to Create


12. Emulating Aggregate Human Choice Behavior and Biases with GPT Conversational Agents


13. TangramSR: Can Vision-Language Models Reason in Continuous Geometric Space?


14. Reasoning-guided Collaborative Filtering with Language Models for Explainable Recommendation


15. Split Personality Training: Revealing Latent Knowledge Through Alternate Personalities


16. SDFP: Speculative Decoding with FIT-Pruned Models for Training-Free and Plug-and-Play LLM Acceleration


17. ALIVE: Awakening LLM Reasoning via Adversarial Learning and Instructive Verbal Evaluation


18. Refine and Purify: Orthogonal Basis Optimization with Null-Space Denoising for Conditional Representation Learning


19. H-AdminSim: A Multi-Agent Simulator for Realistic Hospital Administrative Workflows with FHIR Integration


20. Clinical Validation of Medical-based Large Language Model Chatbots on Ophthalmic Patient Queries with LLM-based Evaluation


21. RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs


22. AgentXRay: White-Boxing Agentic Systems via Workflow Reconstruction


23. ProAct: Agentic Lookahead in Interactive Environments


24. Hallucination-Resistant Security Planning with a Large Language Model


25. Surgery: Mitigating Harmful Fine-Tuning for Large Language Models via Attention Sink


26. HugRAG: Hierarchical Causal Knowledge Graph Design for RAG


27. SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers


28. Understanding LLM Evaluator Behavior: A Structured Multi-Evaluator Framework for Merchant Risk Assessment


29. GAMMS: Graph based Adversarial Multiagent Modeling Simulator


30. VERA-MH: Reliability and Validity of an Open-Source AI Safety Evaluation in Mental Health


31. Towards Reducible Uncertainty Modeling for Reliable Large Language Model Agents


32. Evaluating Large Language Models on Solved and Unsolved Problems in Graph Theory: Implications for Computing Education


33. MINT: Minimal Information Neuro-Symbolic Tree for Objective-Driven Knowledge-Gap Reasoning and Active Elicitation



35. CommCP: Efficient Multi-Agent Coordination via LLM-Based Communication with Conformal Prediction


36. Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory


37. Correctness-Optimized Residual Activation Lens (CORAL): Transferrable and Calibration-Aware Inference-Time Steering


38. GenArena: How Can We Achieve Human-Aligned Evaluation for Visual Generation Tasks?


39. Inverse Depth Scaling From Most Layers Being Similar


40. Compound Deception in Elite Peer Review: A Failure Mode Taxonomy of 100 Fabricated Citations at NeurIPS 2025


41. Regularized Calibration with Successive Rounding for Post-Training Quantization


42. EuroLLM-22B: Technical Report


43. xList-Hate: A Checklist-Based Framework for Interpretable and Generalizable Hate Speech Detection


44. DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders


45. Allocentric Perceiver: Disentangling Allocentric Reasoning from Egocentric Visual Priors via Frame Instantiation


46. Automated Customization of LLMs for Enterprise Code Repositories Using Semantic Scopes


47. Learning to Inject: Automated Prompt Injection via Reinforcement Learning


48. CompactRAG: Reducing LLM Calls and Token Overhead in Multi-Hop Question Answering


49. Towards Green AI: Decoding the Energy of LLM Inference in Software Development


50. Exploring AI-Augmented Sensemaking of Patient-Generated Health Data: A Mixed-Method Study with Healthcare Professionals in Cardiac Risk Reduction


51. Alignment Verifiability in Large Language Models: Normative Indistinguishability under Behavioral Evaluation


52. AI chatbots versus human healthcare professionals: a systematic review and meta-analysis of empathy in patient care


53. Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation


54. Multi-Task GRPO: Reliable LLM Reasoning Across Tasks


55. AI Agent Systems for Supply Chains: Structured Decision Prompts and Memory Retrieval


56. Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations


57. Transport and Merge: Cross-Architecture Merging for Large Language Models


58. A Unified Framework for Rethinking Policy Divergence Measures in GRPO


59. LinguistAgent: A Reflective Multi-Model Platform for Automated Linguistic Annotation


60. LMMRec: LLM-driven Motivation-aware Multimodal Recommendation


61. Structured Context Engineering for File-Native Agentic Systems: Evaluating Schema Accuracy, Format Effectiveness, and Multi-File Navigation at Scale


62. Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers


63. Beyond Length: Context-Aware Expansion and Independence as Developmentally Sensitive Evaluation in Child Utterances


64. Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening


65. FlashBlock: Attention Caching for Efficient Long-Context Block Diffusion


66. Towards a Science of Collective AI: LLM-based Multi-Agent Systems Need a Transition from Blind Trial-and-Error to Rigorous Science


67. Hybrid Gated Flow (HGF): Stabilizing 1.58-bit LLMs via Selective Low-Rank Correction


68. CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs


69. EGSS: Entropy-guided Stepwise Scaling for Reliable Software Engineering


70. Semantic Search over 9 Million Mathematical Theorems


71. Aligning Large Language Model Behavior with Human Citation Preferences


72. Double-P: Hierarchical Top-P Sparse Attention for Long-Context LLMs


73. Data-Centric Interpretability for LLM-based Multi-Agent Reinforcement Learning


74. EBPO: Empirical Bayes Shrinkage for Stabilizing Group-Relative Policy Optimization


75. CoSA: Compressed Sensing-Based Adaptation of Large Language Models


76. TIDE: Temporal Incremental Draft Engine for Self-Improving LLM Inference


77. Rethinking Rubric Generation for Improving LLM Judge and Reward Modeling for Open-ended Tasks


78. VISTA: Enhancing Visual Conditioning via Track-Following Preference Optimization in Vision-Language-Action Models


79. Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?


80. CoWork-X: Experience-Optimized Co-Evolution for Multi-Agent Collaboration System


81. EntRGi: Entropy Aware Reward Guidance for Diffusion Language Models


82. Learning Rate Matters: Vanilla LoRA May Suffice for LLM Fine-tuning


83. Privileged Information Distillation for Language Models


84. Linear Model Merging Unlocks Simple and Scalable Multimodal Data Mixture Optimization


85. ASA: Activation Steering for Tool-Calling Domain Adaptation


86. Depth-Wise Emergence of Prediction-Centric Geometry in Large Language Models


87. PriMod4AI: Lifecycle-Aware Privacy Threat Modeling for AI Systems using LLM


88. Internalizing LLM Reasoning via Discovery and Replay of Latent Actions


89. A$^2$-LLM: An End-to-end Conversational Audio Avatar Large Language Model


90. Evaluating Kubernetes Performance for GenAI Inference: From Automatic Speech Recognition to LLM Summarization


91. Steering Externalities: Benign Activation Steering Unintentionally Increases Jailbreak Risk for Large Language Models


92. Extracting Recurring Vulnerabilities from Black-Box LLM-Generated Software


93. A Causal Perspective for Enhancing Jailbreak Attack and Defense