Key LLM-Related Papers - 2025-11-05

1. Simulating Environments with Reasoning Models for Agent Training



3. ExplicitLM: Decoupling Knowledge from Parameters via Explicit Memory Banks


4. Analyzing Sustainability Messaging in Large-Scale Corporate Social Media


5. TPS-Bench: Evaluating AI Agents’ Tool Planning & Scheduling Abilities in Compounding Tasks


6. Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges


7. Automatic Minds: Cognitive Parallels Between Hypnotic States and Large Language Model Processing


8. llmSHAP: A Principled Approach to LLM Explainability


9. QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code


10. MiRAGE: Misconception Detection with Retrieval-Guided Multi-Stage Reasoning and Ensemble Fusion


11. DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models


12. Modular Task Decomposition and Dynamic Collaboration in Multi-Agent Systems Driven by Large Language Models


13. Efficient Test-Time Retrieval Augmented Generation


14. Knowledge Elicitation with Large Language Models for Interpretable Cancer Stage Identification from Pathology Reports


15. AI for pRedicting Exacerbations in KIDs with aSthma (AIRE-KIDS)


16. Aligning LLM agents with human learning and adjustment behavior: a dual agent approach


17. LLMs Position Themselves as More Rational Than Humans: Emergence of AI Self-Awareness Measured Through Game Theory


18. Do Math Reasoning LLMs Help Predict the Impact of Public Transit Events?


19. Count-Based Approaches Remain Strong: A Benchmark Against Transformer and LLM Pipelines on Structured EHR


20. How Focused Are LLMs? A Quantitative Study via Repetitive Deterministic Prediction Tasks


21. Reevaluating Self-Consistency Scaling in Multi-Agent Systems


22. Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries


23. Leveraging Multi-Agent System (MAS) and Fine-Tuned Small Language Models (SLMs) for Automated Telecom Network Troubleshooting


24. Reimagining Safety Alignment with An Image


25. GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining


26. Efficiency vs. Alignment: Investigating Safety and Fairness Risks in Parameter-Efficient Fine-Tuning of LLMs


27. Diverse Human Value Alignment for Large Language Models via Ethical Reasoning



29. Advancing Cognitive Science with LLMs


30. Engineering.ai: A Platform for Teams of AI Engineers in Computational Design


31. QuantumBench: A Benchmark for Quantum Problem Solving


32. SmartMLOps Studio: Design of an LLM-Integrated IDE with Automated MLOps Pipelines for Model Development and Monitoring


33. A Detailed Study on LLM Biases Concerning Corporate Social Responsibility and Green Supply Chains


34. Dynamic Routing Between Experts: A Data-Efficient Approach to Continual Learning in Vision-Language Models


35. KV Cache Transform Coding for Compact Storage in LLM Inference


36. Plan-and-Write: Structure-Guided Length Control for LLMs without Model Retraining


37. Accumulating Context Changes the Beliefs of Language Models


38. Random Initialization of Gated Sparse Adapters


39. GenDexHand: Generative Simulation for Dexterous Hands


40. Context-Guided Decompilation: A Step Towards Re-executability


41. RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks


42. Scam Shield: Multi-Model Voting and Fine-Tuned LLMs Against Adversarial Attacks


43. Multi-Step Knowledge Interaction Analysis via Rank-2 Subspace Disentanglement


44. Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI


45. SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia


46. EngChain: A Symbolic Benchmark for Verifiable Multi-Step Reasoning in Engineering


47. A Graph-based RAG for Energy Efficiency Question Answering


48. Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models


49. Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving


50. Imperfect Language, Artificial Intelligence, and the Human Mind: An Interdisciplinary Approach to Linguistic Errors in Native Spanish Speakers


51. BanglaNirTox: A Large-scale Parallel Corpus for Explainable AI in Bengali Text Detoxification


52. HMVLM: Human Motion-Vision-Language Model via MoE LoRA


53. When to Trust the Answer: Question-Aligned Semantic Nearest Neighbor Entropy for Safer Surgical VQA


54. Privacy Preserving Ordinal-Meta Learning with VLMs for Fine-Grained Fruit Quality Prediction


55. SEPS: Semantic-enhanced Patch Slimming Framework for fine-grained cross-modal alignment


56. RAGSmith: A Framework for Finding the Optimal Composition of Retrieval-Augmented Generation Methods Across Datasets


57. PrefixNLI: Detecting Factual Inconsistencies as Soon as They Arise


58. DEEPAMBIGQA: Ambiguous Multi-hop Questions for Benchmarking LLM Answer Completeness


59. Exploring and Unleashing the Power of Large Language Models in CI/CD Configuration Translation


60. DeepSpecs: Expert-Level Questions Answering in 5G


61. When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding


62. Rescuing the Unpoisoned: Efficient Defense against Knowledge Corruption Attacks on RAG Systems


63. Speech-DRAME: A Framework for Human-Aligned Benchmarks in Speech Role-Play


64. Forget BIT, It is All about TOKEN: Towards Semantic Information Theory for LLMs


65. An Interdisciplinary and Cross-Task Review on Missing Data Imputation


66. ZoFia: Zero-Shot Fake News Detection with Entity-Guided Retrieval and Multi-LLM Interaction


67. AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence


68. GeoToken: Hierarchical Geolocalization of Images via Next Token Prediction


69. HAFixAgent: History-Aware Automated Program Repair Agent


70. OceanAI: A Conversational Platform for Accurate, Transparent, Near-Real-Time Oceanographic Insights


71. ORANGE: An Online Reflection ANd GEneration framework with Domain Knowledge for Text-to-SQL


72. The Riddle of Reflection: Evaluating Reasoning and Self-Awareness in Multilingual LLMs using Indian Riddles


73. URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model


74. Maestro: Orchestrating Robotics Modules with Vision-Language Models for Zero-Shot Generalist Robots


75. Assessing LLM Reasoning Steps via Principal Knowledge Grounding


76. Pay for The Second-Best Service: A Game-Theoretic Approach Against Dishonest LLM Providers


77. OmniBrainBench: A Comprehensive Multimodal Benchmark for Brain Imaging Analysis Across Multi-stage Clinical Tasks


78. CodeClash: Benchmarking Goal-Oriented Software Engineering


79. Enhancing Adversarial Transferability in Visual-Language Pre-training Models via Local Shuffle and Sample-based Attack


80. GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding


81. Efficient Reinforcement Learning for Large Language Models with Intrinsic Exploration


82. A Voice-Enabled Virtual Patient System for Interactive Training in Standardized Clinical Assessment


83. Evolve to Inspire: Novelty Search for Diverse Image Generation


84. Isotropic Curvature Model for Understanding Deep Learning Optimization: Is Gradient Orthogonalization Optimal?


85. ShadowLogic: Backdoors in Any Whitebox LLM


86. AgentGit: A Version Control Framework for Reliable and Scalable LLM-Powered Multi-Agent Systems


87. Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering


88. EPARA: Parallelizing Categorized AI Inference in Edge Clouds


89. Diagnosing Hallucination Risk in AI Surgical Decision-Support: A Sequential Framework for Sequential Validation


90. FlashEVA: Accelerating LLM inference via Efficient Attention


91. Red-teaming Activation Probes using Prompted LLMs


92. Air Pollution Forecasting in Bucharest


93. HIP-LLM: A Hierarchical Imprecise Probability Approach to Reliability Assessment of Large Language Models


94. Reasoning Planning for Language Models


95. Proactive DDoS Detection and Mitigation in Decentralized Software-Defined Networking via Port-Level Monitoring and Zero-Training Large Language Models


96. DRIP: Defending Prompt Injection via De-instruction Training and Residual Fusion Model Architecture


97. MedRECT: A Medical Reasoning Benchmark for Error Correction in Clinical Texts


98. LGCA: Enhancing Semantic Representation via Progressive Expansion


99. PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks


100. Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling


101. UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings


102. Exploiting Latent Space Discontinuities for Building Universal LLM Jailbreaks and Data Extraction Attacks


103. MH-1M: A 1.34 Million-Sample Comprehensive Multi-Feature Android Malware Dataset for Machine Learning, Deep Learning, Large Language Models, and Threat Intelligence Research


104. Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits


105. A Technical Exploration of Causal Inference with Hybrid LLM Synthetic Data


106. Calibration Across Layers: Understanding Calibration Evolution in LLMs


107. FedReplay: A Feature Replay Assisted Federated Transfer Learning Framework for Efficient and Privacy-Preserving Smart Agriculture



109. Neural Transparency: Mechanistic Interpretability Interfaces for Anticipating Model Behaviors for Personalized AI


110. Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning


111. Training LLMs Beyond Next Token Prediction - Filling the Mutual Information Gap


112. Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories


113. EL-MIA: Quantifying Membership Inference Risks of Sensitive Entities in LLMs


114. Effectiveness of LLMs in Temporal User Profiling for Recommendation


115. What a diff makes: automating code migration with large language models


116. A Dual Large Language Models Architecture with Herald Guided Prompts for Parallel Fine Grained Traffic Signal Control


117. Dynamic Model Selection for Trajectory Prediction via Pairwise Ranking and Meta-Features


118. Inferring multiple helper Dafny assertions with LLMs


119. LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers


120. Cognitive Alignment in Personality Reasoning: Leveraging Prototype Theory for MBTI Inference


121. Chain of Time: In-Context Physical Simulation with Image Generation Models


122. Loquetier: A Virtualized Multi-LoRA Framework for Unified LLM Fine-tuning and Serving


123. Urban-MAS: Human-Centered Urban Prediction with LLM-Based Multi-Agent System


124. Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail


125. Adding New Capability in Existing Scientific Application with LLM Assistance


126. Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph


127. Benchmarking Generative AI Against Bayesian Optimization for Constrained Multi-Objective Inverse Design


128. Latent Domain Prompt Learning for Vision-Language Models


129. World Simulation with Video Foundation Models for Physical AI


130. MISA: Memory-Efficient LLMs Optimization with Module-wise Importance Sampling


131. SpatialTraceGen: High-Fidelity Traces for Efficient VLM Spatial Reasoning Distillation


132. FLoRA: Fused forward-backward adapters for parameter efficient fine-tuning and reducing inference-time latencies of LLMs


133. Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World


134. Semi-Supervised Preference Optimization with Limited Feedback


135. Feature-Guided SAE Steering for Refusal-Rate Control using Contrasting Prompts


136. Chitchat with AI: Understand the supply chain carbon disclosure of companies worldwide through Large Language Model


137. Generative human motion mimicking through feature extraction in denoising diffusion settings