Key LLM-Related Papers - 2025-09-19

1. Generalizable Geometric Image Caption Synthesis


2. Internalizing Self-Consistency in Language Models: Multi-Agent Consensus Alignment


3. A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making


4. Sentinel Agents for Secure and Trustworthy Agentic AI in Multi-Agent Systems


5. OpenLens AI: Fully Autonomous Research Agent for Health Infomatics


6. The NazoNazo Benchmark: A Cost-Effective and Extensible Test of Insight-Based Reasoning in LLMs


7. RationAnomaly: Log Anomaly Detection with Rationality via Chain-of-Thought and Reinforcement Learning


8. AgentCompass: Towards Reliable Evaluation of Agentic Workflows in Production


9. SynBench: A Benchmark for Differentially Private Text Generation


10. (P)rior(D)yna(F)low: A Priori Dynamic Workflow Construction via Multi-Agent Collaboration


11. Rationality Check! Benchmarking the Rationality of Large Language Models


12. VCBench: Benchmarking LLMs in Venture Capital


13. Detecting Pipeline Failures through Fine-Grained Analysis of Web Agents


14. From Capabilities to Performance: Evaluating Key Functional Properties of LLM Architectures in Penetration Testing


15. FlowRL: Matching Reward Distributions for LLM Reasoning


16. Orion: Fuzzing Workflow Automation


17. Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning


18. SMARTER: A Data-efficient Framework to Improve Toxicity Detection with Explanation via Self-augmenting Large Language Models


19. TextMine: LLM-Powered Knowledge Extraction for Humanitarian Mine Action


20. CLEAR: A Comprehensive Linguistic Evaluation of Argument Rewriting by Large Language Models


21. Cross-Modal Knowledge Distillation for Speech Large Language Models


22. Patent Language Model Pretraining with ModernBERT


23. A Multi-To-One Interview Paradigm for Efficient MLLM Evaluation


24. MARIC: Multi-Agent Reasoning for Image Classification


25. Empathy-R1: A Chain-of-Empathy and Reinforcement Learning Framework for Long-Form Mental Health Support


26. OnlineMate: An LLM-Based Multi-Agent Companion System for Cognitive Support in Online Learning


27. TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding


28. Spatial Audio Motion Understanding and Reasoning


29. MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models


30. Towards Human-like Multimodal Conversational Agent by Generating Engaging Speech


31. Reveal and Release: Iterative LLM Unlearning with Self-generated Data


32. Automating Modelica Module Generation Using Large Language Models: A Case Study on Building Control Description Language


33. Adversarial Distilled Retrieval-Augmented Guarding Model for Online Malicious Intent Detection


34. Enterprise AI Must Enforce Participant-Aware Access Control


35. ATLANTIS: AI-driven Threat Localization, Analysis, and Triage Intelligence System


36. Do Vision-Language Models See Urban Scenes as People Do? An Urban Perception Benchmark


37. VisMoDAl: Visual Analytics for Evaluating and Improving Corruption Robustness of Vision-Language Models


38. LLM Jailbreak Detection for (Almost) Free!


39. Catch Me If You Can? Not Yet: LLMs Still Struggle to Imitate the Implicit Writing Styles of Everyday Authors


40. Delta Knowledge Distillation for Large Language Models


41. BEACON: Behavioral Malware Classification with Large Language Model Embeddings and Deep Learning


42. Introducing OmniGEC: A Silver Multilingual Dataset for Grammatical Error Correction


43. Process-Supervised Reinforcement Learning for Interactive Multimodal Tool-Use Agents


44. Correct-Detect: Balancing Performance and Ambiguity Through the Lens of Coreference Resolution in LLMs


45. Simulating a Bias Mitigation Scenario in Large Language Models


46. When Content is Goliath and Algorithm is David: The Style and Semantic Effects of Generative Search Engine


47. A Taxonomy of Prompt Defects in LLM Systems


48. Q-ROAR: Outlier-Aware Rescaling for RoPE Position Interpolation in Quantized Long-Context LLMs


49. Beyond Classification: Evaluating LLMs for Fine-Grained Automatic Malware Behavior Auditing


50. The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration


51. SCoGen: Scenario-Centric Graph-Based Synthesis of Real-World Code Problems


52. Towards Robust Agentic CUDA Kernel Benchmarking, Verification, and Optimization


53. Beyond Data Privacy: New Privacy Risks for Large Language Models


54. FedMentor: Domain-Aware Differential Privacy for Heterogeneous Federated LLMs in Mental Health


55. Discovering New Theorems via LLMs with In-Context Proof Learning in Lean


56. SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models


57. SparseDoctor: Towards Efficient Chat Doctor with Mixture of Experts Enhanced Large Language Models


58. DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models


59. Graph-Enhanced Retrieval-Augmented Question Answering for E-Commerce Customer Support


60. Evolution of Kernels: Automated RISC-V Kernel Optimization with Large Language Models


61. Shutdown Resistance in Large Language Models


62. From Correction to Mastery: Reinforced Distillation of Large Language Model Agents


63. JU-NLP at Touché: Covert Advertisement in Conversational AI-Generation and Detection Strategies


64. Opening the Black Box: Interpretable LLMs via Semantic Resonance Architecture


65. Hallucination Detection with the Internal Layers of LLMs


66. CrossPT: Exploring Cross-Task Transferability through Multi-Task Prompt Tuning


67. LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures