Key LLM Papers - 2025-10-06

1. Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner


2. CoDA: Agentic Systems for Collaborative Data Visualization


3. Improving Cooperation in Collaborative Embodied AI


4. Reward Model Routing in Alignment


5. Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents


6. NCV: A Node-Wise Consistency Verification Approach for Low-Cost Structured Error Localization in LLM Reasoning


7. Automated Constraint Specification for Job Scheduling by Regulating Generative Model with Domain-Specific Representation


8. ARMs: Adaptive Red-Teaming Agent against Multimodal Models with Plug-and-Play Attacks


9. AutoMaAS: Self-Evolving Multi-Agent Architecture Search for Large Language Models


10. Geolog-IA: Conversational System for Academic Theses


11. On the Role of Temperature Sampling in Test-Time Scaling


12. Multimodal Large Language Model Framework for Safe and Interpretable Grid-Integrated EVs


13. Agentic Additive Manufacturing Alloy Discovery


14. Multimodal Function Vectors for Spatial Relations


15. Safe and Efficient In-Context Learning via Risk Control


16. BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks


17. Reward Models are Metrics in a Trench Coat


18. Self-Anchor: Large Language Model Reasoning via Step-by-step Attention Alignment


19. Abstain and Validate: A Dual-LLM Policy for Reducing Noise in Agentic Program Repair


20. Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning


21. Topic Modeling as Long-Form Generation: Can Long-Context LLMs revolutionize NTM via Zero-Shot Prompting?


22. SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus


23. Investigating The Smells of LLM Generated Code


24. Untargeted Jailbreak Attack


25. Grounding Large Language Models in Clinical Evidence: A Retrieval-Augmented Generation System for Querying UK NICE Clinical Guidelines


26. Multimodal Carotid Risk Stratification with Large Vision-Language Models: Benchmarking, Fine-Tuning, and Clinical Insights


27. DMark: Order-Agnostic Watermarking for Diffusion Large Language Models


28. Flamed-TTS: Flow Matching Attention-Free Models for Efficient Generating and Dynamic Pacing Zero-shot Text-to-Speech


29. Evaluating Large Language Models for IUCN Red List Species Information


30. Dissecting Transformers: A CLEAR Perspective towards Green AI


31. Work Zones challenge VLM Trajectory Planning: Toward Mitigation and Robust Autonomous Driving


32. MaskCD: Mitigating LVLM Hallucinations by Image Head Masked Contrastive Decoding


33. Prototyping Digital Social Spaces through Metaphor-Driven Design: Translating Spatial Concepts into an Interactive Social Simulation


34. SAE-RNA: A Sparse Autoencoder Model for Interpreting RNA Language Model Representations


35. TravelBench: Exploring LLM Performance in Low-Resource Domains


36. A $1000\times$ Faster LLM-enhanced Algorithm For Path Planning in Large-scale Grid Maps


37. Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks


38. HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference


39. TutorBench: A Benchmark To Assess Tutoring Capabilities Of Large Language Models


40. Automatic Building Code Review: A Case Study


41. How Confident are Video Models? Empowering Video Models to Express their Uncertainty


42. Oracle-RLAIF: An Improved Fine-Tuning Framework for Multi-modal Video Models through Reinforcement Learning from Ranking Feedback


43. ToolTweak: An Attack on Tool Selection in LLM-based Agents


44. Knowledge-Graph Based RAG System Evaluation Framework


45. PHORECAST: Enabling AI Understanding of Public Health Outreach Across Populations


46. Litespark Technical Report: High-Throughput, Energy-Efficient LLM Training Framework


47. CLARITY: Clinical Assistant for Routing, Inference, and Triage


48. Dynamic Target Attack


49. Glaucoma Detection and Structured OCT Report Generation via a Fine-tuned Multimodal Large Language Model


50. CWM: An Open-Weights LLM for Research on Code Generation with World Models


51. Pretraining with hierarchical memories: separating long-tail and common knowledge


52. A Hybrid CAPTCHA Combining Generative AI with Keystroke Dynamics for Enhanced Bot Detection


53. A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory


54. Training Dynamics of Parametric and In-Context Knowledge Utilization in Language Models


55. Beyond Manuals and Tasks: Instance-Level Context Learning for LLM Agents


56. A Cross-Lingual Analysis of Bias in Large Language Models Using Romanian History


57. Spiral of Silence in Large Language Model Agents


58. Emission-GPT: A domain-specific language model agent for knowledge retrieval, emission inventory and data analysis


59. DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding


60. Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark


61. Evaluating Bias in Spoken Dialogue LLMs for Real-World Decisions and Recommendations


62. Language, Culture, and Ideology: Personalizing Offensiveness Detection in Political Tweets with Reasoning LLMs


63. LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL


64. Small Language Models for Curriculum-based Guidance


65. Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression


66. $\texttt{BluePrint}$: A Social Media User Dataset for LLM Persona Evaluation and Training


67. CATMark: A Context-Aware Thresholding Framework for Robust Cross-Task Watermarking in Large Language Models


68. DRIFT: Learning from Abundant User Dissatisfaction in Real-World Preference Learning


69. Evaluating Uncertainty Quantification Methods in Argumentative Large Language Models


70. Optimizing Long-Form Clinical Text Generation with Claim-Based Rewards


71. CRACQ: A Multi-Dimensional Approach To Automated Document Assessment


72. FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory


73. Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing


74. Human Mobility Datasets Enriched With Contextual and Social Dimensions


75. Synthetic Dialogue Generation for Interactive Conversational Elicitation & Recommendation (ICER)


76. EntropyLong: Effective Long-Context Training via Predictive Uncertainty


77. SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification


78. AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering


79. KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI


80. Hallucination-Resistant, Domain-Specific Research Assistant with Self-Evaluation and Vector-Grounded Retrieval


81. Agentic-AI Healthcare: Multilingual, Privacy-First Framework with MCP Agents


82. Hallucination reduction with CASAL: Contrastive Activation Steering For Amortized Learning


83. Modeling the Attack: Detecting AI-Generated Text by Quantifying Adversarial Perturbations