Key LLM-Related Papers - 2025-10-08

1. Barbarians at the Gate: How AI is Upending Systems Research


2. Moloch’s Bargain: Emergent Misalignment When LLMs Compete for Audiences


3. Classical AI vs. LLMs for Decision-Maker Alignment in Health Insurance Choices


4. Constraint-Aware Route Recommendation from Natural Language via Hierarchical LLM Agents


5. Scientific Algorithm Discovery by Augmenting AlphaEvolve with Deep Research


6. Training-Free Time Series Classification via In-Context Reasoning with LLM Agents


7. Optimizing for Persuasion Improves LLM Generalization: Evidence from Quality-Diversity Evolution of Debate Strategies


8. ConstraintLLM: A Neuro-Symbolic Framework for Industrial-Level Constraint Programming


9. ARM: Discovering Agentic Reasoning Modules for Generalizable Multi-Agent Systems


10. Artificially intelligent agents in the social and behavioral sciences: A history and outlook


11. Syn-Diag: An LLM-based Synergistic Framework for Generalizable Few-shot Fault Diagnosis on the Edge


12. Joint Communication Scheduling and Velocity Control for Multi-UAV-Assisted Post-Disaster Monitoring: An Attention-Based In-Context Learning Approach


13. D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI


14. Large Language Model-Based Uncertainty-Adjusted Label Extraction for Artificial Intelligence Model Development in Upper Extremity Radiography


15. From Agentification to Self-Evolving Agentic AI for Wireless Networks: Concepts, Approaches, and Future Research Directions


16. In-the-Flow Agentic System Optimization for Effective Planning and Tool Use


17. Vul-R2: A Reasoning LLM for Automated Vulnerability Repair


18. VAL-Bench: Measuring Value Alignment in Language Models


19. AInstein: Assessing the Feasibility of AI-Generated Approaches to Research Problems


20. Biomedical reasoning in action: Multi-agent System for Auditable Biomedical Evidence Synthesis


21. BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynamic Interactions


22. Beyond Monolithic Rewards: A Hybrid and Multi-Aspect Reward Optimization for MLLM Alignment


23. Efficient Prediction of Pass@k Scaling in Large Language Models


24. Graph-based LLM over Semi-Structured Population Data for Dynamic Policy Response


25. Plug-and-Play Dramaturge: A Divide-and-Conquer Approach for Iterative Narrative Script Refinement via Collaborative LLM Agents


26. Lang-PINN: From Language to Physics-Informed Neural Networks via a Multi-Agent Framework


27. Structuring Reasoning for Complex Rules Beyond Flat Representations


28. Optimization Modeling via Semantic Anchored Alignment


29. Structured Cognition for Behavioral Intelligence in Large Language Model Agents: Preliminary Study


30. Rule Encoding and Compliance in Large Language Models: An Information-Theoretic Analysis


31. EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark


32. Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents


33. Automated Program Repair of Uncompilable Student Code


34. RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback


35. LLMs as Policy-Agnostic Teammates: A Case Study in Human Proxy Design for Heterogeneous Agent Teams


36. CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credits


37. Discrete Diffusion Models with MLLMs for Unified Medical Multimodal Generation


38. Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models


39. Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability


40. Reasoning under Vision: Understanding Visual-Spatial Cognition in Vision-Language Models for CAPTCHA


41. VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization


42. CDTP: A Large-Scale Chinese Data-Text Pair Dataset for Comprehensive Evaluation of Chinese LLMs


43. Detection and Measurement of Hailstones with Multimodal Large Language Models


44. LexiCon: a Benchmark for Planning under Temporal Constraints in Natural Language


45. Probing the Difficulty Perception Mechanism of Large Language Models


46. EvalMORAAL: Interpretable Chain-of-Thought and LLM-as-Judge Evaluation for Moral Alignment in Large Language Models


47. LLM-FS-Agent: A Deliberative Role-based Large Language Model Architecture for Transparent Feature Selection


48. DACP: Domain-Adaptive Continual Pre-Training of Large Language Models for Phone Conversation Summarization


49. Mitigating Premature Exploitation in Particle-based Monte Carlo for Inference-Time Scaling


50. Data-efficient Targeted Token-level Preference Optimization for LLM-based Text-to-Speech


51. FinReflectKG - EvalBench: Benchmarking Financial KG with Multi-Dimensional Evaluation


52. Towards Reliable and Practical LLM Security Evaluations via Bayesian Modelling


53. Uncovering Representation Bias for Investment Decisions in Open-Source Large Language Models


54. Membership Inference Attacks on Tokenizers of Large Language Models


55. Code-Switching In-Context Learning for Cross-Lingual Transfer of Large Language Models


56. MADIAVE: Multi-Agent Debate for Implicit Attribute Value Extraction


57. HOI-R1: Exploring the Potential of Multimodal Large Language Models for Human-Object Interaction Detection


58. AutoPentester: An LLM Agent-based Framework for Automated Pentesting


59. AgentDR: Dynamic Recommendation with Implicit Item-Item Relations via LLM-based Agents


60. Improving Chain-of-Thought Efficiency for Autoregressive Image Generation


61. Deciphering Invariant Feature Decoupling in Source-free Time Series Forecasting with Proxy Denoising


62. Domain-Shift-Aware Conformal Prediction for Large Language Models


63. Critical attention scaling in long-context transformers


64. Seeing the Big Picture: Evaluating Multimodal LLMs’ Ability to Interpret and Grade Handwritten Student Work


65. Provably Mitigating Corruption, Overoptimization, and Verbosity Simultaneously in Offline and Online RLHF/DPO Alignment


66. CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension


67. Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting


68. LANTERN: Scalable Distillation of Large Language Models for Job-Person Fit and Explanation


69. AMAQ: Adaptive Mixed-bit Activation Quantization for Collaborative Parameter Efficient Fine-tuning


70. Adversarial Reinforcement Learning for Large Language Model Agent Safety


71. UnitTenX: Generating Tests for Legacy Packages with AI Agents Powered by Formal Verification


72. Physics-Informed Machine Learning in Biomedical Science and Engineering


73. See the past: Time-Reversed Scene Reconstruction from Thermal Traces Using Visual Language Models


74. Context Length Alone Hurts LLM Performance Despite Perfect Retrieval


75. AutoDAN-Reasoning: Enhancing Strategies Exploration based Jailbreak Attacks with Test-Time Scaling


76. Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization


77. DeepV: A Model-Agnostic Retrieval-Augmented Framework for Verilog Code Generation with a High-Quality Knowledge Base


78. RAG Makes Guardrails Unsafe? Investigating Robustness of Guardrails under RAG-style Contexts


79. DP-Adam-AC: Privacy-preserving Fine-Tuning of Localizable Language Models Using Adam Optimization with Adaptive Clipping


80. CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers


81. A novel hallucination classification framework


82. OptPipe: Memory- and Scheduling-Optimized Pipeline Parallelism for LLM Training


83. Auditing Pay-Per-Token in Large Language Models


84. Emergent Coordination in Multi-Agent Language Models


85. From Poisoned to Aware: Fostering Backdoor Self-Awareness in LLMs


86. SATER: A Self-Aware and Token-Efficient Approach to Routing and Cascading


87. VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation


88. A Single Character can Make or Break Your LLM Evals


89. Chronological Thinking in Full-Duplex Spoken Dialogue Language Models


90. Every Step Counts: Decoding Trajectories as Authorship Fingerprints of dLLMs


91. Linguistic Characteristics of AI-Generated Text: A Survey


92. Training Large Language Models To Reason In Parallel With Global Forking Tokens


93. Rationale-Augmented Retrieval with Constrained LLM Re-Ranking for Task Discovery


94. Improving Metacognition and Uncertainty Communication in Language Models


95. Hallucination is Inevitable for LLMs with the Open World Assumption


96. Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices


97. COSPADI: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning


98. Ads that Talk Back: Implications and Perceptions of Injecting Personalized Advertising into LLM Chatbots