[arXiv Digest] 2025-07-08

1. When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors

2. Modeling Latent Partner Strategies for Adaptive Zero-Shot Human-Agent Collaboration

3. SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity’s Last Exam?

4. MedGemma Technical Report

5. GIST: Cross-Domain Click-Through Rate Prediction via Guided Content-Behavior Distillation

6. Rule Learning for Knowledge Graph Reasoning under Agnostic Distribution Shift

7. How Rules Represent Causal Knowledge: Causal Modeling with Abductive Logic Programs

8. When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning

9. Supported Abstract Argumentation for Case-Based Reasoning

10. MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction

11. DoPI: Doctor-like Proactive Interrogation LLM for Traditional Chinese Medicine

12. Application and Evaluation of Large Language Models for Forecasting the Impact of Traffic Incidents

13. FurniMAS: Language-Guided Furniture Decoration using Multi-Agent System

14. LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction

15. Activation Steering for Chain-of-Thought Compression

16. ChipSeek-R1: Generating Human-Surpassing RTL with LLM via Hierarchical Reward-Driven Reinforcement Learning

17. LumiCRS: Asymmetric Contrastive Prototype Learning for Long-Tail Conversational Movie Recommendation

18. Advocate for Complete Benchmarks for Formal Reasoning with Formal/Informal Statements and Formal/Informal Proofs

19. Trojan Horse Prompting: Jailbreaking Conversational Multimodal Models by Forging Assistant Message

20. Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?

21. DisMS-TS: Eliminating Redundant Multi-Scale Features for Time Series Classification

22. Exploring Core and Periphery Precepts in Biological and Artificial Intelligence: An Outcome-Based Perspective

23. Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions

24. From Marginal to Joint Predictions: Evaluating Scene-Consistent Trajectory Prediction Approaches for Automated Driving

25. Action Space Reduction Strategies for Reinforcement Learning in Autonomous Driving

26. CTA: Cross-Task Alignment for Better Test Time Training

27. All in One: Visual-Description-Guided Unified Point Cloud Segmentation

28. EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling

29. Train-before-Test Harmonizes Language Model Rankings

30. Infrastructuring Contestability: A Framework for Community-Defined AI Value Pluralism

31. CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale

32. OpenS2S: Advancing Open-Source End-to-End Empathetic Large Speech Language Model

33. Critiques of World Models

34. LAID: Lightweight AI-Generated Image Detection in Spatial and Spectral Domains

35. AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models

36. Effects of Unplanned Incoming Flights on Airport Relief Processes after a Major Natural Disaster

37. OGF: An Online Gradient Flow Method for Optimizing the Statistical Steady-State Time Averages of Unsteady Turbulent Flows

38. Interpretable Mnemonic Generation for Kanji Learning via Expectation-Maximization