[arXiv Digest] 2025-07-08

1. When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors

2. MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction

3. DoPI: Doctor-like Proactive Interrogation LLM for Traditional Chinese Medicine

4. Application and Evaluation of Large Language Models for Forecasting the Impact of Traffic Incidents

5. FurniMAS: Language-Guided Furniture Decoration using Multi-Agent System

6. LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction

7. Activation Steering for Chain-of-Thought Compression

8. ChipSeek-R1: Generating Human-Surpassing RTL with LLM via Hierarchical Reward-Driven Reinforcement Learning

9. Trojan Horse Prompting: Jailbreaking Conversational Multimodal Models by Forging Assistant Message

10. Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?

11. Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions

12. All in One: Visual-Description-Guided Unified Point Cloud Segmentation

13. Train-before-Test Harmonizes Language Model Rankings

14. CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale

15. OpenS2S: Advancing Open-Source End-to-End Empathetic Large Speech Language Model

16. AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models

17. Interpretable Mnemonic Generation for Kanji Learning via Expectation-Maximization