[arXiv Digest] 2025-07-02

1. Enhancing LLM Agent Safety via Causal Influence Prompting

2. Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

3. SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents

4. Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess

5. Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

6. ASTRO: Teaching Language Models to Reason by Reflecting and Backtracking In-Context

7. GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

8. Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations

9. Reasoning as an Adaptive Defense for Safety

10. From Sentences to Sequences: Rethinking Languages in Biological System

11. WebArXiv: Evaluating Multimodal Agents on Time-Invariant arXiv Tasks

12. Large Language Model Powered Intelligent Urban Agents: Concepts, Capabilities, and Applications

13. The Age of Sensorial Zero Trust: Why We Can No Longer Trust Our Senses

14. Stylometry recognizes human and LLM-generated texts in short samples

15. HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning

16. CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs

17. Many LLMs Are More Utilitarian Than One

18. LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing

19. SAFER: Probing Safety in Reward Models with Sparse Autoencoder

20. Generative Exaggeration in LLM Social Agents: Consistency, Bias, and Toxicity

21. Cognitive Load-Aware Inference: A Neuro-Symbolic Framework for Optimizing the Token Economy of Large Language Models

22. Mixture of Reasonings: Teach Large Language Models to Reason with Adaptive Strategies

23. TUM-MiKaNi at SemEval-2025 Task 3: Towards Multilingual and Knowledge-Aware Non-factual Hallucination Identification

24. Not All Attention Heads Are What You Need: Refining CLIP’s Image Representation with Attention Ablation

25. Box-QAymo: Box-Referring VQA Dataset for Autonomous Driving

27. Twill: Scheduling Compound AI Systems on Heterogeneous Mobile Edge Platforms

28. Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and High-Performance GPUs

29. iPanda: An Intelligent Protocol Testing and Debugging Agent for Conformance Testing

30. An AST-guided LLM Approach for SVRF Code Synthesis

31. VTS-Guided AI Interaction Workflow for Business Insights