[arXiv Digest] 2025-07-24

1. Thinking Isn’t an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations

2. Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks

3. Simulating multiple human perspectives in socio-ecological systems using large language models

4. Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning

5. An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models

6. Compliance Brain Assistant: Conversational Agentic AI for Assisting Compliance Tasks in Enterprise Environments

7. Agent Identity Evals: Measuring Agentic Identity

8. Improving LLMs’ Generalized Reasoning Abilities by Graph Problems

9. HySafe-AI: Hybrid Safety Architectural Analysis Framework for AI Systems: A Case Study

10. Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks

11. Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains

12. AI Telephone Surveying: Automating Quantitative Data Collection with an AI Interviewer

13. From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes

14. CASCADE: LLM-Powered JavaScript Deobfuscator at Google

15. Enabling Cyber Security Education through Digital Twins and Generative AI

16. MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs

17. BGM-HAN: A Hierarchical Attention Network for Accurate and Fair Decision Assessment on Semi-Structured Profiles

18. Probing Vision-Language Understanding through the Visual Entailment Task: promises and pitfalls

19. Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning

20. Each to Their Own: Exploring the Optimal Embedding in RAG

21. HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs

22. Investigating Training Data Detection in AI Coders

23. DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning

24. A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model

25. Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance

26. Understanding Prompt Programming Tasks and Questions

27. A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task

28. The Pluralistic Moral Gap: Understanding Judgment and Value Differences between Humans and Large Language Models

29. DesignLab: Designing Slides Through Iterative Detection and Correction

30. LLM Meets the Sky: Heuristic Multi-Agent Reinforcement Learning for Secure Heterogeneous UAV Networks

31. SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs

32. Resilient Multi-Agent Negotiation for Medical Supply Chains: Integrating LLMs and Blockchain for Transparent Coordination

33. Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance

34. BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving

35. Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models