Notable LLM Papers - 2025-11-25

1. That’s not natural: The Impact of Off-Policy Training Data on Probe Performance


2. Designing Domain-Specific Agents via Hierarchical Task Abstraction Mechanism


3. The Belief-Desire-Intention Ontology for modelling mental reality and agency


4. Budget-Aware Tool-Use Enables Effective Agent Scaling


5. Fantastic Bugs and Where to Find Them in AI Benchmarks


6. Cognitive BASIC: An In-Model Interpreted Reasoning Language for LLMs


7. Stable diffusion models reveal a persisting human and AI gap in visual creativity


8. Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards


9. PersonaAgent with GraphRAG: Community-Aware Knowledge Graphs for Personalized LLM


10. REMSA: An LLM Agent for Foundation Model Selection in Remote Sensing


11. SMILE: A Composite Lexical-Semantic Metric for Question-Answering Evaluation


12. Beyond Multiple Choice: A Hybrid Framework for Unifying Robust Evaluation and Verifiable Reasoning Training


13. Large Language Models for Sentiment Analysis to Detect Social Challenges: A Use Case with South African Languages


14. Intervene-All-Paths: Unified Mitigation of LVLM Hallucinations across Alignment Formats


15. Parrot: Persuasion and Agreement Robustness Rating of Output Truth – A Sycophancy Robustness Benchmark for LLMs


16. Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models


17. The PLLuM Instruction Corpus


18. Device-Guided Music Transfer


19. Learning to Compress: Unlocking the Potential of Large Language Models for Text Representation


20. Why Do Language Model Agents Whistleblow?


21. OmniPT: Unleashing the Potential of Large Vision Language Models for Pedestrian Tracking and Understanding


22. CLLMRec: LLM-powered Cognitive-Aware Concept Recommendation via Semantic Alignment and Prerequisite Knowledge Distillation


23. Supervised Fine Tuning of Large Language Models for Domain Specific Knowledge Graph Construction: A Case Study on Hunan’s Historical Celebrities


24. The Finer the Better: Towards Granular-aware Open-set Domain Generalization


25. Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems


26. OmniGround: A Comprehensive Spatio-Temporal Grounding Benchmark for Real-World Complex Scenarios


27. Deep Improvement Supervision


28. ConCISE: A Reference-Free Conciseness Evaluation Metric for LLM-Generated Answers


29. WorldGen: From Text to Traversable and Interactive 3D Worlds


30. Mesh RAG: Retrieval Augmentation for Autoregressive Mesh Generation


31. Revisiting Multimodal KV Cache Compression: A Frequency-Domain-Guided Outlier-KV-Aware Approach


32. Revisiting Audio-language Pretraining for Learning General-purpose Audio Representation


33. SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge


34. Password Strength Analysis Through Social Network Data Exposure: A Combined Approach Relying on Data Reconstruction and Generative Models


35. AutoBackdoor: Automating Backdoor Attacks via LLM Agents


36. Large language models for automated PRISMA 2020 adherence checking


37. RAG-Driven Data Quality Governance for Enterprise ERP Systems


38. Detecting and Steering LLMs’ Empathy in Action


39. Hierarchical Retrieval with Out-Of-Vocabulary Queries: A Case Study on SNOMED CT


40. Falsely Accused: How AI Detectors Misjudge Slightly Polished Arabic Articles


41. Prompt-Based Value Steering of Large Language Models


42. How Well Do LLMs Understand Tunisian Arabic?


43. Bench360: Benchmarking Local LLM Inference from 360°