Key LLM-Related Papers - 2025-12-08

1. SymPyBench: A Dynamic Benchmark for Scientific Reasoning with Executable Python Code


2. TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models


3. PRiSM: An Agentic Multimodal Benchmark for Scientific Reasoning via Python-Grounded Evaluation


4. To Err Is Human: Systematic Quantification of Errors in Published AI Papers via LLM Analysis


5. Using Large Language Models to Create Personalized Networks From Therapy Sessions


6. The Missing Layer of AGI: From Pattern Alchemy to Coordination Physics


7. Evolutionary System 2 Reasoning: An Empirical Proof


8. Ontology Learning with LLMs: A Benchmark Study on Axiom Identification


9. MIND: Multi-rationale INtegrated Discriminative Reasoning Framework for Multi-modal Large Models


10. The Seeds of Scheming: Weakness of Will in the Building Blocks of Agentic Systems


11. BEAVER: An Efficient Deterministic LLM Verifier


12. ChipMind: Retrieval-Augmented Reasoning for Long-Context Circuit Design Specifications


13. MCP-AI: Protocol-Driven Intelligence Framework for Autonomous Reasoning in Healthcare


14. Bridging Traditional Machine Learning and Large Language Models: A Two-Part Course Design for Modern AI Education


15. Semantic Faithfulness and Entropy Production Measures to Tame Your LLM Demons and Manage Hallucinations


16. Documenting SME Processes with Conversational AI: From Tacit Knowledge to BPMN


17. Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms


18. Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity


19. M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG


20. MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution


21. Trusted AI Agents in the Cloud


22. Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures


23. Optimizing Medical Question-Answering Systems: A Comparative Study of Fine-Tuned and Zero-Shot Large Language Models with RAG Framework


24. Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling


25. Mechanistic Interpretability of Antibody Language Models Using SAEs


26. Efficient Text Classification with Conformal In-Context Learning


27. Faithfulness metric fusion: Improving the evaluation of LLM trustworthiness across domains


28. Feasibility of AI-Assisted Programming for End-User Development


29. Grounded Multilingual Medical Reasoning for Question Answering with Large Language Models


30. 2K-Characters-10K-Stories: A Quality-Gated Stylized Narrative Dataset with Disentangled Control and Sequence Consistency


31. Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models


32. RoBoN: Routed Online Best-of-n for Test-Time Scaling with Multiple LLMs


33. Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction


34. Dynamic Alignment for Collective Agency: Toward a Scalable Self-Improving Framework for Open-Ended LLM Alignment


35. Knowing Your Uncertainty – On the application of LLM in social sciences


36. ArtistMus: A Globally Diverse, Artist-Centric Benchmark for Retrieval-Augmented Music Question Answering


37. A Systematic Framework for Enterprise Knowledge Retrieval: Leveraging LLM-Generated Metadata to Enhance RAG Systems


38. Simulating Life Paths with Digital Twins: AI-Generated Future Selves Influence Decision-Making and Expand Human Choice


39. Mitigating Self-Preference by Authorship Obfuscation


40. Please Don’t Kill My Vibe: Empowering Agents with Data Flow Control


41. The Effect of Document Summarization on LLM-Based Relevance Judgments


42. To Think or Not to Think: The Hidden Cost of Meta-Training with Excessive CoT Examples


43. The Erosion of LLM Signatures: Can We Still Distinguish Human and LLM-Generated Scientific Ideas After Iterative Paraphrasing?


44. Beyond Detection: A Comprehensive Benchmark and Study on Representation Learning for Fine-Grained Webshell Family Classification


45. From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model


46. XR-DT: Extended Reality-Enhanced Digital Twin for Agentic Mobile Robots


47. Learning to Code with Context: A Study-Based Approach


48. Towards A Cultural Intelligence and Values Inferences Quality Benchmark for Community Values and Common Knowledge


49. Semore: VLM-guided Enhanced Semantic Motion Representations for Visual Reinforcement Learning


50. How to Tame Your LLM: Semantic Collapse in Continuous Systems


51. ChromouVQA: Benchmarking Vision-Language Models under Chromatic Camouflaged Images


52. AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance


53. RAG-IGBench: Innovative Evaluation for RAG-based Interleaved Generation in Open-domain Question Answering