Key LLM-Related Papers - 2025-11-10

1. Cleaning Maintenance Logs with LLM Agents for Improved Predictive Maintenance


2. ORCHID: Orchestrated Retrieval-Augmented Classification with Human-in-the-Loop Intelligent Decision-Making for High-Risk Property


3. Real-Time Reasoning Agents in Evolving Environments


4. DMA: Online RAG Alignment with Human Feedback


5. SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models


6. TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework


7. What Are the Facts? Automated Extraction of Court-Established Facts from Criminal-Court Opinions


8. LiveStar: Live Streaming Assistant for Real-World Online Video Understanding


9. TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems


10. Model Merging Improves Zero-Shot Generalization in Bioacoustic Foundation Models


11. Generating Software Architecture Description from Source Code using Reverse Engineering and Large Language Model


12. UA-Code-Bench: A Competitive Programming Benchmark for Evaluating LLM Code Generation in Ukrainian


13. 8bit-GPT: Exploring Human-AI Interaction on Obsolete Macintosh Operating Systems


14. Pluralistic Behavior Suite: Stress-Testing Multi-Turn Adherence to Custom Behavioral Policies


15. Query Generation Pipeline with Enhanced Answerability Assessment for Financial Information Retrieval


16. Enhancing Public Speaking Skills in Engineering Students Through AI


17. Too Good to be Bad: On the Failure of LLMs to Role-Play Villains


18. A benchmark multimodal oro-dental dataset for large vision-language models


19. BudgetMem: Learning Selective Memory Policies for Cost-Efficient Long-Context Processing in Language Models


20. You Need Reasoning to Learn Reasoning: The Limitations of Label-Free RL in Weak Base Models


21. Software Defined Vehicle Code Generation: A Few-Shot Prompting Approach


22. PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference


23. Trustworthiness Calibration Framework for Phishing Email Detection Using Large Language Models


24. IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs


25. Learning to reason about rare diseases through retrieval-augmented agents


26. First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation



28. Jailbreaking in the Haystack


29. Prioritize Economy or Climate Action? Investigating ChatGPT Response Differences Based on Inferred Political Orientation


30. Measuring what Matters: Construct Validity in Large Language Model Benchmarks


31. Separate the Wheat from the Chaff: Winnowing Down Divergent Views in Retrieval Augmented Generation


32. multiMentalRoBERTa: A Fine-tuned Multiclass Classifier for Mental Health Disorder


33. Simulating Misinformation Vulnerabilities With Agent Personas


34. EncouRAGe: Evaluating RAG Local, Fast, and Reliable


35. Reasoning Up the Instruction Ladder for Controllable Language Models


36. Adaptive Testing for LLM Evaluation: A Psychometric Alternative to Static Benchmarks


37. Stateful KV Cache Management for LLMs: Balancing Space, Time, Accuracy, and Positional Fidelity