Key LLM-Related Papers - 2025-11-07

1. Outbidding and Outbluffing Elite Humans: Mastering Liar’s Poker via Self-Play and Reinforcement Learning


2. Towards Scalable Web Accessibility Audit with MLLMs as Copilots


3. From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers


4. Toward Autonomous Engineering Design: A Knowledge-Guided Multi-Agent Framework


5. A Proprietary Model-Based Safety Response Framework for AI Agents


6. Using Multi-modal Large Language Model to Boost Fireworks Algorithm’s Ability in Settling Challenging Optimization Tasks


7. Large language models require a new form of oversight: capability-based monitoring


8. SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators


9. Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge


10. No-Human in the Loop: Agentic Evaluation at Scale for Recommendation


11. PublicAgent: Multi-Agent Design Principles From an LLM-Based Open Data Analysis Framework


12. Grounded Misunderstandings in Asymmetric Dialogue: A Perspectivist Annotation Scheme for MapTask


13. AnaFlow: Agentic LLM-based Workflow for Reasoning-Driven Explainable and Sample-Efficient Analog Circuit Sizing


14. The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents


15. Whisper Leak: a side-channel attack on Large Language Models


16. Watermarking Large Language Models in Europe: Interpreting the AI Act in Light of Technology


17. LiveTradeBench: Seeking Real-World Alpha with Large Language Models


18. Step-Audio-EditX Technical Report


19. PerfDojo: Automated ML Library Generation for Heterogeneous Architectures


20. AILA–First Experiments with Localist Language Models


21. MultiZebraLogic: A Multilingual Logical Reasoning Benchmark


22. Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding


23. SOLVE-Med: Specialized Orchestration for Leading Vertical Experts across Medical Specialties


24. ROSBag MCP Server: Analyzing Robot Data with LLMs for Agentic Embodied AI Applications


25. CareMedEval dataset: Evaluating Critical Appraisal and Reasoning in the Biomedical Field


26. Inter-Agent Trust Models: A Comparative Study of Brief, Claim, Proof, Stake, Reputation and Constraint in Agentic Web Protocol Design-A2A, AP2, ERC-8004, and Beyond


27. Light over Heavy: Automated Performance Requirements Quantification with Linguistic Inducement


28. Computational Imaging Meets LLMs: Zero-Shot IDH Mutation Prediction in Brain Gliomas


29. Decoupling Augmentation Bias in Prompt Learning for Vision-Language Models


30. Open Source State-Of-the-Art Solution for Romanian Speech Recognition


31. Benchmarking the Thinking Mode of Multimodal Large Language Models in Clinical Tasks


32. Comparing the Performance of LLMs in RAG-based Question-Answering: A Case Study in Computer Science Literature


33. Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification


34. LGM: Enhancing Large Language Models with Conceptual Meta-Relations and Iterative Retrieval


35. QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models


36. RefAgent: A Multi-agent LLM-based Framework for Automatic Software Refactoring


37. Who Sees the Risk? Stakeholder Conflicts and Explanatory Policies in LLM-based Risk Assessment


38. From Measurement to Expertise: Empathetic Expert Adapters for Context-Based Empathy in Conversational AI Agents


39. Control Barrier Function for Aligning Large Language Models


40. CARMA: Comprehensive Automatically-annotated Reddit Mental Health Dataset for Arabic


41. Reading Between the Lines: The One-Sided Conversation Problem


42. Systematizing LLM Persona Design: A Four-Quadrant Technical Taxonomy for AI Companion Applications


43. Zero-shot data citation function classification using transformer-based large language models (LLMs)


44. AgentSLA: Towards a Service Level Agreement for AI Agents


45. FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels


46. Analysis of AdvFusion: Adapter-based Multilingual Learning for Code Large Language Models


47. LM-Fix: Lightweight Bit-Flip Detection and Rapid Recovery Framework for Language Models


48. Mathematical exploration and discovery at scale


49. SELF-REDRAFT: Eliciting Intrinsic Exploration-Exploitation Balance in Test-Time Scaling for Code Generation


50. Digital Transformation Chatbot (DTchatbot): Integrating Large Language Model-based Chatbot in Acquiring Digital Transformation Needs