Key LLM-Related Papers - 2025-12-04

1. From Moderation to Mediation: Can LLMs Serve as Mediators in Online Flame Wars?


2. Invasive Context Engineering to Control Large Language Models


3. Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning


4. Radiologist Copilot: An Agentic Assistant with Orchestrated Tools for Radiology Reporting with Quality Control


5. AuditCopilot: Leveraging LLMs for Fraud Detection in Double-Entry Bookkeeping


6. StockMem: An Event-Reflection Memory Framework for Stock Forecasting


7. Menta: A Small Language Model for On-Device Mental Health Prediction


8. Training Data Attribution for Image Generation using Ontology-Aligned Knowledge Graphs


9. Learning What to Attend First: Modality-Importance-Guided Reasoning for Reliable Multimodal Emotion Understanding


10. Exploring Depth Generalization in Large Language Models for Solving Recursive Logic Tasks


11. PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing


12. COPE: Chain-Of-Thought Prediction Engine for Open-Source Large Language Model Based Stroke Outcome Prediction from Clinical Notes


13. Semantic Trading: Agentic AI for Clustering and Relationship Discovery in Prediction Markets


14. Synthetic Error Injection Fails to Elicit Self-Correction In Language Models


15. Beyond Playtesting: A Generative Multi-Agent Simulation System for Massively Multiplayer Online Games


16. Reasoning Path and Latent State Analysis for Multi-view Visual Spatial Reasoning: A Cognitive Science Perspective


17. OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning


18. DialogGuard: Multi-Agent Psychosocial Safety Evaluation of Sensitive LLM Responses


19. TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful?


20. Benchmarking LLM Agents for Wealth-Management Workflows


21. STRIDE: A Systematic Framework for Selecting AI Modalities - Agentic AI, AI Assistants, or LLM Calls


22. Flowchart2Mermaid: A Vision-Language Model Powered System for Converting Flowcharts into Editable Diagram Code


23. The 4/$δ$ Bound: Designing Predictable LLM-Verifier Systems for Formal Method Guarantee


24. The Moral Consistency Pipeline: Continuous Ethical Evaluation for Large Language Models


25. LORE: A Large Generative Model for Search Relevance


26. TokenPowerBench: Benchmarking the Power Consumption of LLM Inference


27. Distribution-Calibrated Inference time compute for Thinking LLM-as-a-Judge


28. Fine-Tuned Large Language Models for Logical Translation: Reducing Hallucinations with Lang2Logic


29. Lumos: Let there be Language Model System Certification


30. In Silico Development of Psychometric Scales: Feasibility of Representative Population Data Simulation with LLMs


31. MRD: Multi-resolution Retrieval-Detection Fusion for High-Resolution Image Understanding


32. FAIRY2I: Universal Extremely-Low Bit QAT framework via Widely-Linear Representation and Phase-Aware Quantization


33. OptPO: Optimal Rollout Allocation for Test-time Policy Optimization


34. GraphMatch: Fusing Language and Graph Representations in a Dynamic Two-Sided Work Marketplace


35. Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages


36. ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning


37. A Comparative Study on How Data Normalization Affects Zero-Shot Generalization in Time Series Foundation Models


38. Phase-Adaptive LLM Framework with Multi-Stage Validation for Construction Robot Task Allocation: A Systematic Benchmark Against Traditional Optimization Algorithms


39. SurveyEval: Towards Comprehensive Evaluation of LLM-Generated Academic Surveys


40. Reasoning-Aware Multimodal Fusion for Hateful Video Detection


41. Emergent Bayesian Behaviour and Optimal Cue Combination in LLMs


42. An Empirical Survey of Model Merging Algorithms for Social Bias Mitigation


43. Beyond Single-Agent Safety: A Taxonomy of Risks in LLM-to-LLM Interactions


44. Graph VQ-Transformer (GVT): Fast and Accurate Molecular Generation via High-Fidelity Discrete Latents


45. CryptoQA: A Large-scale Question-answering Dataset for AI-assisted Cryptography


46. Feedback Loops and Code Perturbations in LLM-based Software Engineering: A Case Study on a C-to-Rust Translation System


47. From Panel to Pixel: Zoom-In Vision-Language Pretraining from Biomedical Scientific Literature


48. EZYer: A simulacrum of high school with generative agent


49. ADORE: Autonomous Domain-Oriented Relevance Engine for E-commerce


50. CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning


51. AskNearby: An LLM-Based Application for Neighborhood Information Retrieval and Personalized Cognitive-Map Recommendations


52. Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding


53. UCAgents: Unidirectional Convergence for Visual Evidence Anchored Multi-Agent Medical Decision-Making


54. When Refusals Fail: Unstable Safety Mechanisms in Long-Context LLM Agents


55. Boosting Medical Vision-Language Pretraining via Momentum Self-Distillation under Limited Computing Resources


56. WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning


57. WISE: Weighted Iterative Society-of-Experts for Robust Multimodal Multi-Agent Debate


58. Process-Centric Analysis of Agentic Software Systems


59. Memory-Augmented Knowledge Fusion with Safety-Aware Decoding for Domain-Adaptive Question Answering


60. VACoT: Rethinking Visual Data Augmentation with VLMs


61. COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers


62. HealthContradict: Evaluating Biomedical Knowledge Conflicts in Language Models


63. Progressive Image Restoration via Text-Conditioned Video Generation


64. DETAIL Matters: Measuring the Impact of Prompt Specificity on Reasoning in Large Language Models


65. See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models


66. A Knowledge-Based Language Model: Deducing Grammatical Knowledge in a Multi-Agent Language Acquisition Simulation


67. Think Before You Prune: Self-Reflective Structured Pruning for Reasoning Language Models


68. Young Children’s Anthropomorphism of AI Chatbots and the Role of Parent Co-Presence


69. Large Language Model based Smart Contract Auditing with LLMBugScanner


70. Reversing Large Language Models for Efficient Training and Fine-Tuning


71. Deep Research: A Systematic Survey


72. Do Large Language Models Walk Their Talk? Measuring the Gap Between Implicit Associations, Self-Report, and Behavioral Altruism


73. Graphing the Truth: Structured Visualizations for Automated Hallucination Detection in LLMs