Key LLM-Related Papers - 2025-12-02

1. Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction


2. Towards Continuous Intelligence Growth: Self-Training, Continual Learning, and Dual-Scale Memory in SuperIntelliAgent


3. Hierarchical AI-Meteorologist: LLM-Agent System for Multi-Scale and Explainable Weather Forecast Reporting


4. OctoMed: Data Recipes for State-of-the-Art Multimodal Medical Reasoning


5. Adapting Like Humans: A Metacognitive Agent with Test-time Reasoning


6. AgriCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture


7. Evolutionary Discovery of Heuristic Policies for Traffic Signal Control


8. Does Self-Evaluation Enable Wireheading in Language Models?


9. TIM-PRM: Verifying multimodal reasoning with Tool-Integrated PRM


10. ORION: Teaching Language Models to Reason Efficiently in the Language of Thought


11. InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents


12. Solving Context Window Overflow in AI Agents


13. Geometrically-Constrained Agent for Spatial Reasoning


14. AI Deception: Risks, Dynamics, and Controls


15. DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning


16. Structured Extraction from Business Process Diagrams Using Vision-Language Models


17. Swarms of Large Language Model Agents for Protein Sequence Design with Experimental Validation


18. Enhanced Conditional Generation of Double Perovskite by Knowledge-Guided Language Model Feedback


19. RecToM: A Benchmark for Evaluating Machine Theory of Mind in LLM-based Conversational Recommender Systems


20. Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation


21. WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios


22. Hybrid Stackelberg Game and Diffusion-based Auction for Two-tier Agentic AI Task Offloading in Internet of Agents


23. Real-Time Procedural Learning From Experience for AI Agents


24. Pathology-Aware Prototype Evolution via LLM-Driven Semantic Disambiguation for Multicenter Diabetic Retinopathy Diagnosis


25. Evaluating Strategies for Synthesizing Clinical Notes for Medical Multimodal AI


26. The Price of Progress: Algorithmic Efficiency and the Falling Cost of AI Inference


27. Evaluating LLMs for One-Shot Patching of Real and Artificial Vulnerabilities


28. Towards Improving Interpretability of Language Model Generation through a Structured Knowledge Discovery Approach


29. Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models


30. Toward Automatic Safe Driving Instruction: A Large-Scale Vision Language Model Approach


31. Obstruction reasoning for robotic grasping


32. Automated Generation of MDPs Using Logic Programming and LLMs for Robotic Applications


33. Multi-chain Graph Refinement and Selection for Reliable Reasoning in Large Language Models


34. Mind Reading or Misreading? LLMs on the Big Five Personality Test


35. SpaceMind: Camera-Guided Modality Fusion for Spatial Reasoning in Vision-Language Models


36. Conveying Imagistic Thinking in TCM Translation: A Prompt Engineering and LLM-Based Evaluation Framework


37. From Illusion to Intention: Visual Rationale Learning for Vision-Language Reasoning


38. AgentShield: Make MAS more secure and efficient


39. Leveraging Textual Compositional Reasoning for Robust Change Captioning


40. Serving Heterogeneous LoRA Adapters in Distributed LLM Inference Systems


41. Improving Robotic Manipulation Robustness via NICE Scene Surgery


42. VeriDispatcher: Multi-Model Dispatching through Pre-Inference Difficulty Prediction for RTL Generation Optimization


43. All Centers Are at most a Few Tokens Apart: Knowledge Distillation with Domain Invariant Prompt Tuning


44. ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering


45. CoFiRec: Coarse-to-Fine Tokenization for Generative Recommendation


46. Test-time scaling of diffusions with flow maps


47. Automated Design Optimization via Strategic Search with Large Language Models


48. HarmoCLIP: Harmonizing Global and Regional Representations in Contrastive Vision-Language Models


49. Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization


50. CoT4AD: A Vision-Language-Action Model with Explicit Chain-of-Thought Reasoning for Autonomous Driving


51. Exploring Performance Variations in Finetuned Translators of Ultra-Low Resource Languages: Do Linguistic Differences Matter?


52. GEO-Detective: Unveiling Location Privacy Risks in Images with LLM Agents


53. Mapping Clinical Doubt: Locating Linguistic Uncertainty in LLMs


54. SuRe: Surprise-Driven Prioritised Replay for Continual LLM Learning


55. BINDER: Instantly Adaptive Mobile Manipulation with Open-Vocabulary Commands


56. Edge Deployment of Small Language Models, a comprehensive comparison of CPU, GPU and NPU backends


57. Evaluating Embedding Models and Pipeline Optimization for AI Search Quality


58. DeepPNI: Language- and graph-based model for mutation-driven protein-nucleic acid energetics


59. From Compound Figures to Composite Understanding: Developing a Multi-Modal LLM from Biomedical Literature with Medical Multiple-Image Benchmarking and Validation


60. Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information


61. A Theoretically Grounded Hybrid Ensemble for Reliable Detection of LLM-Generated Text


62. Decomposed Trust: Exploring Privacy, Adversarial Robustness, Fairness, and Ethics of Low-Rank LLMs


63. Distillability of LLM Security Logic: Predicting Attack Success Rate of Outline Filling Attack via Ranking Regression


64. MedEyes: Learning Dynamic Visual Focus for Medical Progressive Diagnosis


65. AfriStereo: A Culturally Grounded Dataset for Evaluating Stereotypical Bias in Large Language Models


66. DialBench: Towards Accurate Reading Recognition of Pointer Meter using Large Foundation Models


67. Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs


68. Toward Automated and Trustworthy Scientific Analysis and Visualization with LLM-Generated Code


69. LLM-Empowered Event-Chain Driven Code Generation for ADAS in SDV systems


70. Improving Score Reliability of Multiple Choice Benchmarks with Consistency Evaluation and Altered Answer Choices


71. FLAWS: A Benchmark for Error Identification and Localization in Scientific Papers


72. Tacit Bidder-Side Collusion: Artificial Intelligence in Dynamic Auctions


73. Factors That Support Grounded Responses in LLM Conversations: A Rapid Review


74. fMRI-LM: Towards a Universal Foundation Model for Language-Aligned fMRI Understanding


75. A Longitudinal Measurement of Privacy Policy Evolution for Large Language Models


76. Medical Malice: A Dataset for Context-Aware Safety in Healthcare LLMs




79. Semantics as a Shield: Label Disguise Defense (LDD) against Prompt Injection in LLM Sentiment Classification


80. SO-Bench: A Structural Output Evaluation of Multimodal LLMs


81. Proactive Defense: Compound AI for Detecting Persuasion Attacks and Measuring Inoculation Effectiveness


82. Building Domain-Specific Small Language Models via Guided Data Generation


83. QuantumChem-200K: A Large-Scale Open Organic Molecular Dataset for Quantum-Chemistry Property Screening and Language Model Benchmarking


84. EduMod-LLM: A Modular Approach for Designing Flexible and Transparent Educational Assistants


85. Decoding inner speech with an end-to-end brain-to-text neural interface


86. The Rapid Growth of AI Foundation Model Usage in Science


87. Polarity-Aware Probing for Quantifying Latent Alignment in Language Models


88. R2Q: Towards Robust 2-Bit Large Language Models via Residual Refinement Quantization


89. Asking LLMs to Verify First is Almost Free Lunch


90. RoSA: Enhancing Parameter-Efficient Fine-Tuning via RoPE-aware Selective Adaptation in Large Language Models


91. HUMORCHAIN: Theory-Guided Multi-Stage Reasoning for Interpretable Multimodal Humor Generation


92. Identifying Quantum Structure in AI Language: Evidence for Evolutionary Convergence of Human and Artificial Cognition


93. A Benchmark for Procedural Memory Retrieval in Language Agents


94. Affective Multimodal Agents with Proactive Knowledge Grounding for Emotionally Aligned Marketing Dialogue


95. Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks


96. PromptTailor: Multi-turn Intent-Aligned Prompt Synthesis for Lightweight LLMs


97. German General Personas: A Survey-Derived Persona Prompt Collection for Population-Aligned LLM Studies


98. EulerESG: Automating ESG Disclosure Analysis with LLMs


99. Quantifying and Mitigating Selection Bias in LLMs: A Transferable LoRA Fine-Tuning and Efficient Majority Voting Approach


100. Lost in the Pipeline: How Well Do Large Language Models Handle Data Preparation?


101. A General Highly Accurate Online Planning Method Integrating Large Language Models into Nested Rollout Policy Adaptation for Dialogue Tasks


102. Evaluating Embedding Generalization: How LLMs, LoRA, and SLERP Shape Representational Geometry


103. CSV-Decode: Certifiable Sub-Vocabulary Decoding for Efficient Large Language Model Inference


104. Cacheback: Speculative Decoding With Nothing But Cache


105. TIP and Polish: Text-Image-Prototype Guided Multi-Modal Generation via Commonality-Discrepancy Modeling and Refinement


106. On the Role of Preference Variance in Preference Optimization


107. Temporal Consistency for LLM Reasoning Process Error Identification