Key LLM-Related Papers - 2025-12-17

1. MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph


2. neuralFOMO: Can LLMs Handle Being Second Best? Measuring Envy-Like Preferences in Multi-Agent Settings


3. Behavior and Representation in Large Language Models for Combinatorial Optimization: From Feature Extraction to Algorithm Selection


4. Error-Driven Prompt Optimization for Arithmetic Reasoning


5. Reflective Preference Optimization (RPO): Enhancing On-Policy Alignment via Hint-Guided Reflection


6. Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows


7. SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning


8. Can AI Understand What We Cannot Say? Measuring Multilevel Alignment Through Abortion Stigma Across Cognitive, Interpersonal, and Structural Levels


9. Socratic Students: Teaching Language Models to Learn by Asking Questions


10. M-GRPO: Stabilizing Self-Supervised Reinforcement Learning for Large Language Models with Momentum-Anchored Policy Optimization


11. Fault-Tolerant Sandboxing for AI Coding Agents: A Transactional Approach to Safe Autonomous Execution


12. Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning


13. WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment


14. Memoria: A Scalable Agentic Memory Framework for Personalized Conversational AI


15. AgentSHAP: Interpreting LLM Agent Tool Importance with Monte Carlo Shapley Value Estimation


16. Large Language Newsvendor: Decision Biases and Cognitive Mechanisms


17. KidsArtBench: Multi-Dimensional Children’s Art Evaluation with Attribute-Aware MLLMs


18. AI Transparency Atlas: Framework, Scoring, and Real-Time Model Card Evaluation Pipeline


19. Feeling the Strength but Not the Source: Partial Introspection in LLMs


20. Floorplan2Guide: LLM-Guided Floorplan Parsing for BLV Indoor Navigation


21. Rethinking Label Consistency of In-Context Learning: An Implicit Transductive Label Propagation Perspective


22. The Forecast Critic: Leveraging Large Language Models for Poor Forecast Identification


23. Log Anomaly Detection with Large Language Models via Knowledge-Enriched Fusion


24. AGAPI-Agents: An Open-Access Agentic AI Platform for Accelerated Materials Design on AtomGPT.org


25. CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving


26. Robustness of Probabilistic Models to Low-Quality Data: A Multi-Perspective Analysis


27. Causal Strengths and Leaky Beliefs: Interpreting LLM Reasoning via Noisy-OR Causal Bayes Nets


28. Structured Personalization: Modeling Constraints as Matroids for Data-Minimal LLM Agents


29. A Monad-Based Clause Architecture for Artificial Age Score (AAS) in Large Language Models


30. Embedding-Based Rankings of Educational Resources based on Learning Outcome Alignment: Benchmarking, Expert Validation, and Learner Performance


31. Large-Language Memorization During the Classification of United States Supreme Court Cases


32. ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding


33. Memory in the Age of AI Agents


34. SkipCat: Rank-Maximized Low-Rank Compression of Large Language Models via Shared Projection and Block Skipping


35. Non-Resolution Reasoning: A Framework for Preserving Semantic Ambiguity in Language Models


36. From User Interface to Agent Interface: Efficiency Optimization of UI Representations for LLM Agents


37. FIN-bench-v2: A Unified and Robust Benchmark Suite for Evaluating Finnish Large Language Models


38. Security and Detectability Analysis of Unicode Text Watermarking Methods Against Large Language Models


39. MiniLingua: A Small Open-Source LLM for European Languages


40. Efficient Adaptive Rejection Sampling for Accelerating Speculative Decoding in Large Language Models


41. PolySet: Restoring the Statistical Ensemble Nature of Polymers for Machine Learning


42. Uncovering the Role of Initial Saliency in U-Shaped Attention Bias: Scaling Initial Token Weight for Enhanced Long-Text Processing


43. TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning


44. A Simple and Effective Framework for Symmetric Consistent Indexing in Large-Scale Dense Retrieval


45. LLM Rationalis? Measuring Bargaining Capabilities of AI Negotiators


46. GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training



48. Cisco Integrated AI Security and Safety Framework Report


49. CTIGuardian: A Few-Shot Framework for Mitigating Privacy Leakage in Fine-Tuned LLMs


50. SignRAG: A Retrieval-Augmented System for Scalable Zero-Shot Road Sign Recognition


51. Counting Clues: A Lightweight Probabilistic Baseline Can Match an LLM


52. Information-Consistent Language Model Recommendations through Group Relative Policy Optimization


53. Hindsight is 20/20: Building Agent Memory that Retains, Recalls, and Reflects


54. Does Tone Change the Answer? Evaluating Prompt Politeness Effects on Modern LLMs: GPT, Gemini, LLaMA


55. A Disproof of Large Language Model Consciousness: The Necessity of Continual Learning for Consciousness


56. Beyond Task Completion: An Assessment Framework for Evaluating Agentic AI Systems


57. State over Tokens: Characterizing the Role of Reasoning Tokens


58. Adaptive Edge-Cloud Inference for Speech-to-Action Systems Using ASR and Large Language Models (ASTA)


59. Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches


60. DiG: Differential Grounding for Enhancing Fine-Grained Perception in Multimodal Large Language Model


61. ORIBA: Exploring LLM-Driven Role-Play Chatbot as a Creativity Support Tool for Original Character Artists


62. Understanding Syllogistic Reasoning in LLMs from Formal and Natural Language Perspectives


63. Human-Inspired Learning for Large Language Models via Obvious Record and Maximum-Entropy Method Discovery


64. Content-Aware Ad Banner Layout Generation with Two-Stage Chain-of-Thought in Vision Language Models


65. Detecting Prompt Injection Attacks Against Application Using Classifiers


66. Coupled Variational Reinforcement Learning for Language Model General Reasoning


67. StreamingAssistant: Efficient Visual Token Pruning for Accelerating Online Video Understanding


68. Diverse LLMs vs. Vulnerabilities: Who Detects and Fixes Them Better?


69. Explainable AI as a Double-Edged Sword in Dermatology: The Impact on Clinicians versus The Public


70. Mage: Cracking Elliptic Curve Cryptography with Cross-Axis Transformers


71. SCIR: A Self-Correcting Iterative Refinement Framework for Enhanced Information Extraction Based on Schema


72. V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval


73. Semantic Distance Measurement based on Multi-Kernel Gaussian Processes


74. Training Versatile Coding Agents in Synthetic Environments


75. Epistemoverse: Toward an AI-Driven Knowledge Metaverse for Intellectual Heritage Preservation



77. Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings


78. MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models


79. Rethinking Jailbreak Detection of Large Vision Language Models with Representational Contrastive Scoring


80. The Instability of Safety: How Random Seeds and Temperature Expose Inconsistent LLM Refusal Behavior


81. Instruction-Tuning Open-Weight Language Models for BPMN Model Generation


82. Hold Onto That Thought: Assessing KV Cache Compression On Reasoning


83. V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions


84. Semantic search for 100M+ galaxy images using AI-generated captions


85. How AI Agents Follow the Herd of AI? Network Effects, History, and Machine Optimism


86. DynaPURLS: Dynamic Refinement of Part-aware Representations for Skeleton-based Zero-Shot Action Recognition


87. The Agentic Regulator: Risks for AI in Finance and a Proposed Agent-based Framework for Governance


88. Evolutionary Reinforcement Learning based AI tutor for Socratic Interdisciplinary Instruction


89. FloraForge: LLM-Assisted Procedural Generation of Editable and Analysis-Ready 3D Plant Geometric Models For Agricultural Applications


90. A fine-grained look at causal effects in causal spaces


91. Advancing Autonomous Driving System Testing: Demands, Challenges, and Future Directions


92. An Experience Report on a Pedagogically Controlled, Curriculum-Constrained AI Tutor for SE Education


93. Understanding Structural Representation in Foundation Models for Polymers


94. WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving


95. KV Cache Recycling to Expand Usable Context Capacity in Low Parameter LLMs


96. Assessing Greenspace Attractiveness with ChatGPT, Claude, and Gemini: Do AI Models Reflect Human Perceptions?


97. The Ontological Dissonance Hypothesis: AI-Triggered Delusional Ideation as Folie a Deux Technologique


98. Enhancing Urban Visual Place Recognition for Crowdsourced Flood Imagery via LLM-Guided Attention


99. EMNLP: Educator-role Moral and Normative Large Language Models Profiling