Key LLM-Related Papers - 2025-12-24

1. LongVideoAgent: Multi-Agent Reasoning with Long Videos


2. Automated stereotactic radiosurgery planning using a human-in-the-loop reasoning large language model agent


3. Benchmarking LLMs for Predictive Applications in the Intensive Care Units


4. A DeepSeek-Powered AI System for Automated Chest Radiograph Interpretation in Clinical Practice


5. SynCraft: Guiding Large Language Models to Predict Edit Sequences for Molecular Synthesizability Optimization


6. Synthesizing Procedural Memory: Challenges and Architectures in Automated Workflow Generation


7. ActionFlow: A Pipelined Action Acceleration for Vision Language Models on Edge


8. Graph-Symbolic Policy Enforcement and Control (G-SPEC): A Neuro-Symbolic Framework for Safe Agentic AI in 5G Autonomous Networks


9. MemR$^3$: Memory Retrieval via Reflective Reasoning for LLM Agents


10. TongSIM: A General Platform for Simulating Intelligent Machines


11. Concept Generalization in Humans and Large Language Models: Insights from the Number Game


12. Enhancing Zero-Shot Time Series Forecasting in Off-the-Shelf LLMs via Noise Injection


13. MolAct: An Agentic RL Framework for Molecular Editing and Property Optimization


14. Adaptive Financial Sentiment Analysis for NIFTY 50 via Instruction-Tuned LLMs, RAG and Reinforcement Learning Approaches


15. Reason2Decide: Rationale-Driven Multi-Task Learning


16. Scaling Reinforcement Learning for Content Moderation with Large Language Models


17. S$^3$IT: A Benchmark for Spatially Situated Social Intelligence Test


18. Interpolative Decoding: Exploring the Spectrum of Personality Traits in LLMs


19. PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research


20. Cube Bench: A Benchmark for Spatial Visual Reasoning in MLLMs


21. Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs


22. SweRank+: Multilingual, Multi-Turn Code Ranking for Software Issue Localization


23. Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen’s Kappa and Semantic Similarity for Qualitative Research Validation


24. Toward Explaining Large Language Models in Software Engineering Tasks


25. TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning


26. KnowVal: A Knowledge-Augmented and Value-Guided Autonomous Driving System


27. Memory as Resonance: A Biomimetic Architecture for Infinite Context Memory on Ergodic Phonetic Manifolds


28. Corpus of Cross-lingual Dialogues with Minutes and Detection of Misunderstandings


29. Asynchronous Fast-Slow Vision-Language-Action Policies for Whole-Body Robotic Manipulation


30. FaithLens: Detecting and Explaining Faithfulness Hallucination


31. Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography


32. AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications


33. AXIOM: Benchmarking LLM-as-a-Judge for Code via Rule-Based Perturbation and Multisource Quality Calibration


34. Fun-Audio-Chat Technical Report


35. M$^3$KG-RAG: Multi-hop Multimodal Knowledge Graph-enhanced Retrieval-Augmented Generation


36. ABBEL: LLM Agents Acting through Belief Bottlenecks Expressed in Language


37. Spatio-Temporal Graphs Beyond Grids: Benchmark for Maritime Anomaly Detection


38. QE-Catalytic: A Graph-Language Multimodal Base Model for Relaxed-Energy Prediction in Catalytic Adsorption


39. CBA: Communication-Bound-Aware Cross-Domain Resource Assignment for Pipeline-Parallel Distributed LLM Training in Dynamic Multi-DC Optical Networks


40. On the Effectiveness of Instruction-Tuning Local LLMs for Identifying Software Vulnerabilities


41. Schoenfeld’s Anatomy of Mathematical Reasoning by Language Models


42. Neuron-Guided Interpretation of Code LLMs: Where, Why, and How?


43. Conditional Adversarial Fragility in Financial Machine Learning under Macroeconomic Stress


44. Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning


45. Demystifying LLM-as-a-Judge: Analytically Tractable Model for Inference-Time Scaling


46. Fine-Tuned In-Context Learners for Efficient Adaptation


47. HARMON-E: Hierarchical Agentic Reasoning for Multimodal Oncology Notes to Extract Structured Data


48. UCCL-EP: Portable Expert-Parallel Communication


49. A Declarative Language for Building And Orchestrating LLM-Powered Agent Workflows


50. Attention Distance: A Novel Metric for Directed Fuzzing with Large Language Models


51. QMBench: A Research Level Benchmark for Quantum Materials Research


52. Large Language Models for EDA Cloud Job Resource and Lifetime Prediction


53. Automated Fault Detection in 5G Core Networks Using Large Language Models


54. Brain-Grounded Axes for Reading and Steering LLM States