Key LLM Papers - 2025-12-25

1. RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic


2. A Real-World Evaluation of LLM Medication Safety Reviews in NHS Primary Care


3. Beyond Context: Large Language Models Failure to Grasp Users Intent


4. LLM Personas as a Substitute for Field Experiments in Method Benchmarking


5. Agentic Explainable Artificial Intelligence (Agentic XAI) Approach To Explore Better Explanation


6. TrafficSimAgent: A Hierarchical Agent Framework for Autonomous Traffic Simulation with MCP Control


7. The Silent Scholar Problem: A Probabilistic Framework for Breaking Epistemic Asymmetry in LLM Agents


8. MAR: Multi-Agent Reflexion Improves Reasoning Abilities in LLMs


9. Safety Alignment of LMs via Non-cooperative Games


10. A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents


11. AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent


12. Eidoku: A Neuro-Symbolic Verification Gate for LLM Reasoning via Structural Constraint Satisfaction


13. Quantifying Laziness, Decoding Suboptimality, and Context Degradation in Large Language Models


14. From Fake Focus to Real Precision: Confusion-Driven Adversarial Attention Learning in Transformers


15. AI-Driven Decision-Making System for Hiring Process


16. Memory Bear AI: A Breakthrough from Memory to Cognition Toward Artificial General Intelligence


17. AIAuditTrack: A Framework for AI Security system


18. Reasoning Relay: Evaluating Stability and Interchangeability of Large Language Models in Mathematical Reasoning


19. MicroProbe: Efficient Reliability Assessment for Foundation Models with Minimal Data


20. MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation


21. BitRL-Light: 1-bit LLM Agents with Deep Reinforcement Learning for Energy-Efficient Smart Home Lighting Optimization


22. C2LLM Technical Report: A New Frontier in Code Retrieval via Adaptive Cross-Attention Pooling


23. Measuring all the noises of LLM Evals


24. Scaling Laws for Economic Productivity: Experimental Evidence in LLM-Assisted Consulting, Data Analyst, and Management Tasks


25. SMART SLM: Structured Memory and Reasoning Transformer, A Small Language Model for Accurate Document Assistance


26. LookPlanGraph: Embodied Instruction Following Method with VLM Graph Augmentation


27. Casting a SPELL: Sentence Pairing Exploration for LLM Limitation-breaking


28. SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation


29. AutoBaxBuilder: Bootstrapping Code Security Benchmarking


30. Semi-Supervised Learning for Large Language Models Safety and Content Moderation


31. Semantic Refinement with LLMs for Graph Representations


32. Policy-Conditioned Policies for Multi-Agent Task Solving


33. Rethinking Supervised Fine-Tuning: Emphasizing Key Answer Tokens for Improved LLM Accuracy


34. LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics


35. Distilling the Essence: Efficient Reasoning Distillation via Sequence Truncation


36. Automatic Replication of LLM Mistakes in Medical Conversations


37. GenTSE: Enhancing Target Speaker Extraction via a Coarse-to-Fine Generative Language Model


38. Mesh-Attention: A New Communication-Efficient Distributed Attention with Improved Data Locality


39. Can Agentic AI Match the Performance of Human Data Scientists?


40. One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents


41. Reflection Pretraining Enables Token-Level Self-Correction in Biological Sequence Models


42. MultiMind at SemEval-2025 Task 7: Crosslingual Fact-Checked Claim Retrieval via Multi-Source Alignment


43. Neural Probe-Based Hallucination Detection for Large Language Models


44. Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning


45. RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks


46. Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning


47. NotSoTiny: A Large, Living Benchmark for RTL Code Generation


48. MediEval: A Unified Medical Benchmark for Patient-Contextual and Knowledge-Grounded Reasoning in LLMs


49. X-GridAgent: An LLM-Powered Agentic AI System for Assisting Power Grid Analysis


50. Generalization of RLVR Using Causal Reasoning as a Testbed


51. PHOTON: Hierarchical Autoregressive Modeling for Lightspeed and Memory-Efficient Language Generation


52. Revisiting the Learning Objectives of Vision-Language Reward Models


53. HyDRA: Hierarchical and Dynamic Rank Adaptation for Mobile Vision Language Model


54. Managing the Stochastic: Foundations of Learning in Neuro-Symbolic Systems for Software Engineering


55. Uncovering Competency Gaps in Large Language Models and Their Benchmarks


56. Data-Free Pruning of Self-Attention Layers in LLMs


57. Real Time Detection and Quantitative Analysis of Spurious Forgetting in Continual Learning


58. Enhancing Lung Cancer Treatment Outcome Prediction through Semantic Feature Engineering Using Large Language Models


59. Learning Evolving Latent Strategies for Multi-Agent Language Systems without Model Fine-Tuning


60. Efficient Asynchronous Federated Evaluation with Strategy Similarity Awareness for Intent-Based Networking in Industrial Internet of Things