Key LLM-Related Papers - 2026-01-27

1. Empowering Medical Equipment Sustainability in Low-Resource Settings: An AI-Powered Diagnostic and Support Platform for Biomedical Technicians


2. Spatial-Agent: Agentic Geo-spatial Reasoning with Scientific Core Concepts


3. AgentDrive: An Open Benchmark Dataset for Agentic AI Reasoning with LLM-Generated Scenarios in Autonomous Systems


4. Reasoning Promotes Robustness in Theory of Mind Tasks


5. AgentsEval: Clinically Faithful Evaluation of Medical Imaging Reports via Multi-Agent Reasoning


6. LUMINA: Long-horizon Understanding for Multi-turn Interactive Agents


7. LLM is Not All You Need: A Systematic Evaluation of ML vs. Foundation Models for text and image based Medical Classification


8. SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care


9. Doc2AHP: Inferring Structured Multi-Criteria Decision Models via Semantic Trees with LLMs


10. SemanticALLI: Caching Reasoning, Not Just Responses, in Agentic Systems


11. When Agents Fail to Act: A Diagnostic Framework for Tool Invocation Reliability in Multi-Agent LLM Systems


12. A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs


13. GRIP: Algorithm-Agnostic Machine Unlearning for Mixture-of-Experts via Geometric Router Constraints


14. Evaluating Large Vision-language Models for Surgical Tool Detection


15. LLM-Based Adversarial Persuasion Attacks on Fact-Checking Systems


16. Privacy in Human-AI Romantic Relationships: Concerns, Boundaries, and Agency


17. Trapped in the past? Disentangling fluid and crystallized intelligence of large language models using chess


18. GTA: Generative Traffic Agents for Simulating Realistic Mobility Behavior


19. Do LLM hallucination detectors suffer from low-resource effect?


20. Standardizing Longitudinal Radiology Report Evaluation via Large Language Model Annotation



22. Revisiting the Role of Natural Language Code Comments in Code Translation


23. Attention-MoA: Enhancing Mixture-of-Agents via Inter-Agent Semantic Attention and Deep Residual Synthesis


24. CORD: Bridging the Audio-Text Reasoning Gap via Weighted On-policy Cross-modal Distillation


25. Do Models Hear Like Us? Probing the Representational Alignment of Audio LLMs and Naturalistic EEG


26. TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning


27. SafeThinker: Reasoning about Risk to Deepen Safety Beyond Shallow Alignment


28. MRAG: Benchmarking Retrieval-Augmented Generation for Bio-medicine


29. EvoConfig: Self-Evolving Multi-Agent Systems for Efficient Autonomous Environment Configuration


30. Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic


31. Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding


32. AlphaFace: High Fidelity and Real-time Face Swapper Robust to Facial Pose


33. Jacobian Scopes: token-level causal attributions in LLMs


34. ResAgent: Entropy-based Prior Point Discovery and Visual Reasoning for Referring Expression Segmentation


35. Cross-Lingual Activation Steering for Multilingual Language Models


36. Cognitively-Inspired Tokens Overcome Egocentric Bias in Multimodal Models


37. NOIR: Privacy-Preserving Generation of Code with Open-Source LLMs


38. Regional Bias in Large Language Models


39. Machine-Assisted Grading of Nationwide School-Leaving Essay Exams with LLMs and Statistical NLP



41. Generating Literature-Driven Scientific Theories at Scale


42. Better as Generators Than Classifiers: Leveraging LLMs and Synthetic Data for Low-Resource Multilingual Classification


43. GameTalk: Training LLMs for Strategic Conversation


44. VibeTensor: System Software for Deep Learning, Fully Generated by AI Agents


45. SoundBreak: A Systematic Study of Audio-Only Adversarial Attacks on Trimodal Models


46. Zero-Shot Speech LLMs for Multi-Aspect Evaluation of L2 Speech: Challenges and Opportunities


47. ES4R: Speech Encoding Based on Prepositive Affective Modeling for Empathetic Response Generation


48. Domain Specific Specialization in Low-Resource Settings: The Efficacy of Offline Response-Based Knowledge Distillation in Large Language Models


49. M3Kang: Evaluating Multilingual Multimodal Mathematical Reasoning in Vision-Language Models


50. ChiEngMixBench: Evaluating Large Language Models on Spontaneous and Natural Chinese-English Code-Mixed Generation