Key LLM-Related Papers - 2026-03-16

1. Semantic Invariance in Agentic AI


2. Developing and evaluating a chatbot to support maternal health care


3. Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation


4. Structured Distillation for Personalized Agent Memory: 11x Token Reduction with Retrieval Preservation


5. Efficient and Interpretable Multi-Agent LLM Routing via Ant Colony Optimization


6. Context is all you need: Towards autonomous model-based process design using agentic AI in flowsheet simulations


7. ToolTree: Efficient LLM Agent Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning


8. AI Planning Framework for LLM-Based Web Agents


9. Visual-ERM: Reward Modeling for Visual Equivalence


10. From Experiments to Expertise: Scientific Knowledge Consolidation for AI-Driven Computational Research


11. LLM Constitutional Multi-Agent Governance


12. ESG-Bench: Benchmarking Long-Context ESG Reports for Hallucination Mitigation


13. Developing the PsyCogMetrics AI Lab to Evaluate Large Language Models and Advance Cognitive Science – A Three-Cycle Action Design Science Study


14. Geometry-Guided Camera Motion Understanding in VideoLLMs


15. Evaluating VLMs’ Spatial Reasoning Over Robot Motion: A Step Towards Robot Planning with Motion Preferences


16. Human-in-the-Loop LLM Grading for Handwritten Mathematics Assessments


17. ARL-Tangram: Unleash the Resource Efficiency in Agentic Reinforcement Learning


18. Is Human Annotation Necessary? Iterative MBR Distillation for Error Span Detection in Machine Translation


19. Delta1 with LLM: symbolic and neural integration for credible and explainable reasoning


20. Learning from Child-Directed Speech in Two-Language Scenarios: A French-English Case Study


21. Human-Centered Evaluation of an LLM-Based Process Modeling Copilot: A Mixed-Methods Study with Domain Experts


22. Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models


23. Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation


24. Empowering Semantic-Sensitive Underwater Image Enhancement with VLM


25. Altered Thoughts, Altered Actions: Probing Chain-of-Thought Vulnerabilities in VLA Robotic Manipulation


26. Cost-Efficient Multimodal LLM Inference via Cross-Tier GPU Heterogeneity


27. Experimental evidence of progressive ChatGPT models self-convergence


28. MetaKE: Meta-learning Aligned Knowledge Editing via Bi-level Optimization


29. RetroReasoner: A Reasoning LLM for Strategic Retrosynthesis Prediction


30. From Text to Forecasts: Bridging Modality Gap with Temporal Evolution Semantic Space


31. Continual Learning in Large Language Models: Methods, Challenges, and Opportunities


32. LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing


33. Spend Less, Reason Better: Budget-Aware Value Tree Search for LLM Agents


34. Towards unified brain-to-text decoding across speech production and perception


35. VLM4Rec: Multimodal Semantic Representation for Recommendation with Large Vision-Language Models


36. When Drafts Evolve: Speculative Decoding Meets Online Learning


37. Literary Narrative as Moral Probe: A Cross-System Framework for Evaluating AI Ethical Reasoning and Refusal Behavior


38. Feynman: Knowledge-Infused Diagramming Agent for Scalable Visual Designs


39. AgentDrift: Unsafe Recommendation Drift Under Tool Corruption Hidden by Ranking Metrics in LLM Agents


40. Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages


41. Spatio-Semantic Expert Routing Architecture with Mixture-of-Experts for Referring Image Segmentation


42. LLM BiasScope: A Real-Time Bias Analysis Platform for Comparative LLM Evaluation


43. When LLM Judge Scores Look Good but Best-of-N Decisions Fail


44. Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies


45. TRACE: Temporal Rule-Anchored Chain-of-Evidence on Knowledge Graphs for Interpretable Stock Movement Prediction


46. Shattering the Shortcut: A Topology-Regularized Benchmark for Multi-hop Medical Reasoning in LLMs


47. Test-Time Strategies for More Efficient and Accurate Agentic RAG


48. SPARROW: Learning Spatial Precision and Temporal Referential Consistency in Pixel-Grounded Video MLLMs


49. Budget-Sensitive Discovery Scoring: A Formally Verified Framework for Evaluating AI-Guided Scientific Selection


50. VQQA: An Agentic Approach for Video Evaluation and Quality Improvement


51. Global Evolutionary Steering: Refining Activation Steering Control via Cross-Layer Consistency


52. Detecting Miscitation on the Scholarly Web through LLM-Augmented Text-Rich Graph Learning


53. Prompt Injection as Role Confusion


54. Aligning Language Models from User Interactions


55. Diagnosing Retrieval Bias Under Multiple In-Context Knowledge Updates in Large Language Models


56. Task-Specific Knowledge Distillation via Intermediate Probes