Notable LLM-Related Papers - 2026-03-30

1. CADSmith: Multi-Agent CAD Generation with Programmatic Geometric Validation


2. AIRA_2: Overcoming Bottlenecks in AI Research Agents


3. GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation


4. Semi-Automated Knowledge Engineering and Process Mapping for Total Airport Management


5. AutoB2G: A Large Language Model-Driven Agentic Framework For Automated Building-Grid Co-Simulation


6. BeSafe-Bench: Unveiling Behavioral Safety Risks of Situated Agents in Functional Environments


7. Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification


8. Make Geometry Matter for Spatial Reasoning


9. Sustainability Is Not Linear: Quantifying Performance, Energy, and Privacy Trade-offs in On-Device Intelligence


10. Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering


11. How Open Must Language Models be to Enable Reliable Scientific Inference?


12. ALBA: A European Portuguese Benchmark for Evaluating Language and Linguistic Dimensions in Generative LLMs


13. JAL-Turn: Joint Acoustic-Linguistic Modeling for Real-Time and Robust Turn-Taking Detection in Full-Duplex Spoken Dialogue Systems


14. AMALIA Technical Report: A Fully Open Source Large Language Model for European Portuguese


15. Rocks, Pebbles and Sand: Modality-aware Scheduling for Multimodal Large Language Model Inference


16. Automated near-term quantum algorithm discovery for molecular ground states


17. Generative Score Inference for Multimodal Data


18. Reflect to Inform: Boosting Multimodal Reasoning via Information-Gain-Driven Verification



20. Mitigating the Reasoning Tax in Vision-Language Fine-Tuning with Input-Adaptive Depth Aggregation


21. From Human Cognition to Neural Activations: Probing the Computational Primitives of Spatial Reasoning in LLMs


22. Label-Free Cross-Task LoRA Merging with Null-Space Compression


23. Distilling Conversations: Abstract Compression of Conversational Audio Context for LLM-based ASR


24. Automating Domain-Driven Design: Experience with a Prompting Framework


25. Clawed and Dangerous: Can We Trust Open Agentic Systems?


26. Towards GUI Agents: Vision-Language Diffusion Models for GUI Grounding


27. Sparse Auto-Encoders and Holism about Large Language Models


28. A Time-Consistent Benchmark for Repository-Level Software Engineering Evaluation


29. SWE-PRBench: Benchmarking AI Code Review Quality Against Pull Request Feedback


30. Finding Distributed Object-Centric Properties in Self-Supervised Transformers


31. SkinGPT-X: A Self-Evolving Collaborative Multi-Agent System for Transparent and Trustworthy Dermatological Diagnosis


32. “Oops! ChatGPT is Temporarily Unavailable!”: A Diary Study on Knowledge Workers’ Experiences of LLM Withdrawal


33. Selective Deficits in LLM Mental Self-Modeling in a Behavior-Based Test of Theory of Mind


34. H-Node Attack and Defense in Large Language Models


35. VLAgeBench: Benchmarking Large Vision-Language Models for Zero-Shot Human Age Estimation


36. FairLLaVA: Fairness-Aware Parameter-Efficient Fine-Tuning for Large Vision-Language Assistants


37. When Chain-of-Thought Backfires: Evaluating Prompt Sensitivity in Medical Language Models


38. Collision-Aware Vision-Language Learning for End-to-End Driving with Multimodal Infraction Datasets



40. Reinforcing Structured Chain-of-Thought for Video Understanding


41. DiReCT: Disentangled Regularization of Contrastive Trajectories for Physics-Refined Video Generation


42. On Integrating Resilience and Human Oversight into LLM-Assisted Modeling Workflows for Digital Twins


43. GazeQwen: Lightweight Gaze-Conditioned LLM Modulation for Streaming Video Understanding


44. MAGNET: Autonomous Expert Model Generation via Decentralized Autoresearch and BitNet Training


45. ReCUBE: Evaluating Repository-Level Context Utilization in Code Generation


46. IncreRTL: Traceability-Guided Incremental RTL Generation under Requirement Evolution


47. UCAgent: An End-to-End Agent for Block-Level Functional Verification


48. ETA-VLA: Efficient Token Adaptation via Temporal Fusion and Intra-LLM Sparsification for Vision-Language-Action Models


49. Consistency Amplifies: How Behavioral Variance Shapes Agent Accuracy


50. Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models