Key LLM-Related Papers - 2025-11-06

1. Agent-Omni: Test-Time Multimodal Reasoning via Model Coordination for Understanding Anything


2. When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning


3. LLM-Supported Formal Knowledge Representation for Enhancing Control Engineering Content with an Interactive Semantic Layer


4. CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents


5. DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning


6. The ORCA Benchmark: Evaluating Real-World Calculation Accuracy in Large Language Models


7. Knowledge Graph-enhanced Large Language Model for Incremental Game PlayTesting


8. Auditable-choice reframing unlocks RL-based verification for open-ended tasks


9. ReAcTree: Hierarchical LLM Agent Trees with Control Flow for Long-Horizon Task Planning


10. Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation


11. When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs


12. Deep Ideation: Designing LLM Agents to Generate Novel Research Ideas on Scientific Concept Network


13. TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data


14. Training Proactive and Personalized LLM Agents


15. Optimal-Agent-Selection: State-Aware Routing Framework for Efficient Multi-Agent Collaboration


16. Personalized Decision Modeling: Utility Optimization or Textualized-Symbolic Reasoning


17. InsurAgent: A Large Language Model-Empowered Agent for Simulating Individual Behavior in Purchasing Flood Insurance


18. Deep Value Benchmark: Measuring Whether Models Generalize Deep Values or Shallow Preferences


19. Automated Reward Design for Gran Turismo


20. Human-AI Co-Embodied Intelligence for Scientific Experimentation and Manufacturing


21. MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning


22. AI Diffusion in Low Resource Language Countries


23. LLEXICORP: End-user Explainability of Convolutional Neural Networks


24. Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes


25. Apriel-H1: Towards Efficient Enterprise Reasoning Models


26. Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks


27. On The Dangers of Poisoned LLMs In Security Automation


28. Next Token Knowledge Tracing: Exploiting Pretrained LLM Representations to Decode Student Behaviour


29. Causal Graph Neural Networks for Healthcare


30. BRAINS: A Retrieval-Augmented System for Alzheimer’s Detection and Monitoring


31. Modeling Hawkish-Dovish Latent Beliefs in Multi-Agent Debate-Based LLMs for Monetary Policy Decision Classification


32. EvoDev: An Iterative Feature-Driven Framework for End-to-End Software Development with LLM-based Agents


33. AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models


34. AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda


35. Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation


36. The Sequential Edge: Inverse-Entropy Voting Beats Parallel Self-Consistency at Matched Compute


37. LA-MARRVEL: A Knowledge-Grounded and Language-Aware LLM Reranker for AI-MARRVEL in Rare Disease Diagnosis


38. Demo: Statistically Significant Results On Biases and Errors of LLMs Do Not Guarantee Generalizable Results


39. LACY: A Vision-Language Model-based Language-Action Cycle for Self-Improving Robotic Manipulation


40. Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live


41. Open the Oyster: Empirical Evaluation and Improvement of Code Reasoning Confidence in LLMs


42. Text to Robotic Assembly of Multi Component Objects using 3D Generative AI and Vision Language Models


43. Metamorphic Testing of Large Language Models for Natural Language Processing


44. Watermarking Discrete Diffusion Language Models


45. Regularization Through Reasoning: Systematic Improvements in Language Model Classification via Explanation-Enhanced Fine-Tuning


46. Shared Parameter Subspaces and Cross-Task Linearity in Emergently Misaligned Behavior


47. InteracSPARQL: An Interactive System for SPARQL Query Refinement Using Natural Language Explanations


48. TRACE: Textual Reasoning for Affordance Coordinate Extraction


49. Vibe Learning: Education in the age of AI


50. Black-Box Membership Inference Attack for LVLMs via Prior Knowledge-Calibrated Memory Probing


51. Detecting Vulnerabilities from Issue Reports for Internet-of-Things


52. Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR


53. Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch


54. iFlyBot-VLA Technical Report


55. EvoMem: Improving Multi-Agent Planning with Dual-Evolving Memory


56. Between Myths and Metaphors: Rethinking LLMs for SRH in Conservative Contexts


57. Thinking Like a Student: AI-Supported Reflective Planning in a Theory-Intensive Computer Science Course


58. LGCC: Enhancing Flow Matching Based Text-Guided Image Editing with Local Gaussian Coupling and Context Consistency


59. CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization


60. EdgeReasoning: Characterizing Reasoning LLM Deployment on Edge GPUs