Major LLM-Related Papers - 2025-11-14

1. CrochetBench: Can Vision-Language Models Move from Describing to Doing in Crochet Domain?


2. The 2025 Planning Performance of Frontier Large Language Models


3. BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems


4. Not Everything That Counts Can Be Counted: A Case for Safe Qualitative AI


5. From Model Training to Model Raising – A call to reform AI model training paradigms from post-hoc alignment to intrinsic, identity-based development


6. Efficient Reasoning via Reward Model


7. History-Aware Reasoning for GUI Agents


8. OR-R1: Automating Modeling and Solving of Operations Research Optimization Problem via Test-Time Reinforcement Learning


9. Advancing Autonomous Emergency Response Systems: A Generative AI Perspective


10. Solving a Million-Step LLM Task with Zero Errors


11. AI Founding Fathers: A Case Study of GIS Search in Multi-Agent Pipelines


12. AlphaCast: A Human Wisdom-LLM Intelligence Co-Reasoning Framework for Interactive Time Series Forecasting


13. Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds


14. UCO: A Multi-Turn Interactive Reinforcement Learning Method for Adaptive Teaching with Large Language Models


15. Bridging Natural Language and ASP: A Hybrid Approach Using LLMs and AMR Parsing


16. AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting


17. LLM-Guided Dynamic-UMAP for Personalized Federated Graph Learning


18. Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque


19. Self-Correcting Large Language Models: Generation vs. Multiple Choice


20. MTQ-Eval: Multilingual Text Quality Evaluation for Language Models


21. TaskSense: Cognitive Chain Modeling and Difficulty Estimation for GUI Tasks


22. C³TG: Conflict-aware, Composite, and Collaborative Controlled Text Generation


23. Leveraging Large Language Models for Use Case Model Generation from Software Requirements


24. A Hybrid Search for Complex Table Question Answering in Securities Report


25. Enabling Agents to Communicate Entirely in Latent Space


26. LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls


27. Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning


28. Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment


29. Improving Sustainability of Adversarial Examples in Class-Incremental Learning


30. Tele-LLM-Hub: Building Context-Aware Multi-Agent LLM Systems for Telecom Networks


31. PAN: A World Model for General, Interactable, and Long-Horizon World Simulation


32. Causally-Grounded Dual-Path Attention Intervention for Object Hallucination Mitigation in LVLMs


33. A Neurosymbolic Approach to Natural Language Formalization and Verification


34. Think, Remember, Navigate: Zero-Shot Object-Goal Navigation with VLM-Powered Reasoning


35. TiDAR: Think in Diffusion, Talk in Autoregression


36. iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification


37. Hallucinate or Memorize? The Two Sides of Probabilistic Learning in Large Language Models


38. Structured Uncertainty guided Clarification for LLM Agents


39. Benevolent Dictators? On LLM Agent Behavior in Dictator Games


40. Convergence dynamics of Agent-to-Agent Interactions with Misaligned objectives


41. Accelerating Training Speed of Tiny Recursive Models via Curriculum Guided Adaptive Recursion


42. Bio AI Agent: A Multi-Agent Artificial Intelligence System for Autonomous CAR-T Cell Therapy Development with Integrated Target Discovery, Toxicity Prediction, and Rational Molecular Design



44. Hope, Aspirations, and the Impact of LLMs on Female Programming Learners in Afghanistan


45. The LLM Pro Finance Suite: Multilingual Large Language Models for Financial Applications


46. Learn More, Forget Less: A Gradient-Aware Data Selection Approach for LLM


47. Reasoning on Time-Series for Financial Technical Analysis


48. Case Study: Transformer-Based Solution for the Automatic Digitization of Gas Plants


49. OKBench: Democratizing LLM Evaluation with Fully Automated, On-Demand, Open Knowledge Benchmarking


50. Self-HarmLLM: Can Large Language Model Harm Itself?


51. What About the Scene with the Hitler Reference? HAUNT: A Framework to Probe LLMs’ Self-consistency Via Adversarial Nudge


52. Chopping Trees: Semantic Similarity Based Dynamic Pruning for Tree-of-Thought Reasoning


53. The Collective Turing Test: Large Language Models Can Generate Realistic Multi-User Discussions


54. Conversational Agents for Building Energy Efficiency – Advising Housing Cooperatives in Stockholm on Reducing Energy Consumption