Key LLM-Related Papers - 2025-10-27

1. A Multimodal Benchmark for Framing of Oil & Gas Advertising and Potential Greenwashing Detection


2. DeepAgent: A General Reasoning Agent with Scalable Toolsets


3. Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine


4. Co-Sight: Enhancing LLM-Based Agents via Conflict-Aware Meta-Verification and Trustworthy Reasoning with Structured Facts


5. EU-Agent-Bench: Measuring Illegal Behavior of LLM Agents Under EU Law


6. AutoOpt: A Dataset and a Unified Framework for Automating Optimization Problem Solving


7. Advancing Symbolic Integration in Large Language Models: Beyond Conventional Neurosymbolic AI


8. Boosting Accuracy and Efficiency of Budget Forcing in LLMs via Reinforcement Learning for Mathematical Reasoning


9. Magellan: Guided MCTS for Latent Space Exploration and Novelty Generation


10. CXRAgent: Director-Orchestrated Multi-Stage Reasoning for Chest X-Ray Interpretation


11. Towards Reliable Code-as-Policies: A Neuro-Symbolic Framework for Embodied Task Planning


12. OutboundEval: A Dual-Dimensional Benchmark for Expert-Level Intelligent Outbound Evaluation of Xbench’s Professional-Aligned Series


13. Memory-Free Continual Learning with Null Space Adaptation for Zero-Shot Vision-Language Models


14. String Seed of Thought: Prompting LLMs for Distribution-Faithful and Diverse Generation


15. How to Auto-optimize Prompts for Domain Tasks? Adaptive Prompting and Reasoning through Evolutionary Domain Knowledge Adaptation


16. NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation of LLM via Genetic Optimization of External Knowledge


17. MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning


18. From Questions to Queries: An AI-powered Multi-Agent Framework for Spatial Text-to-SQL


19. Customizing Open Source LLMs for Quantitative Medication Attribute Extraction across Heterogeneous EHR Systems


20. Cultural Alien Sampler: Open-ended art generation balancing originality and coherence


21. Sketch2BIM: A Multi-Agent Human-AI Collaborative Pipeline to Convert Hand-Drawn Floor Plans to 3D BIM


22. The Universal Landscape of Human Reasoning


23. From Polyester Girlfriends to Blind Mice: Creating the First Pragmatics Understanding Benchmarks for Slovene


24. GranViT: A Fine-Grained Vision Model With Autoregressive Perception For MLLMs


25. REMONI: An Autonomous System Integrating Wearables and Multimodal Large Language Models for Enhanced Remote Health Monitoring


26. Does Model Size Matter? A Comparison of Small and Large Language Models for Requirements Classification


27. Vision Language Models for Dynamic Human Activity Recognition in Healthcare Settings


28. Large Language Models as Model Organisms for Human Associative Learning


29. REvolution: An Evolutionary Framework for RTL Generation driven by Large Language Models


30. HIKMA: Human-Inspired Knowledge by Machine Agents through a Multi-Agent Framework for Semi-Autonomous Scientific Conferences


31. TripTide: A Benchmark for Adaptive Travel Planning under Disruptions


32. A Convergence Analysis of Adaptive Optimizers under Floating-point Quantization


33. Efficient semantic uncertainty quantification in language models via diversity-steered sampling


34. Sparser Block-Sparse Attention via Token Permutation


35. Correlation Dimension of Auto-Regressive Large Language Models


36. Securing AI Agent Execution


37. Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference


38. Quantifying CBRN Risk in Frontier Models


39. Large Language Models Meet Text-Attributed Graphs: A Survey of Integration Frameworks and Applications


40. Generalizable Hierarchical Skill Learning via Object-Centric Representation


41. The Gray Zone of Faithfulness: Taming Ambiguity in Unfaithfulness Detection


42. Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only


43. CDrugRed: A Chinese Drug Recommendation Dataset for Discharge Medications in Metabolic Diseases


44. On the Sample Complexity of Differentially Private Policy Optimization


45. Reasoning’s Razor: Reasoning Improves Accuracy but Can Hurt Recall at Critical Operating Points in Safety and Hallucination Detection


46. Race and Gender in LLM-Generated Personas: A Large-Scale Audit of 41 Occupations


47. VESSA: Video-based objEct-centric Self-Supervised Adaptation for Visual Foundation Models


48. Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression


49. REx86: A Local Large Language Model for Assisting in x86 Assembly Reverse Engineering


50. 3DReasonKnee: Advancing Grounded Reasoning in Medical Vision Language Models


51. Do LLMs Truly Understand When a Precedent Is Overruled?


52. Security Logs to ATT&CK Insights: Leveraging LLMs for High-Level Threat Understanding and Cognitive Trait Inference


53. Code-enabled language models can outperform reasoning models on diverse tasks


54. Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People


55. HA-RAG: Hotness-Aware RAG Acceleration via Mixed Precision and Data Placement


56. Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards