Key LLM-Related Papers - 2025-10-31

1. Unveiling Intrinsic Text Bias in Multimodal Large Language Models through Attention Key-Space Analysis


2. Delegated Authorization for Agents Constrained to Semantic Task-to-Scope Matching


3. The Era of Agentic Organization: Learning to Organize with Language Models


4. Normative Reasoning in Large Language Models: A Comparative Benchmark from Logical and Modal Perspectives


5. Agentic AI Home Energy Management System: A Large Language Model Framework for Residential Load Scheduling



7. Who Has The Final Say? Conformity Dynamics in ChatGPT’s Selections


8. MedSAE: Dissecting MedCLIP Representations with Sparse Autoencoders


9. Autograder+: A Multi-Faceted AI Framework for Rich Pedagogical Feedback in Programming Education


10. Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings


11. BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning


12. GraphCompliance: Aligning Policy and Context Graphs for LLM-Based Regulatory Compliance


13. Graph-Enhanced Policy Optimization in LLM Agent Training


14. Retrieval Augmented Generation-Enhanced Distributed LLM Agents for Generalizable Traffic Signal Control with Emergency Vehicles


15. Questionnaire meets LLM: A Benchmark and Empirical Study of Structural Skills for Understanding Questions and Responses


16. One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning


17. The FM Agent


18. Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math


19. Beyond Benchmarks: The Economics of AI Inference


20. GUI Knowledge Bench: Revealing the Knowledge Gap Behind VLM Failures in GUI Tasks


21. Large Language Model-assisted Autonomous Vehicle Recovery from Immobilization


22. AutoSurvey2: Empowering Researchers with Next Level Automated Literature Surveys


23. From Queries to Insights: Agentic LLM Pipelines for Spatio-Temporal Text-to-SQL


24. Humains-Junior: A 3.8B Language Model Achieving GPT-4o-Level Factual Accuracy by Directed Exoskeleton Reasoning


25. FinOps Agent – A Use-Case for IT Infrastructure and Cost Optimization


26. SciTrust 2.0: A Comprehensive Framework for Evaluating Trustworthiness of Large Language Models in Scientific Applications


27. Approximating Human Preferences Using a Multi-Judge Learned System


28. Through the Judge’s Eyes: Inferred Thinking Traces Improve Reliability of LLM Raters


29. Symbolically Scaffolded Play: Designing Role-Sensitive Prompts for Generative NPC Dialogue


30. Gistify! Codebase-Level Understanding via Runtime Execution


31. Defeating the Training-Inference Mismatch via FP16


32. STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization


33. AMO-Bench: Large Language Models Still Struggle in High School Math Competitions


34. ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference


35. The End of Manual Decoding: Towards Truly End-to-End Language Models


36. Evontree: Ontology Rule-Guided Self-Evolution of Large Language Models


37. Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems


38. The Structure of Relation Decoding Linear Operators in Large Language Models


39. Inside CORE-KG: Evaluating Structured Prompting and Coreference Resolution for Knowledge Graphs


40. Simulating and Experimenting with Social Media Mobilization Using LLM Agents


41. Bayesian Network Fusion of Large Language Models for Sentiment Analysis


42. Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing


43. SecureReviewer: Enhancing Large Language Models for Secure Code Review through Secure-aware Fine-tuning


44. The Geometry of Dialogue: Graphing Language Models to Reveal Synergistic Teams for Multi-Agent Collaboration


45. MisSynth: Improving MISSCI Logical Fallacies Classification with Synthetic Data


46. From Amateur to Master: Infusing Knowledge into LLMs via Automated Curriculum Learning


47. Unravelling the Mechanisms of Manipulating Numbers in Language Models


48. Angular Steering: Behavior Control via Rotation in Activation Space


49. Test-Time Alignment of LLMs via Sampling-Based Optimal Control in pre-logit space


50. Hybrid LLM and Higher-Order Quantum Approximate Optimization for CSA Collateral Management


51. Towards Global Retrieval Augmented Generation: A Benchmark for Corpus-Level Reasoning


52. What’s In My Human Feedback? Learning Interpretable Descriptions of Preference Data


53. Don’t Let It Fade: Preserving Edits in Diffusion Language Models via Token Timestep Allocation


54. Linking Heterogeneous Data with Coordinated Agent Flows for Social Media Analysis


55. MV-MLM: Bridging Multi-View Mammography and Language for Breast Cancer Diagnosis and Risk Prediction


56. Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation


57. WOD-E2E: Waymo Open Dataset for End-to-End Driving in Challenging Long-tail Scenarios


58. Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism


59. Dynamic VLM-Guided Negative Prompting for Diffusion Models


60. SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning


61. Rethinking Cross-lingual Alignment: Balancing Transfer and Cultural Erasure in Multilingual LLMs


62. PORTool: Tool-Use LLM Training with Rewarded Tree


63. Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning


64. Revisiting Multilingual Data Mixtures in Language Model Pretraining


65. Evaluating the Impact of LLM-Assisted Annotation in a Perspectivized Setting: the Case of FrameNet Annotation


66. PRISM: Proof-Carrying Artifact Generation through LLM x MDE Synergy and Stratified Constraints


67. Metis-SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start


68. MemEIC: A Step Toward Continual and Compositional Knowledge Editing


69. zFLoRA: Zero-Latency Fused Low-Rank Adapters


70. Magentic Marketplace: An Open-Source Environment for Studying Agentic Markets