Key LLM-Related Papers - 2025-09-10

1. Probing the Preferences of a Language Model: Integrating Verbal and Behavioral Tests of AI Welfare


2. SCoder: Iterative Self-Distillation for Bootstrapping Small-Scale Data Synthesizers to Empower Code LLMs


3. Aligning LLMs for the Classroom with Knowledge-Based Retrieval – A Comparative RAG Study


4. Certainty-Guided Reasoning in Large Language Models: A Dynamic Thinking Budget Approach


5. FHIR-RAG-MEDS: Integrating HL7 FHIR with Retrieval-Augmented Large Language Models for Enhanced Medical Decision Support


6. Unleashing the True Potential of LLMs: A Feedback-Triggered Self-Correction with Long-Term Multipath Decoding


7. Getting In Contract with Large Language Models – An Agency Theory Perspective On Large Language Model Alignment


8. Transferable Direct Prompt Injection via Activation-Guided MCMC Sampling


9. SheetDesigner: MLLM-Powered Spreadsheet Layout Generation with Rule-Based and Vision-Based Reflection


10. Language Self-Play For Data-Free Training


11. Autonomous Code Evolution Meets NP-Completeness


12. Performative Thinking? The Brittle Correlation Between CoT Length and Problem Complexity


13. HealthSLM-Bench: Benchmarking Small Language Models for Mobile and Wearable Healthcare Monitoring



15. PaVeRL-SQL: Text-to-SQL via Partial-Match Rewards and Verbal Reinforcement Learning


16. From Eigenmodes to Proofs: Integrating Graph Spectral Operators with Symbolic Interpretable Reasoning


17. ImportSnare: Directed “Code Manual” Hijacking in Retrieval-Augmented Code Generation


18. Breaking Android with AI: A Deep Dive into LLM-Powered Exploitation


19. GENUINE: Graph Enhanced Multi-level Uncertainty Estimation for Large Language Models


20. Uncovering Scaling Laws for Large Language Models via Inverse Problems


21. Small Open Models Achieve Near Parity with Large Models in Low Resource Literary Translation at a Fraction of the Cost


22. Are LLMs Enough for Hyperpartisan, Fake, Polarized and Harmful Content Detection? Evaluating In-Context Learning vs. Fine-Tuning


23. What Were You Thinking? An LLM-Driven Large-Scale Study of Refactoring Motivations in Open-Source Projects


24. BALI: Enhancing Biomedical Language Representations through Knowledge Graph and Language Model Alignment


25. Towards Generalized Routing: Model and Agent Orchestration for Adaptive and Efficient Inference


26. $\Delta L$ Normalization: Rethink Loss Aggregation in RLVR


27. Avoiding Knowledge Edit Skipping in Multi-hop Question Answering with Guided Decomposition


28. Competitive Audio-Language Models with Data-Efficient Single-Stage Training on Public Data


29. ALLabel: Three-stage Active Learning for LLM-based Entity Recognition using Demonstration Retrieval


30. Astra: A Multi-Agent System for GPU Kernel Performance Optimization


31. Fine-Tuning Vision-Language Models for Visual Navigation Assistance


32. HALT-RAG: A Task-Adaptable Framework for Hallucination Detection with Calibrated NLI Ensembles and Abstention


33. DepthVision: Robust Vision-Language Understanding through GAN-Based LiDAR-to-RGB Synthesis


34. Text2Touch: Tactile In-Hand Manipulation with LLM-Designed Reward Functions


35. The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward


36. Talking with Oompa Loompas: A novel framework for evaluating linguistic acquisition of LLM agents


37. Mitigating Attention Localization in Small Scale: Self-Attention Refinement via One-step Belief Propagation


38. Does This Look Familiar to You? Knowledge Analysis via Model Internal Representations


39. Paladin: Defending LLM-enabled Phishing Emails with a New Trigger-Tag Paradigm


40. Benchmarking Information Retrieval Models on Complex Retrieval Tasks


41. Systematic Optimization of Open Source Large Language Models for Mathematical Reasoning


42. A transformer-based generative model for planetary systems


43. DischargeSim: A Simulation Benchmark for Educational Doctor-Patient Communication at Discharge


44. Measuring Uncertainty in Transformer Circuits with Effective Information Consistency


45. Toward Purpose-oriented Topic Model Evaluation enabled by Large Language Models


46. SVGauge: Towards Human-Aligned Evaluation for SVG Generation


47. Automated Evaluation of Gender Bias Across 13 Large Multimodal Models


48. Methodological Insights into Structural Causal Modelling and Uncertainty-Aware Forecasting for Economic Indicators


49. 1 bit is all we need: binary normalized neural networks


50. Preventing Another Tessa: Modular Safety Middleware For Health-Adjacent AI Assistants


51. Human-in-the-Loop: Quantitative Evaluation of 3D Models Generation by Large Language Models


52. ArGen: Auto-Regulation of Generative AI via GRPO and Policy-as-Code


53. Not All Splits Are Equal: Rethinking Attribute Generalization Across Unrelated Categories


54. Visible Yet Unreadable: A Systematic Blind Spot of Vision Language Models Across Writing Systems


55. CARE: Decoding Time Safety Alignment via Rollback and Introspection Intervention


56. RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use


57. A Knowledge-Guided Cross-Modal Feature Fusion Model for Local Traffic Demand Prediction


58. VoltanaLLM: Feedback-Driven Frequency Control and State-Space Routing for Energy-Efficient LLM Serving