Key LLM Papers - 2025-08-29

1. SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control


2. Tracking World States with Language Models: State-Based Evaluation Using Chess


3. InquireMobile: Teaching VLM-based Mobile Agent to Request Human Assistance via Reinforcement Fine-Tuning


4. Instructional Agents: LLM Agents on Automated Course Material Generation for Teaching Faculties


5. ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding


6. Democracy-in-Silico: Institutional Design as Alignment in AI-Governed Polities


7. Caught in the Act: a mechanistic approach to detecting deception


8. SLIM: Subtrajectory-Level Elimination for More Effective Reasoning


9. Reliable Weak-to-Strong Monitoring of LLM Agents


10. Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs


11. Large Language Models (LLMs) for Electronic Design Automation (EDA)


12. Symphony: A Decentralized Multi-Agent Framework for Scalable Collective Intelligence


13. Decomposing Behavioral Phase Transitions in LLMs: Order Parameters for Emergent Misalignment


14. MathBuddy: A Multimodal System for Affective Math Tutoring


15. Diffusion Language Models Know the Answer Before Decoding


16. GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity


17. Dhati+: Fine-tuned Large Language Models for Arabic Subjectivity Evaluation


18. Logical Reasoning with Outcome Reward Models for Test-Time Scaling


19. AI-Powered Detection of Inappropriate Language in Medical School Curricula



21. PSO-Merging: Merging Models Based on Particle Swarm Optimization


22. Bootstrapping Learned Cost Models with Synthetic SQL Queries


23. NLKI: A lightweight Natural Language Knowledge Integration Framework for Improving Small VLMs in Commonsense VQA Tasks


24. Safety Alignment Should Be Made More Than Just A Few Attention Heads


25. Survey of Specialized Large Language Model


26. LFD: Layer Fused Decoding to Exploit External Knowledge in Retrieval-Augmented Generation


27. Towards a Holistic and Automated Evaluation Framework for Multi-Level Comprehension of LLMs in Book-Length Contexts


28. Generative Models for Synthetic Data: Transforming Data Mining in the GenAI Era


29. Just Because You Can, Doesn’t Mean You Should: LLMs for Data Fitting


30. Taming the Chaos: Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference


31. Language Models Identify Ambiguities and Exploit Loopholes


32. Orchid: Orchestrating Context Across Creative Workflows with Generative AI


33. Learning Game-Playing Agents with Generative Code Optimization


34. Improving Low-Resource Translation with Dictionary-Guided Fine-Tuning and RL: A Spanish-to-Wayuunaiki Study


35. Automatic Question & Answer Generation Using Generative Large Language Model (LLM)



37. Bridging Language Gaps: Enhancing Few-Shot Language Adaptation


38. “She was useful, but a bit too optimistic”: Augmenting Design with Interactive Virtual Personas


39. A perishable ability? The future of writing in the face of generative artificial intelligence


40. One Joke to Rule them All? On the (Im)possibility of Generalizing Humor


41. Fine-Tuning Vision-Language Models for Neutrino Event Analysis in High-Energy Physics Experiments


42. Database Entity Recognition with Data Augmentation and Deep Learning


43. Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in multimodal LLMs


44. LongReasonArena: A Long Reasoning Benchmark for Large Language Models


45. Reflective Agreement: Combining Self-Mixture of Agents with a Sequence Tagger for Robust Event Extraction


46. AT-CXR: Uncertainty-Aware Agentic Triage for Chest X-rays


47. An Investigation on Group Query Hallucination Attacks


48. MIDAS: Multimodal Interactive Digital-humAn Synthesis via Real-time Autoregressive Video Generation


49. DemoBias: An Empirical Study to Trace Demographic Biases in Vision Foundation Models


50. Object Detection with Multimodal Large Vision-Language Models: An In-depth Review


51. Stand on The Shoulders of Giants: Building JailExpert from Previous Attack Experience


52. Seeing Like a Designer Without One: A Study on Unsupervised Slide Quality Assessment via Designer Cue Augmentation


53. Tricking LLM-Based NPCs into Spilling Secrets


54. Prompt-in-Content Attacks: Exploiting Uploaded Inputs to Hijack LLM Behavior


55. RL-Finetuned LLMs for Privacy-Preserving Synthetic Rewriting


56. CORE: Lossless Compression for Retrieval-Augmented LLMs via Reinforcement Learning


57. FLAIRR-TS – Forecasting LLM-Agents with Iterative Refinement and Retrieval for Time Series


58. POT: Inducing Overthinking in LLMs via Black-Box Iterative Optimization


59. Rethinking Reasoning in LLMs: Neuro-Symbolic Local RetoMaton Beyond ICL and CoT


60. Should LLMs be WEIRD? Exploring WEIRDness and Human Rights in Large Language Models


61. MultiPL-MoE: Multi-Programming-Lingual Extension of Large Language Models through Hybrid Mixture-of-Experts


62. Lossless Compression of Neural Network Components: Weights, Checkpoints, and K/V Caches in Low-Precision Formats


63. Real-Time Intuitive AI Drawing System for Collaboration: Enhancing Human Creativity through Formal and Contextual Intent Integration


64. Federated Fine-Tuning of Sparsely-Activated Large Language Models on Resource-Constrained Devices


65. MovieCORE: COgnitive REasoning in Movies