Key LLM-Related Papers - 2025-09-15

1. Abduct, Act, Predict: Scaffolding Causal Inference for Automated Failure Attribution in Multi-Agent Systems


2. The Morality of Probability: How Implicit Moral Biases in LLMs May Shape the Future of Human-AI Symbiosis


3. Investigating Language Model Capabilities to Represent and Process Formal Knowledge: A Preliminary Study to Assist Ontology Engineering


4. Towards Fully Automated Molecular Simulations: Multi-Agent Framework for Simulation Setup and Force Field Extraction


5. XAgents: A Unified Framework for Multi-Agent Cooperation via IF-THEN Rules and Multipolar Task Processing Graph


6. GAMA: A General Anonymizing Multi-Agent System for Privacy Preservation Enhanced by Domain Rules and Disproof Method


7. LLMs as Agentic Cooperative Players in Multiplayer UNO


8. Towards an AI-based knowledge assistant for goat farmers based on Retrieval-Augmented Generation


9. Towards a Common Framework for Autoformalization


10. How well can LLMs provide planning feedback in grounded environments?


11. Human-AI Collaboration Increases Efficiency in Regulatory Writing


12. Towards Understanding Visual Grounding in Visual Language Models


13. GLAM: Geometry-Guided Local Alignment for Multi-View VLP in Mammography


14. SignClip: Leveraging Mouthing Cues for Sign Language Translation by Multimodal Contrastive Fusion


15. SI-FACT: Mitigating Knowledge Conflict via Self-Improving Faithfulness-Aware Contrastive Tuning


16. Benchmark of stylistic variation in LLM-generated texts


17. Population-Aligned Persona Generation for LLM-based Social Simulation


18. Generating Energy-Efficient Code via Large-Language Models – Where are we now?


19. Established Psychometric vs. Ecologically Valid Questionnaires: Rethinking Psychological Assessments in Large Language Models


20. Multimodal Mathematical Reasoning Embedded in Aerial Vehicle Imagery: Benchmarking, Analysis, and Exploration


21. Unsupervised Hallucination Detection by Inspecting Reasoning Processes


22. Securing LLM-Generated Embedded Firmware through AI Agent-Driven Validation and Patching



24. Limited Reference, Reliable Generation: A Two-Component Framework for Tabular Data Generation in Low-Data Regimes


25. SmartCoder-R1: Towards Secure and Explainable Smart Contract Generation with Security-Aware Group Relative Policy Optimization


26. WALL: A Web Application for Automated Quality Assurance using Large Language Models


27. Tackling One Health Risks: How Large Language Models are leveraged for Risk Negotiation and Consensus-building


28. Emulating Public Opinion: A Proof-of-Concept of AI-Generated Synthetic Survey Responses for the Chilean Case


29. Vibe Check: Understanding the Effects of LLM-Based Conversational Agents’ Personality and Alignment on User Perceptions in Goal-Oriented Tasks


30. Latency and Token-Aware Test-Time Compute


31. SWE-Effi: Re-Evaluating Software AI Agent System Effectiveness Under Resource Constraints


32. HEFT: A Coarse-to-Fine Hierarchy for Enhancing the Efficiency and Accuracy of Language Model Reasoning


33. LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation


34. Meta-Learning Reinforcement Learning for Crypto-Return Prediction


35. HypoGeneAgent: A Hypothesis Language Agent for Gene-Set Cluster Resolution Selection Using Perturb-seq Datasets


36. World Modeling with Probabilistic Structure Integration


37. DiTTO-LLM: Framework for Discovering Topic-based Technology Opportunities via Large Language Model


38. ALIGNS: Unlocking nomological networks in psychological measurement through a large language model


39. VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions


40. Investigating Symbolic Triggers of Hallucination in Gemma Models Across HaluEval and TruthfulQA


41. How Small Transformations Expose the Weakness of Semantic Similarity Measures


42. HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering


43. The Thinking Therapist: Training Large Language Models to Deliver Acceptance and Commitment Therapy using Supervised Fine-Tuning and Odds Ratio Policy Optimization


44. Psychiatry-Bench: A Multi-Task Benchmark for LLMs in Psychiatry


45. Generating Individual Travel Diaries Using Large Language Models Informed by Census and Land-Use Data


46. Assisting Research Proposal Writing with Large Language Models: Evaluation and Refinement


47. Beyond I’m Sorry, I Can’t: Dissecting Large Language Model Refusal


48. LLM-Based Instance-Driven Heuristic Bias In the Context of a Biased Random Key Genetic Algorithm


49. Differential Robustness in Transformer Language Models: Empirical Evaluation Under Adversarial Text Attacks


50. Temporal Preferences in Language Models for Long-Horizon Assistance


51. CTCC: A Robust and Stealthy Fingerprinting Framework for Large Language Models via Cross-Turn Contextual Correlation Backdoor


52. Creativity Benchmark: A benchmark for marketing creativity for LLM models


53. Cross-Layer Attention Probing for Fine-Grained Hallucination Detection


54. Personas within Parameters: Fine-Tuning Small Language Models with Low-Rank Adapters to Mimic User Behaviors


55. AI-Powered Assistant for Long-Term Access to RHIC Knowledge


56. GeoGPT.RAG Technical Report


57. TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation


58. DB3 Team’s Solution For Meta KDD Cup’ 25