Key LLM Papers - 2025-10-15

1. Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics


2. CTRL-Rec: Controlling Recommender Systems With Natural Language


3. Multi-Agent Debate for LLM Judges with Adaptive Stability Detection


4. ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning


5. Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks


6. Using Medical Algorithms for Task-Oriented Dialogue in LLM-Based Medical Interviews


7. Evaluating and Mitigating LLM-as-a-judge Bias in Communication Systems


8. MTOS: A LLM-Driven Multi-topic Opinion Simulation Framework for Exploring Echo Chamber Dynamics


9. PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks


10. A Survey of Vibe Coding with Large Language Models


11. O-Forge: An LLM + Computer Algebra Framework for Asymptotic Analysis


12. RAG-Anything: All-in-One RAG Framework


13. $\mathbf{T^3}$: Reducing Belief Deviation in Reinforcement Learning for Active Reasoning


14. PromptFlow: Training Prompts Like Neural Networks


15. MedKGEval: A Knowledge Graph-Based Multi-Turn Evaluation Framework for Open-Ended Patient Interactions with Clinical LLMs


16. GOAT: A Training Framework for Goal-Oriented Agent with Tools


17. Evolution of Meta’s Llama Models and Parameter-Efficient Fine-Tuning of Large Language Models: A Survey


18. MatSciBench: Benchmarking the Reasoning Ability of Large Language Models in Materials Science


19. Precise Attribute Intensity Control in Large Language Models via Targeted Representation Editing


20. ToPolyAgent: AI Agents for Coarse-Grained Topological Polymer Simulations


21. Evaluating the Quality of Randomness and Entropy in Tasks Supported by Large Language Models


22. EmboMatrix: A Scalable Training-Ground for Embodied Decision-Making


23. Empowering LLM Agents with Geospatial Awareness: Toward Grounded Reasoning for Wildfire Response


24. Do Large Language Models Respect Contracts? Evaluating and Enforcing Contract-Adherence in Code Generation


25. Asking Clarifying Questions for Preference Elicitation With Large Language Models


26. CGBench: Benchmarking Language Model Scientific Reasoning for Clinical Genetics Research


27. Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation


28. Beyond Consensus: Mitigating the Agreeableness Bias in LLM Judge Evaluations


29. UniFusion: Vision-Language Model as Unified Encoder in Image Generation


30. Dr.LLM: Dynamic Layer Routing in LLMs


31. VQArt-Bench: A semantically rich VQA Benchmark for Art and Cultural Heritage


32. Hey, wait a minute: on at-issue sensitivity in Language Models


33. Beyond Postconditions: Can Large Language Models infer Formal Contracts for Automatic Software Verification?


34. Generation Space Size: Understanding and Calibrating Open-Endedness of LLM Generations


35. From Delegates to Trustees: How Optimizing for Long-Term Interests Shapes Bias and Alignment in LLM


36. Reasoning Pattern Matters: Learning to Reason without Human Rationales


37. Laminar: A Scalable Asynchronous RL Post-Training Framework


38. StyleDecipher: Robust and Explainable Detection of LLM-Generated Texts with Stylistic Analysis


39. BoN Appetit Team at LeWiDi-2025: Best-of-N Test-time Scaling Can Not Stomach Annotation Disagreements (Yet)


40. When Personalization Tricks Detectors: The Feature-Inversion Trap in Machine-Generated Text Detection


41. Tokenization Disparities as Infrastructure Bias: How Subword Systems Create Inequities in LLM Access and Efficiency


42. LLM-REVal: Can We Trust LLM Reviewers Yet?


43. (R)evolution of Programming: Vibe Coding as a Post-Coding Paradigm


44. HiLoRA: Adaptive Hierarchical LoRA Routing for Training-Free Domain Generalization


45. Shallow Robustness, Deep Vulnerabilities: Multi-Turn Evaluation of Medical LLMs


46. PromptLocate: Localizing Prompt Injection Attacks


47. MoRA: On-the-fly Molecule-aware Low-Rank Adaptation Framework for LLM-based Multi-Modal Molecular Assistant


48. Analysing Moral Bias in Finetuned LLMs through Mechanistic Interpretability


49. HALF: Harm-Aware LLM Fairness Evaluation Aligned with Deployment


50. CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs


51. From Knowledge to Treatment: Large Language Model Assisted Biomedical Concept Representation for Drug Repurposing


52. Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models


53. SafeMT: Multi-turn Safety for Multimodal Language Models


54. Understanding the Modality Gap: An Empirical Study on the Speech-Text Alignment Mechanism of Large Speech Language Models


55. Deep Associations, High Creativity: A Simple yet Effective Metric for Evaluating Large Language Models


56. An AI-Based Behavioral Health Safety Filter and Dataset for Identifying Mental Health Crises in Text-Based Conversations


57. Hierarchical Alignment: Surgical Fine-Tuning via Functional Layer Specialization in Large Language Models


58. Multi-stage Prompt Refinement for Mitigating Hallucinations in Large Language Models


59. CPR: Mitigating Large Language Model Hallucinations with Curative Prompt Refinement


60. Conjecturing: An Overlooked Step in Formal Mathematical Reasoning


61. Learning Dynamics of VLM Finetuning


62. CTIArena: Benchmarking LLM Knowledge and Reasoning Across Heterogeneous Cyber Threat Intelligence


63. Direct Multi-Token Decoding


64. TopoAlign: A Framework for Aligning Code to Math via Topological Decomposition


65. Indoor Localization using Compact, Telemetry-Agnostic, Transfer-Learning Enabled Decoder-Only Transformer


66. Countermind: A Multi-Layered Security Architecture for Large Language Models


67. Data or Language Supervision: What Makes CLIP Better than DINO?


68. BlackIce: A Containerized Red Teaming Toolkit for AI Security Testing


69. PHANTOM RECALL: When Familiar Puzzles Fool Smart Models


70. AwareCompiler: Agentic Context-Aware Compiler Optimization via a Synergistic Knowledge-Data Driven Framework


71. Zero-Shot Large Language Model Agents for Fully Automated Radiotherapy Treatment Planning


72. SeeingSounds: Learning Audio-to-Visual Alignment via Text


73. Scaling Law in LLM Simulated Personality: More Detailed and Realistic Persona Profile Is All You Need


74. Modeling Hypergraph Using Large Language Models


75. Leveraging LLMs, IDEs, and Semantic Embeddings for Automated Move Method Refactoring