Key LLM-Related Papers - 2026-01-13

1. Open-Vocabulary 3D Instruction Ambiguity Detection


2. TowerMind: A Tower Defence Game Learning Environment and Benchmark for LLM as Agents


3. StackPlanner: A Centralized Hierarchical Multi-Agent System with Task-Experience Memory Management


4. From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation


5. DynaDebate: Breaking Homogeneity in Multi-Agent Debate with Dynamic Path Generation


6. PII-VisBench: Evaluating Personally Identifiable Information Safety in Vision Language Models Along a Continuum of Visibility


7. Logic-Parametric Neuro-Symbolic NLI: Controlling Logical Formalisms for Verifiable LLM Reasoning


8. HAG: Hierarchical Demographic Tree-based Agent Generation for Topic-Adaptive Simulation


9. GenCtrl – A Formal Controllability Toolkit for Generative Models


10. Reinforcement Learning of Large Language Models for Interpretable Credit Card Fraud Detection


11. Crisis-Bench: Benchmarking Strategic Ambiguity and Reputation Management in Large Language Models


12. WildSci: Advancing Scientific Reasoning from In-the-Wild Literature


13. Safety Not Found (404): Hidden Risks of LLM-Based Robotics Decision Making


14. The Evaluation Gap in Medicine, AI and LLMs: Navigating Elusive Ground Truth & Uncertainty via a Probabilistic Paradigm


15. MMUEChange: A Generalized LLM Agent Framework for Intelligent Multi-Modal Urban Environment Change Analysis


16. ART: Adaptive Reasoning Trees for Explainable Claim Verification


17. Conformity and Social Impact on AI Agents


18. The Persona Paradox: Medical Personas as Behavioral Priors in Clinical Language Models


19. Effects of personality steering on cooperative behavior in Large Language Model agents


20. Mathematical Knowledge Graph-Driven Framework for Equation-Based Predictive and Reliable Additive Manufacturing


21. Naiad: Novel Agentic Intelligent Autonomous System for Inland Water Monitoring


22. AdaFuse: Adaptive Ensemble Decoding with Test-Time Scaling for LLMs


23. The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning


24. Agentic LLMs as Powerful Deanonymizers: Re-identification of Participants in the Anthropic Interviewer Dataset


25. Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency


26. Can AI mediation improve democratic deliberation?


27. An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift


28. Gender Bias in LLMs: Preliminary Evidence from Shared Parenting Scenario in Czech Family Law


29. Continual-learning for Modelling Low-Resource Languages from Large Language Models


30. IIB-LPO: Latent Policy Optimization via Iterative Information Bottleneck


31. CLewR: Curriculum Learning with Restarts for Machine Translation Preference Learning


32. Router-Suggest: Dynamic Routing for Multimodal Auto-Completion in Visually-Grounded Dialogs


33. Decoding Workload and Agreement From EEG During Spoken Dialogue With Conversational AI


34. SceneFoundry: Generating Interactive Infinite 3D Worlds


35. EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis


36. Tensor-DTI: Enhancing Biomolecular Interaction Prediction with Contrastive Embedding Learning


37. VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Commit


38. Analysing Differences in Persuasive Language in LLM-Generated Text: Uncovering Stereotypical Gender Patterns


39. The Echo Chamber Multi-Turn LLM Jailbreak


40. Visualising Information Flow in Word Embeddings with Diffusion Tensor Imaging


41. Multimodal In-context Learning for ASR of Low-resource Languages


42. Open World Knowledge Aided Single-Cell Foundation Model with Robust Cross-Modal Cell-Language Pre-training


43. ACR: Adaptive Context Refactoring via Context Refactoring Operators for Multi-Turn Dialogue


44. Autoregressive Ranking: Bridging the Gap Between Dual and Cross Encoders


45. HogVul: Black-box Adversarial Code Generation Framework Against LM-based Vulnerability Detectors


46. RISE: Rule-Driven SQL Dialect Translation via Query Reduction


47. VIB-Probe: Detecting and Mitigating Hallucinations in Vision-Language Models via Variational Information Bottleneck


48. Understanding LLM-Driven Test Oracle Generation


49. Over-Searching in Search-Augmented Large Language Models


50. Evaluating the Use of LLMs for Automated DOM-Level Resolution of Web Performance Issues


51. STELP: Secure Transpilation and Execution of LLM-Generated Programs


52. Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning


53. Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction


54. Tracing Moral Foundations in Large Language Models


55. Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization


56. Lost in Execution: On the Multilingual Robustness of Tool Calling in Large Language Models


57. Multi-turn Jailbreaking Attack in Multi-Modal Large Language Models


58. On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis


59. Retrieval-Augmented Multi-LLM Ensemble for Industrial Part Specification Extraction


60. Engineering the RAG Stack: A Comprehensive Review of the Architecture and Trust Frameworks for Retrieval-Augmented Generation Systems


61. LLM2IR: simple unsupervised contrastive learning makes long-context LLM great retriever


62. Quantifying Document Impact in RAG-LLMs



64. KP-Agent: Keyword Pruning in Sponsored Search Advertising via LLM-Powered Contextual Bandits


65. Tiny Recursive Models on ARC-AGI-1: Inductive Biases, Identity Conditioning, and Test-Time Compute


66. Automating Deception: Scalable Multi-Turn LLM Jailbreaks


67. EvoC2Rust: A Skeleton-guided Framework for Project-Level C-to-Rust Translation