Key LLM-Related Papers - 2026-01-14

1. Uncovering Political Bias in Large Language Models using Parliamentary Voting Records


2. Advancing ESG Intelligence: An Expert-level Agent and Comprehensive Benchmark for Sustainable Finance


3. Why AI Alignment Failure Is Structural: Learned Human Interaction Structures and AGI as an Endogenous Evolutionary Shock


4. Prism: Towards Lowering User Cognitive Load in LLMs via Complex Intent Understanding


5. Resisting Manipulative Bots in Memecoin Copy Trading: A Multi-Agent Approach with Chain-of-Thought Reasoning


6. Learner-Tailored Program Repair: A Solution Generator with Iterative Edit-Driven Retrieval Enhancement


7. Sketch-Based Facade Renovation With Generative AI: A Streamlined Framework for Bypassing As-Built Modelling in Industrial Adaptive Reuse


8. What If TSF: A Benchmark for Reframing Forecasting as Scenario-Guided Multimodal Forecasting


9. SUMMPILOT: Bridging Efficiency and Customization for Interactive Summarization System


10. M3-BENCH: Process-Aware Evaluation of LLM Agents Social Behaviors in Mixed-Motive Games


11. Beyond Linearization: Attributed Table Graphs for Table Reasoning


12. YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation


13. Hybrid Distillation with CoT Guidance for Edge-Drone Control Code Generation


14. Owen-Shapley Policy Optimization (OSPO): A Principled RL Algorithm for Generative Search LLMs


15. Semantic Laundering in AI Agent Architectures: Why Tool Boundaries Do Not Confer Epistemic Warrant


16. Greedy Is Enough: Sparse Action Discovery in Agentic LLMs


17. Sparsity Is Necessary: Polynomial-Time Stability for Agentic LLMs in Large Action Spaces


18. T3: Benchmarking Sycophancy and Skepticism in Causal Judgment


19. Large Artificial Intelligence Model Guided Deep Reinforcement Learning for Resource Allocation in Non Terrestrial Networks


20. The End of Reward Engineering: How LLMs Are Redefining Multi-Agent Coordination


21. MPCI-Bench: A Benchmark for Multimodal Pairwise Contextual Integrity Evaluation of Language Model Agents


22. Improving LLM Reasoning with Homophily-aware Structural and Semantic Text-Attributed Graph Compression


23. The Agent’s First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios


24. ZeroDVFS: Zero-Shot LLM-Guided Core and Frequency Allocation for Embedded Platforms


25. Project Synapse: A Hierarchical Multi-Agent Framework with Hybrid Memory for Autonomous Resolution of Last-Mile Delivery Disruptions


26. Embedded AI Companion System on Edge Devices


27. MirrorBench: An Extensible Framework to Evaluate User-Proxy Agents for Human-Likeness


28. MemoBrain: Executive Memory as an Agentic Brain for Reasoning


29. Semantic Gravity Wells: Why Negative Constraints Backfire


30. Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety


31. When Models Know When They Do Not Know: Calibration, Cascading, and Cleaning


32. Executable Ontologies in Game Development: From Algorithmic Control to Semantic World Modeling


33. Modeling LLM Agent Reviewer Dynamics in Elo-Ranked Review System


34. Reasoning Matters for 3D Visual Grounding


35. Asymptotic Universal Alignment: A New Alignment Framework via Test-Time Scaling


36. Reliable Graph-RAG for Codebases: AST-Derived Graphs vs LLM-Extracted Knowledge Graphs


37. UR-Bench: A Benchmark for Multi-Hop Reasoning over Ultra-High-Resolution Images


38. TableCache: Primary Foreign Key Guided KV Cache Precomputation for Low Latency Text-to-SQL


39. TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback


40. Lessons from the Field: An Adaptable Lifecycle Approach to Applied Dialogue Summarization


41. RULERS: Locked Rubrics and Evidence-Anchored Scoring for Robust LLM Evaluation


42. Moral Lenses, Political Coordinates: Towards Ideological Positioning of Morally Conditioned LLMs


43. VideoHEDGE: Entropy-Based Hallucination Detection for Video-VLMs via Semantic Clustering and Spatiotemporal Perturbations


44. BenchOverflow: Measuring Overflow in Large Language Models via Plain-Text Prompts


45. sui-1: Grounded and Verifiable Long-Form Summarization


46. JudgeRLVR: Judge First, Generate Second for Efficient Reasoning


47. CoMa: Contextual Massing Generation with Vision-Language Models


48. Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance


49. Regulatory gray areas of LLM Terms


50. PATS: Personality-Aware Teaching Strategies with Large Language Model Tutors


51. Controlled LLM Training on Spectral Sphere


52. Enhancing Sentiment Classification and Irony Detection in Large Language Models through Advanced Prompt Engineering Techniques


53. Demystifying the Slash Pattern in Attention: The Role of RoPE


54. HIPPO: Accelerating Video Large Language Models Inference via Holistic-aware Parallel Speculative Decoding


55. Knowledge-based learning in Text-RAG and Image-RAG


56. DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection


57. Evaluating Implicit Regulatory Compliance in LLM Tool Invocation via Logic-Guided Synthesis


58. ForgetMark: Stealthy Fingerprint Embedding via Targeted Unlearning in Language Models


59. GI-Bench: A Panoramic Benchmark Revealing the Knowledge-Experience Dissociation of Multimodal Large Language Models in Gastrointestinal Endoscopy Against Clinical Standards


60. Prompt-Based Clarity Evaluation and Topic Detection in Political Question Answering


61. SwiftMem: Fast Agentic Memory via Query-aware Indexing


62. Enriching Semantic Profiles into Knowledge Graph for Recommender Systems Using Large Language Models


63. Qalb: Largest State-of-the-Art Urdu Large Language Model for 230M Speakers with Systematic Continued Pre-training


64. Subspace Alignment for Vision-Language Model Test-time Adaptation


65. Debiasing Large Language Models via Adaptive Causal Prompting with Sketch-of-Thought


66. STO-RL: Offline RL under Sparse Rewards via LLM-Guided Subgoal Temporal Order


67. Q-realign: Piggybacking Realignment on Quantization for Safe and Efficient LLM Deployment


68. Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models


69. Representations of Text and Images Align From Layer One


70. LLM Review: Enhancing Creative Writing via Blind Peer Review Feedback


71. DYCP: Dynamic Context Pruning for Long-Form Dialogue with LLMs


72. SECite: Analyzing and Summarizing Citations in Software Engineering Literature


73. Towards Specialized Generalists: A Multi-Task MoE-LoRA Framework for Domain-Specific LLM Adaptation


74. Enhancing Large Language Models for Time-Series Forecasting via Vector-Injected In-Context Learning


75. Large Language Models and Algorithm Execution: Application to an Arithmetic Function


76. Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification


77. KVzap: Fast, Adaptive, and Faithful KV Cache Pruning


78. Small Symbols, Big Risks: Exploring Emoticon Semantic Confusion in Large Language Models


79. Sliced-Wasserstein Distribution Alignment Loss Improves the Ultra-Low-Bit Quantization of Large Language Models


80. E^2-LLM: Bridging Neural Signals and Interpretable Affective Analysis


81. Multiplicative Orthogonal Sequential Editing for Language Models


82. FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments


83. Hierarchical Sparse Plus Low Rank Compression of LLM