Key LLM-Related Papers - 2026-04-16

1. TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration


2. Reward Design for Physical Reasoning in Vision-Language Models


3. AI-Assisted Peer Review at Scale: The AAAI-26 AI Review Pilot


4. GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmented Agents in Spatial Analysis


5. The cognitive companion: a lightweight parallel monitoring architecture for detecting and recovering from reasoning degradation in LLM agents


6. Rethinking AI Hardware: A Three-Layer Cognitive Architecture for Autonomous Agents


7. Towards Scalable Lightweight GUI Agents via Multi-role Orchestration


8. ReSS: Learning Reasoning Models for Tabular Data Prediction via Symbolic Scaffold


9. WebXSkill: Skill Learning for Autonomous Web Agents


10. Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models


11. SciFi: A Safe, Lightweight, User-Friendly, and Fully Autonomous Agentic AI Workflow for Scientific Applications


12. Exploration and Exploitation Errors Are Measurable for Language Model Agents


13. From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space


14. LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning


15. Rhetorical Questions in LLM Representations: A Linear Probing Study


16. HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System



18. MAny: Merge Anything for Multimodal Continual Instruction Tuning


19. Diffusion Language Models for Speech Recognition


20. Adaptive Conformal Prediction for Improving Factuality of Generations by Large Language Models


21. Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs


22. How Can We Synthesize High-Quality Pretraining Data? A Systematic Study of Prompt Design, Generator Model, and Source Data


23. ASTER: Latent Pseudo-Anomaly Generation for Unsupervised Time-Series Anomaly Detection


24. Do We Still Need Humans in the Loop? Comparing Human and LLM Annotation in Active Learning for Hostility Detection


25. MCPThreatHive: Automated Threat Intelligence for Model Context Protocol Ecosystems


26. SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention


27. Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation


28. From Anchors to Supervision: Memory-Graph Guided Corpus-Free Unlearning for Large Language Models


29. Towards Fine-grained Temporal Perception: Post-Training Large Audio-Language Models with Audio-Side Time Prompt


30. Beyond Arrow’s Impossibility: Fairness as an Emergent Property of Multi-Agent Collaboration


31. MIND: AI Co-Scientist for Material Research


32. IndicDB – Benchmarking Multilingual Text-to-SQL Capabilities in Indian Languages


33. Automatically Inferring Teachers’ Geometric Content Knowledge: A Skills Based Approach


34. SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment


35. Syn-TurnTurk: A Synthetic Dataset for Turn-Taking Prediction in Turkish Dialogues



37. UHR-BAT: Budget-Aware Token Compression Vision-Language model for Ultra-High-Resolution Remote Sensing


38. CLIP Architecture for Abdominal CT Image-Text Alignment and Zero-Shot Learning: Investigating Batch Composition and Data Scaling


39. Training-Free Test-Time Contrastive Learning for Large Language Models


40. SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization


41. Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning


42. A KL Lens on Quantization: Fast, Forward-Only Sensitivity for Mixed-Precision SSM-Transformer Models


43. The Cognitive Circuit Breaker: A Systems Engineering Framework for Intrinsic AI Reliability


44. From Prediction to Justification: Aligning Sentiment Reasoning with Human Rationale via Reinforcement Learning


45. Peer-Predictive Self-Training for Language Model Reasoning


46. English is Not All You Need: Systematically Exploring the Role of Multilinguality in LLM Post-Training


47. L2D-Clinical: Learning to Defer for Adaptive Model Selection in Clinical Text Classification


48. Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs


49. Lazy or Efficient? Towards Accessible Eye-Tracking Event Detection Using LLMs


50. On the Creativity of AI Agents


51. SemiFA: An Agentic Multi-Modal Framework for Autonomous Semiconductor Failure Analysis Report Generation


52. KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs


53. InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis


54. Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization


55. AgentForge: Execution-Grounded Multi-Agent LLM Framework for Autonomous Software Engineering


56. The Code Whisperer: LLM and Graph-Based AI for Smell and Vulnerability Resolution


57. Applying an Agentic Coding Tool for Improving Published Algorithm Implementations


58. Building Trust in the Skies: A Knowledge-Grounded LLM-based Framework for Aviation Safety


59. DeEscalWild: A Real-World Benchmark for Automated De-Escalation Training with SLMs


60. OmniTrace: A Unified Framework for Generation-Time Attribution in Omni-Modal LLMs


61. LiveClawBench: Benchmarking LLM Agents on Complex, Real-World Assistant Tasks


62. EVE: A Domain-Specific LLM Framework for Earth Intelligence


63. Lossless Prompt Compression via Dictionary-Encoding and In-Context Learning: Enabling Cost-Effective LLM Analysis of Repetitive Data


64. Correct Chains, Wrong Answers: Dissociating Reasoning from Output in LLM Logic


65. Bi-Predictability: A Real-Time Signal for Monitoring LLM Interaction Integrity


66. Caption First, VQA Second: Knowledge Density, Not Task Format, Drives Multimodal Scaling


67. From Natural Language to PromQL: A Catalog-Driven Framework with Dynamic Temporal Resolution for Cloud-Native Observability


68. TableNet: A Large-Scale Table Dataset with LLM-Powered Autonomous


69. When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation