Major LLM-Related Papers - 2026-04-02

1. HippoCamp: Benchmarking Contextual Agents on Personal Computers


2. Detecting Multi-Agent Collusion Through Multi-Agent Interpretability


3. Adversarial Moral Stress Testing of Large Language Models


4. Beyond Symbolic Solving: Multi Chain-of-Thought Voting for Geometric Reasoning in Large Language Models


5. RefineRL: Advancing Competitive Programming with Self-Refinement Reinforcement Learning


6. UK AISI Alignment Evaluation Case-Study


7. CircuitProbe: Predicting Reasoning Circuits in Transformers via Stability Zone Detection


8. Agent psychometrics: Task-level performance prediction in agentic coding benchmarks


9. Ontology-Constrained Neural Reasoning in Enterprise Agentic Systems: A Neurosymbolic Architecture for Domain-Grounded AI Agents


10. BloClaw: An Omniscient, Multi-Modal Agentic Workspace for Next-Generation Scientific Discovery


11. Does Unification Come at a Cost? Uni-SafeBench: A Safety Benchmark for Unified Multimodal Large Models


12. Adaptive Parallel Monte Carlo Tree Search for Efficient Test-time Compute Scaling


13. The Silicon Mirror: Dynamic Behavioral Gating for Anti-Sycophancy in LLM Agents


14. Logarithmic Scores, Power-Law Discoveries: Disentangling Measurement from Coverage in Agent-Based Evaluation


15. Towards Reliable Truth-Aligned Uncertainty Estimation in Large Language Models


16. Decision-Centric Design for LLM Systems


17. Signals: Trajectory Sampling and Triage for Agentic Interactions


18. Improvisational Games as a Benchmark for Social Intelligence of AI Agents: The Case of Connections


19. Human-in-the-Loop Control of Objective Drift in LLM-Assisted Computer Science Education


20. A Safety-Aware Role-Orchestrated Multi-Agent LLM Framework for Behavioral Health Communication Simulation


21. One Panel Does Not Fit All: Case-Adaptive Multi-Agent Deliberation for Clinical Prediction


22. How Emotion Shapes the Behavior of LLMs and Agents: A Mechanistic Study


23. YC-Bench: Benchmarking AI Agents for Long-Term Planning and Consistent Execution


24. CliffSearch: Structured Agentic Co-Evolution over Theory and Code for Scientific Algorithm Discovery


25. ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget


26. A ROS 2 Wrapper for Florence-2: Multi-Mode Local Vision-Language Inference for Robotic Systems


27. Online Reasoning Calibration: Test-Time Training Enables Generalizable Conformal LLM Reasoning


28. Brainstacks: Cross-Domain Cognitive Capabilities via Frozen MoE-LoRA Stacks for Continual LLM Learning


29. Lightweight Prompt-Guided CLIP Adaptation for Monocular Depth Estimation


30. Temporal Dependencies in In-Context Learning: The Role of Induction Heads


31. TRACE: Training-Free Partial Audio Deepfake Detection via Embedding Trajectory Analysis of Speech Foundation Models


32. Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks


33. Revision or Re-Solving? Decomposing Second-Pass Gains in Multi-LLM Pipelines


34. Fast and Accurate Probing of In-Training LLMs’ Downstream Performances


35. OrgAgent: Organize Your Multi-Agent System like a Company


36. Query-Conditioned Evidential Keyframe Sampling for MLLM-Based Long-Form Video Understanding


37. Dual Optimal: Make Your LLM Peer-like with Dignity


38. Investigating Autonomous Agent Contributions in the Wild: Activity Patterns and Code Change over Time


39. PixelPrune: Pixel-Level Adaptive Visual Token Reduction via Predictive Coding


40. Emotion Entanglement and Bayesian Inference for Multi-Dimensional Emotion Understanding


41. Scalable Pretraining of Large Mixture of Experts Language Models on Aurora Super Computer


42. Thinking Wrong in Silence: Backdoor Attacks on Continuous Latent Reasoning


43. IWP: Token Pruning as Implicit Weight Pruning in Large Vision Language Models


44. Spectral Compact Training: Pre-Training Large Language Models via Permanent Truncated SVD and Stiefel QR Retraction


45. To Memorize or to Retrieve: Scaling Laws for RAG-Considerate Pretraining


46. Streaming Model Cascades for Semantic SQL


47. HabitatAgent: An End-to-End Multi-Agent System for Housing Consultation


48. Optimsyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation


49. Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding


50. MOON3.0: Reasoning-aware Multimodal Representation Learning for E-commerce Product Understanding


51. A Reasoning-Enabled Vision-Language Foundation Model for Chest X-ray Interpretation


52. Executing as You Generate: Hiding Execution Latency in LLM Code Generation


53. First Logit Boosting: Visual Grounding Method to Mitigate Object Hallucination in Large Vision-Language Models


54. G-Drift MIA: Membership Inference via Gradient-Induced Feature Drift in LLMs


55. EvolveTool-Bench: Evaluating the Quality of LLM-Generated Tool Libraries as Software Artifacts


56. Prompt-Guided Prefiltering for VLM Image Compression


57. Robust Multimodal Safety via Conditional Decoding


58. Asymmetric Actor-Critic for Multi-turn LLM Agents


59. VeriAct: Beyond Verifiability – Agentic Synthesis of Correct and Complete Formal Specifications


60. The Geometry of Compromise: Unlocking Generative Capabilities via Controllable Modality Alignment


61. LLM Essay Scoring Under Holistic and Analytic Rubrics: Prompt Effects and Bias


62. REM-CTX: Automated Peer Review via Reinforcement Learning with Auxiliary Context


63. Diversity-Aware Reverse Kullback-Leibler Divergence for Large Language Model Distillation


64. Making Sense of AI Agents Hype: Adoption, Architectures, and Takeaways from Practitioners


65. Unified Architecture Metamodel of Information Systems Developed by Generative AI


66. Oblivion: Self-Adaptive Agentic Memory Control through Decay-Driven Activation


67. Hierarchical Pre-Training of Vision Encoders with Large Language Models


68. Learning to Play Blackjack: A Curriculum Learning Perspective


69. Empirical Validation of the Classification-Verification Dichotomy for AI Safety Gates


70. GenoBERT: A Language Model for Accurate Genotype Imputation


71. The Energy Footprint of LLM-Based Environmental Analysis: LLMs and Domain Products


72. Task-Centric Personalized Federated Fine-Tuning of Language Models


73. “Who Am I, and Who Else Is Here?” Behavioral Differentiation Without Role Assignment in Multi-Agent LLM Systems


74. Brevity Constraints Reverse Performance Hierarchies in Language Models


75. WHBench: Evaluating Frontier LLMs with Expert-in-the-Loop Validation on Women’s Health Topics


76. Criterion Validity of LLM-as-Judge for Business Outcomes in Conversational Commerce


77. How Do Language Models Process Ethical Instructions? Deliberation, Consistency, and Other-Recognition Across Four Models


78. Think Twice Before You Write – an Entropy-based Decoding Strategy to Enhance LLM Reasoning


79. Are they human? Detecting large language models by probing human memory constraints


80. MSA-Thinker: Discrimination-Calibration Reasoning with Hint-Guided Reinforcement Learning for Multimodal Sentiment Analysis


81. Finding and Reactivating Post-Trained LLMs’ Hidden Safety Mechanisms


82. Quantifying Gender Bias in Large Language Models: When ChatGPT Becomes a Hiring Manager


83. Can LLMs Perceive Time? An Empirical Investigation


84. Eyla: Toward an Identity-Anchored LLM Architecture with Integrated Biological Priors – Vision, Implementation Attempt, and Lessons from AI-Assisted Development


85. How Trustworthy Are LLM-as-Judge Ratings for Interpretive Responses? Implications for Qualitative Research Workflows


86. Dynin-Omni: Omnimodal Unified Large Diffusion Language Model


87. LinearARD: Linear-Memory Attention Distillation for RoPE Restoration


88. A Reliability Evaluation of Hybrid Deterministic-LLM Based Approaches for Academic Course Registration PDF Information Extraction


89. Benchmark for Assessing Olfactory Perception of Large Language Models


90. Two-Stage Optimizer-Aware Online Data Selection for Large Language Models