VidGuard-R1:
Detecting AI-Generated Videos via Multimodal LLMs Fine-Tuned with RL

Kyoungjun Park1    Yifan Yang2    Juheon Yi2    Shicheng Zheng2    Muhammad Muaz1    Yifei Shen2    Dongqi Han2    Caihua Shan2    Lili Qiu1,2   
1 UT Austin         2 Microsoft Corporation        

Abstract

The rapid rise of AI-generated videos creates urgent risks, from misinformation to reputational harm, making reliable detection tools essential. Beyond accuracy, detectors must also explain their decisions to ensure transparency. We present VidGuard-R1, the first video authenticity detector that fine-tunes a multimodal large language model (MLLM) with group relative policy optimization (GRPO). VidGuard-R1 combines strong accuracy with clear reasoning. We build a challenging dataset of 140k real and generated videos in which real and fake samples are deliberately hard to tell apart. Using Qwen-VL with GRPO and two reward models focused on temporal artifacts and generation complexity, VidGuard-R1 achieves state-of-the-art zero-shot results and surpasses 95% accuracy after further training. Case studies show that it also provides precise, interpretable explanations for its predictions.

Teaser image

The overall training framework of VidGuard-R1, consisting of two stages:
(1) supervised fine-tuning (SFT) for chain-of-thought (CoT) initialization, and (2) reinforcement learning (RL) fine-tuning to enable deeper reasoning capabilities.

Highlights

  • VidGuard-R1 is the first video authenticity detector that fine-tunes a multimodal LLM with GRPO. It combines the pretrained knowledge of MLLMs for accurate classification with reinforcement learning for effective exploration. To further boost performance, we introduce two reward models that capture temporal artifacts and generation complexity across diffusion steps.
  • We build a dataset of 140k real/fake video pairs for AI-generated video detection. Using state-of-the-art generators and controlled synthesis, we ensure that distinguishing real from fake remains highly challenging.
  • VidGuard-R1 achieves state-of-the-art zero-shot performance on existing benchmarks and surpasses 95% accuracy after further training. Case studies demonstrate its ability to deliver accurate, interpretable explanations.
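
To make the GRPO step in the first highlight concrete, here is a minimal sketch of the group-relative advantage computation that gives GRPO its name: several responses are sampled for the same input, each is scored, and each score is normalized against the mean and standard deviation of its own group. The binary correctness reward, the function names, and the use of the population standard deviation are assumptions for illustration only; this is not the VidGuard-R1 implementation and does not reproduce its two reward models.

```python
from statistics import mean, pstdev


def binary_reward(predicted_label, true_label):
    """Hypothetical reward: 1.0 for a correct real/fake verdict, else 0.0."""
    return 1.0 if predicted_label == true_label else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    Assumption: population standard deviation, with eps guarding against
    division by zero when every response in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group: four sampled verdicts for the same video, whose true label is "fake".
sampled_verdicts = ["fake", "real", "fake", "fake"]
rewards = [binary_reward(v, "fake") for v in sampled_verdicts]
advantages = group_relative_advantages(rewards)

for verdict, adv in zip(sampled_verdicts, advantages):
    print(f"{verdict}: advantage {adv:+.3f}")
```

Correct verdicts receive a positive advantage and the incorrect one a negative advantage, so the policy update favors responses that beat their group average without requiring a separate value network.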

Example Videos

BibTeX

To be updated.
