The rapid rise of AI-generated videos creates urgent risks, from misinformation to reputational harm, making reliable detection tools essential. Beyond accuracy, detectors must also explain their decisions to ensure transparency. We present VidGuard-R1, the first video authenticity detector that fine-tunes a multimodal large language model (MLLM) with group relative policy optimization (GRPO). VidGuard-R1 combines strong accuracy with clear reasoning. We construct a challenging dataset of 140k real and AI-generated videos designed to stress-test detectors. Using Qwen-VL with GRPO and two reward models focused on temporal artifacts and generation complexity, VidGuard-R1 achieves state-of-the-art zero-shot results and surpasses 95% accuracy after further training. Case studies show that it also provides precise, interpretable explanations for its predictions.
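The core of GRPO is scoring a group of sampled responses for the same input and normalizing each reward against the group. Below is a minimal sketch of that group-relative advantage computation; the reward values and the equal-weighted sum of the two reward signals are illustrative assumptions, standing in for the paper's temporal-artifact and generation-complexity rewards.

```python
# Minimal sketch of GRPO's group-relative advantage.
# Rewards and weighting are hypothetical, for illustration only.
from statistics import mean, pstdev

def grpo_advantages(rewards):
    """Normalize each sampled response's reward against its group:
    advantage_i = (r_i - group mean) / group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against uniform-reward groups
    return [(r - mu) / sigma for r in rewards]

# One group of sampled answers for a single video, each scored by two
# reward heads (assumed equal weighting here).
temporal_rewards = [0.9, 0.2, 0.7, 0.1]
complexity_rewards = [0.8, 0.3, 0.6, 0.2]
group_rewards = [t + c for t, c in zip(temporal_rewards, complexity_rewards)]
advantages = grpo_advantages(group_rewards)
```

Responses scoring above the group mean receive a positive advantage and are reinforced; those below are suppressed, with no learned value network required.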
Figure: example clips spanning real videos and AI-generated videos from Dreamvideo, LaVie, Show-1, Sora, and SVD.
@inproceedings{park2026vidguardr,
title={VidGuard-R1: {AI}-Generated Video Detection and Explanation via Reasoning {MLLM}s and {RL}},
author={Kyoungjun Park and Yifan Yang and Juheon Yi and Shicheng Zheng and Muhammad Muaz and Yifei Shen and Dongqi Han and Caihua Shan and Lili Qiu},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=gXjOsBcXIR}
}