The rapid development of Deepfake technology has posed significant challenges for detecting fake videos. To address the shortcomings of existing face-swapping video detection techniques in reference frame selection, spatial–temporal feature mining, and feature fusion, we propose a face-swapping video detection model based on spatial–temporal feature fusion. First, key frame sequences are selected using interframe differences in the facial edge region. Then, the key frame sequences are separately fed into a spatial branch that extracts hidden artifacts and a temporal branch that extracts inconsistency information. Finally, the spatial–temporal features are fused using a self-attention mechanism and passed to a classifier to produce the detection result. To validate the effectiveness of the proposed model, we conducted experiments on the FaceForensics++ and Celeb-DF open-source Deepfake datasets. The experimental results demonstrate that the proposed model achieves better detection accuracy and stronger generalization performance than state-of-the-art competitors.
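The fusion step described above can be pictured with a minimal sketch: assuming the spatial and temporal branches each emit one feature vector per key frame, the two token sequences are concatenated, passed through self-attention, pooled, and classified. The module name, feature dimension, pooling choice, and linear classifier head below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SpatialTemporalFusion(nn.Module):
    """Hypothetical self-attention fusion of spatial and temporal branch features."""

    def __init__(self, feat_dim: int = 512, num_heads: int = 8, num_classes: int = 2):
        super().__init__()
        # Self-attention over the concatenated sequence of spatial and temporal tokens.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, spatial_feats: torch.Tensor, temporal_feats: torch.Tensor) -> torch.Tensor:
        # spatial_feats, temporal_feats: (batch, num_key_frames, feat_dim)
        tokens = torch.cat([spatial_feats, temporal_feats], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)   # self-attention: Q = K = V
        fused = self.norm(fused + tokens)              # residual connection + layer norm
        pooled = fused.mean(dim=1)                     # average-pool the fused tokens
        return self.classifier(pooled)                 # real-vs-fake logits

# Usage with stand-in features: a batch of 4 clips, 8 key frames each.
model = SpatialTemporalFusion()
logits = model(torch.randn(4, 8, 512), torch.randn(4, 8, 512))
print(logits.shape)  # torch.Size([4, 2])
```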
Keywords: Video, Feature fusion, Performance modeling, Feature extraction, Data modeling, Education and training, Machine learning