Frequency-aware robustness study for deepfake detection
This research project studies how lossy H.264 compression affects deepfake detection. It compares a spatial XceptionNet-style baseline with a hybrid model that adds frequency-domain signals through DCT features, learnable masking, and cross-attention fusion.
Challenge
- Deepfake detectors can perform well on clean benchmark data but degrade once a platform recompresses the video.
- Compression may remove or distort subtle artifacts that detectors rely on.
- A robust evaluation needs matched compression settings and external cross-dataset testing.
System architecture
Data and inputs
- FaceForensics++ c23 for primary training and in-distribution evaluation.
- FF++ c40 as a matched setting that tests the same task under heavier H.264 compression.
- Celeb-DF v2 as an external dataset for cross-dataset testing without fine-tuning.
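The compression settings above can be approximated with ffmpeg. The sketch below only builds the command line. It assumes that the c23 and c40 labels correspond to libx264 constant rate factors of 23 and 40; the function name and file paths are hypothetical and are not taken from the project code.

```python
import shlex


def h264_recompress_cmd(src: str, dst: str, crf: int) -> list[str]:
    """Build an ffmpeg command that re-encodes `src` with libx264 at the given CRF."""
    if not 0 <= crf <= 51:
        raise ValueError("libx264 CRF must be in [0, 51] for 8-bit video")
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", src,
        "-c:v", "libx264",       # H.264 encoder
        "-crf", str(crf),        # higher CRF = stronger compression
        "-pix_fmt", "yuv420p",
        "-an",                   # drop audio; only the frames matter here
        dst,
    ]


# Hypothetical file names, used only for illustration.
cmd_c23 = h264_recompress_cmd("clip.mp4", "clip_c23.mp4", crf=23)
cmd_c40 = h264_recompress_cmd("clip.mp4", "clip_c40.mp4", crf=40)
print(shlex.join(cmd_c40))
```

Passing either list to `subprocess.run(cmd, check=True)` would run the encode, provided ffmpeg is installed. Reusing one function for every clip is a simple way to keep compression settings matched across test sets.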
Technical approach
- Spatial XceptionNet-style baseline for RGB face-crop classification.
- Frequency branch based on DCT representations to capture compression-sensitive artifacts.
- Learnable frequency masking to reduce overfitting to narrow spectral cues.
- Cross-attention fusion between spatial and frequency tokens.
- Video-level aggregation of predictions, with runs repeated across random seeds for more stable reporting.
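The frequency branch, masking, and fusion steps listed above can be illustrated with a forward-pass sketch in NumPy. This is not the project's model. The 8x8 block size, the sigmoid gate, the token shapes, the random weights, and the choice to let spatial tokens query frequency tokens are all assumptions made for illustration. In a trained model the mask logits and projection matrices would be learned parameters.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # rescale the DC row so the basis is orthonormal
    return m


def block_dct(gray: np.ndarray, block: int = 8) -> np.ndarray:
    """Apply a 2-D DCT to each block x block tile of an (H, W) image.

    Returns one token per tile, shape (num_tiles, block * block).
    """
    h, w = gray.shape
    h, w = h - h % block, w - w % block  # crop to a multiple of the block size
    tiles = gray[:h, :w].reshape(h // block, block, w // block, block)
    tiles = tiles.transpose(0, 2, 1, 3)  # (tiles_y, tiles_x, block, block)
    d = dct_matrix(block)
    coeffs = d @ tiles @ d.T  # matmul broadcasts over the tile grid
    return coeffs.reshape(-1, block * block)


def apply_frequency_mask(tokens: np.ndarray, mask_logits: np.ndarray) -> np.ndarray:
    """Scale each DCT coefficient by a sigmoid gate in (0, 1)."""
    gate = 1.0 / (1.0 + np.exp(-mask_logits))
    return tokens * gate


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(spatial, freq, wq, wk, wv):
    """Spatial tokens act as queries; frequency tokens supply keys and values."""
    q, k, v = spatial @ wq, freq @ wk, freq @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v, attn


rng = np.random.default_rng(0)
face = rng.random((64, 64))                  # stand-in for a grayscale face crop
freq_tokens = block_dct(face)                # (64 tiles, 64 coefficients)
masked = apply_frequency_mask(freq_tokens, np.zeros(64))  # zero logits: gate of 0.5
spatial_tokens = rng.normal(size=(16, 32))   # stand-in for CNN feature tokens
d_model = 32
wq = rng.normal(size=(32, d_model)) * 0.1
wk = rng.normal(size=(64, d_model)) * 0.1
wv = rng.normal(size=(64, d_model)) * 0.1
fused, attn = cross_attention(spatial_tokens, masked, wq, wk, wv)
print(fused.shape, attn.shape)
```

Each row of `attn` shows how strongly one spatial token draws on each frequency token, which is the kind of quantity a later visualization of the hybrid branch could inspect.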
Evaluation and results
- FF++ c40 accuracy (hybrid): 71.82%
- Celeb-DF AUC (hybrid): 86.67%
- Runs: 5 random seeds
- The spatial baseline remained highly competitive on FF++ c23.
- Relative to the spatial baseline, the hybrid frequency-aware model improved the reported FF++ c40 accuracy and Celeb-DF AUC.
- The evaluation separates in-distribution performance, heavy-compression robustness, and external generalization.
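A minimal sketch of how such numbers could be computed is shown below. It assumes that per-frame fake probabilities are averaged into one score per video and that each metric is reported as mean and sample standard deviation across seeds. Neither choice is confirmed by the source. All function names are hypothetical, and the inputs are made-up placeholders, not the project's results.

```python
from collections import defaultdict

import numpy as np


def video_scores(frame_probs, video_ids):
    """Average per-frame fake probabilities into one score per video."""
    buckets = defaultdict(list)
    for prob, vid in zip(frame_probs, video_ids):
        buckets[vid].append(prob)
    return {vid: float(np.mean(probs)) for vid, probs in buckets.items()}


def roc_auc(scores, labels):
    """AUC as the probability that a random fake outranks a random real.

    Ties count as one half. Labels are 1 for fake and 0 for real.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one fake and one real video")
    diff = pos[:, None] - neg[None, :]  # every fake/real pair
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def summarize_seeds(values):
    """Mean and sample standard deviation of one metric across seeds."""
    v = np.asarray(values, dtype=float)
    return float(v.mean()), float(v.std(ddof=1))


# Placeholder frame predictions for three videos (not real data).
frame_probs = [0.9, 0.7, 0.2, 0.4, 0.6, 0.8]
video_ids = ["a", "a", "b", "b", "c", "c"]
video_labels = {"a": 1, "b": 0, "c": 1}

per_video = video_scores(frame_probs, video_ids)
ids = sorted(per_video)
auc = roc_auc([per_video[v] for v in ids], [video_labels[v] for v in ids])

# Placeholder per-seed metric values (not the project's results).
mean, std = summarize_seeds([0.70, 0.72, 0.71, 0.73, 0.69])
print(f"video-level AUC: {auc:.3f}")
print(f"across seeds: {mean:.4f} +/- {std:.4f}")
```

Running the same functions separately on FF++ c23, FF++ c40, and Celeb-DF v2 predictions would keep the three evaluation settings apart, as the last bullet describes.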
Implementation and code
Implementation focus
The implementation is organized as one workflow that runs from data preparation through modeling and evaluation to interpretation, so each technical decision is visible at the stage where it is made.
Source code
The source code is available for readers who want to inspect the implementation details or extend the experiments.
Scope and responsible use
The project is a focused modeling and evaluation study. Broader use should be supported by validation on additional data, robustness checks, monitoring, and domain-specific evaluation.
Future development
- Add calibration curves and threshold analysis for operational decision-making.
- Evaluate more codecs, resolutions, and platform-like preprocessing chains.
- Add interpretable frequency visualizations to show what the hybrid branch is learning.
Technical contribution
The project's contribution is a research-oriented evaluation that brings together a baseline comparison, a hybrid architecture design, compression-robustness testing, cross-dataset validation, and reporting across multiple random seeds.