As a robust media representation technique, video hashing is frequently used in near-duplicate detection, video authentication, and anti-piracy search. Distortions to a video may include spatial modifications to each frame, temporal de-synchronization, and joint spatio-temporal attacks. To address the increasingly difficult case of finding videos under spatio-temporal modifications, we propose a new framework called two-stage video hashing. First, efficient automatic synchronization is achieved using dynamic time warping (DTW), and a complementary video comparison measure is developed based on flow hashing (FH), which is extracted from the synchronized videos.
Next, a fusion mechanism called distance boosting is proposed to fuse the information extracted by DTW and FH. The fusion is future-proof in the sense that, whenever model retraining is needed, the existing hash vectors do not need to be regenerated. Experiments on real video collections show that this hash extraction and fusion method enables unprecedented robustness under both spatial and temporal attacks.
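To illustrate the synchronization stage, the following is a minimal sketch of textbook dynamic time warping applied to two sequences of per-frame feature vectors. It is not the implementation used in this work: the function name `dtw_align`, the Euclidean frame distance, and the toy one-dimensional features are all assumptions made for the example, and the flow hashing and distance boosting stages are not shown.

```python
import numpy as np

def dtw_align(a, b):
    """Align two sequences of per-frame feature vectors with classic DTW.

    a: array of shape (n, d); b: array of shape (m, d).
    Returns (total_cost, path), where path is a list of (i, j) frame pairs.
    Hypothetical helper for illustration only.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)

    # Pairwise Euclidean distance between every frame of a and every frame of b
    # (assumed frame distance; any frame-level dissimilarity could be used).
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

    # acc[i, j] = minimal accumulated cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # match both frames
                acc[i - 1, j],      # advance in a only
                acc[i, j - 1],      # advance in b only
            )

    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return acc[n, m], path

# Toy example: the query repeats one frame of the reference (a temporal attack).
reference = [[0.0], [1.0], [2.0], [3.0]]
query = [[0.0], [1.0], [1.0], [2.0], [3.0]]
cost, path = dtw_align(reference, query)
```

In this toy case the warping path maps the repeated query frame back onto a single reference frame, which is the kind of temporal re-synchronization a second-stage comparison measure could then build on. The quadratic-time recursion shown here is the plain form of DTW; a practical system would likely constrain the warping window for efficiency.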