This is the repo for the ECCV'22 paper, "Temporally Consistent Semantic Video Editing".
- 09/25/2022: added example code.
- 07/16/2022: repo initialized.
- Linux
- Anaconda/Miniconda
- Python 3.6 (tested on Python 3.6.7)
- PyTorch
- CUDA enabled GPU
Install packages:
conda env create -f environment.yml
Let's use examples/aamir_khan_clip.mp4 as an example.
- Split the video into frames:
python scripts/vid2frame.py --pathIn examples/aamir_khan_clip.mp4 --pathOut out/aamir_khan/frames
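For reference, here is a minimal frame-splitting sketch using OpenCV; it is an assumed equivalent of scripts/vid2frame.py, and the zero-padded PNG naming is an assumption, not the script's guaranteed output.
```python
# Minimal frame-splitting sketch (assumed equivalent of scripts/vid2frame.py).
import os
import cv2

def video_to_frames(path_in, path_out):
    os.makedirs(path_out, exist_ok=True)
    cap = cv2.VideoCapture(path_in)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # zero-padded frame names keep the temporal order when sorted lexicographically
        cv2.imwrite(os.path.join(path_out, f"{idx:05d}.png"), frame)
        idx += 1
    cap.release()
    return idx

# video_to_frames("examples/aamir_khan_clip.mp4", "out/aamir_khan/frames")
```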
- Face alignment. We use 3DDFA_V2 for face alignment. First, clone 3DDFA_V2:
git clone https://github.com/cleardusk/3DDFA_V2.git
cd 3DDFA_V2
Then, install the dependencies following the 3DDFA_V2 instructions, and build the Cython modules:
sh ./build.sh
We provide a code snippet single_video_smooth.py to generate facial landmarks for the alignment. Run
cp ../scripts/single_video_smooth.py ./
python single_video_smooth.py -f ../out/aamir_khan/frames
The landmarks.npy will be saved at path-to-video/../landmarks/landmarks.npy (here, out/aamir_khan/landmarks/landmarks.npy).
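To sanity-check the output, you can load the saved landmarks; the (num_frames, 68, 2) layout assumed below is a guess at what single_video_smooth.py stores.
```python
# Quick sanity check on the saved landmarks; the (num_frames, 68, 2) shape is an assumption.
import numpy as np

landmarks = np.load("out/aamir_khan/landmarks/landmarks.npy")
print(landmarks.shape)                                     # expect one 68-point set per frame
print(landmarks[0].min(axis=0), landmarks[0].max(axis=0))  # rough face bounding box in frame 0
```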
Then we can align the faces using the detected landmarks:
cd ../
python scripts/align_faces_parallel.py --num_threads 1 --root_path out/aamir_khan/frames --output_path out/aamir_khan/aligned
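For intuition, a rough sketch of what FFHQ-style alignment does for one frame is below; the template coordinates, landmark indices (68-point convention), and 1024x1024 crop size are illustrative assumptions, not the exact values used by scripts/align_faces_parallel.py.
```python
# Illustrative per-frame alignment sketch: estimate a similarity transform from three
# landmark anchors to a canonical template, then warp/crop the face.
import numpy as np
from PIL import Image
from skimage import transform as trans

def align_frame(image, lm, out_size=1024):
    left_eye, right_eye = lm[36:42].mean(0), lm[42:48].mean(0)
    mouth = lm[48:68].mean(0)
    src = np.stack([left_eye, right_eye, mouth]).astype(np.float32)
    # canonical positions of the same three points in the output crop (assumed template)
    dst = np.array([[0.35, 0.40], [0.65, 0.40], [0.50, 0.72]], np.float32) * out_size
    tform = trans.SimilarityTransform()
    tform.estimate(src, dst)
    warped = trans.warp(np.asarray(image) / 255.0, tform.inverse,
                        output_shape=(out_size, out_size))
    # keep tform.params so the crop can be unaligned later
    return Image.fromarray((warped * 255).astype(np.uint8)), tform.params
```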
We then run a naive unalignment to see if the alignment makes sense. This will also provide the parameters for the post-processing.
python scripts/unalign.py --ori_images_path out/aamir_khan/frames --aligned_images_path out/aamir_khan/aligned --output_path out/aamir_khan/unaligned
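Conceptually, the naive unalignment inverts the per-frame similarity transform saved during alignment and pastes the aligned crop back onto the original frame. A sketch under that assumption follows; the stored-parameter format (a 3x3 matrix) is not guaranteed to match what scripts/unalign.py actually writes.
```python
# Sketch of the naive unalignment: warp the aligned crop (and a validity mask) back into
# original-frame coordinates using the transform saved during alignment.
import numpy as np
from PIL import Image
from skimage import transform as trans

def unalign_frame(original, aligned_crop, tform_params):
    tform = trans.SimilarityTransform(matrix=tform_params)  # original -> aligned mapping
    out_shape = np.asarray(original).shape[:2]
    restored = trans.warp(np.asarray(aligned_crop) / 255.0, tform, output_shape=out_shape)
    mask = trans.warp(np.ones(np.asarray(aligned_crop).shape[:2]), tform,
                      output_shape=out_shape)[..., None]
    out = mask * restored + (1 - mask) * np.asarray(original) / 255.0
    return Image.fromarray((out * 255).astype(np.uint8))
```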
- GAN inversion
For in-domain editing, we use PTI to do the inversion. We have included PTI in this repo. To use it, download the pre-trained models and put them in PTI/pretrained_models/, then start the inversion (this will take a while):
cd PTI
python scripts/run_pti_multi.py --data_root ../out/aamir_khan/aligned --run_name aamir_khan --checkpoint_path ../out/aamir_khan/inverted
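For background, PTI inverts each aligned frame in two stages: it first optimizes a pivot latent with the generator frozen, then fine-tunes the generator weights around that pivot. Below is a conceptual sketch only, not the run_pti_multi.py code; it assumes a StyleGAN2-ADA-style generator G and an LPIPS loss module from the PTI codebase, and the step counts and learning rates are illustrative.
```python
# Conceptual two-stage PTI sketch (not the actual run_pti_multi.py implementation).
import torch
import torch.nn.functional as F

def invert_frame(G, lpips_loss, target, w_steps=450, g_steps=350):
    # start from an averaged w and broadcast it across all style layers
    w = G.mapping(torch.randn(1, G.z_dim, device=target.device), None)
    w = w.mean(1, keepdim=True).repeat(1, G.num_ws, 1).detach().requires_grad_(True)

    opt_w = torch.optim.Adam([w], lr=5e-3)
    for _ in range(w_steps):                      # stage 1: pivot search, G frozen
        synth = G.synthesis(w)
        loss = (lpips_loss(synth, target) + F.mse_loss(synth, target)).mean()
        opt_w.zero_grad(); loss.backward(); opt_w.step()

    opt_g = torch.optim.Adam(G.parameters(), lr=3e-4)
    for _ in range(g_steps):                      # stage 2: tune G around the fixed pivot
        loss = lpips_loss(G.synthesis(w.detach()), target).mean()
        opt_g.zero_grad(); loss.backward(); opt_g.step()
    return w.detach(), G
```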
- Direct editing
Here we use the StyleCLIP mapper as an example. Download the pretrained mapper here, and put it into PTI/pretrained_models/. Then, run
python scripts/pti_styleclip.py --inverted_root ../out/aamir_khan/inverted --run_name aamir_khan_eyeglasses --aligned_frame_path ../out/aamir_khan/aligned --output_root ../out/aamir_khan/in_domain --use_multi_id_G
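Under the hood, a StyleCLIP latent mapper edits each inverted frame by taking a small step along a text-driven direction. A minimal sketch is below; how the mapper and the PTI-tuned generator are loaded is handled by the script above, and the 0.1 strength follows the StyleCLIP default.
```python
# Minimal sketch of a StyleCLIP latent-mapper edit on one inverted frame.
import torch

@torch.no_grad()
def styleclip_edit(G, mapper, w, strength=0.1):
    w_edit = w + strength * mapper(w)   # small step along the text-driven direction
    return G.synthesis(w_edit), w_edit
```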
- Our flow-based method
Now that we have prepared everything, the next step is to run our proposed method.
Our method relies on RAFT, an optical flow estimator. Download the pretrained network here, and put raft-things.pth into VideoEditGAN/pretrained_models/.
Put the pretrained mapper into VideoEditGAN/pretrained_models/, for example:
cd VideoEditGAN/pretrained_models
ln -s ../../PTI/pretrained_models/eyeglasses.pt ./
Run our proposed method:
cd VideoEditGAN/
python -W ignore scripts/temp_consist.py --edit_root out/aamir_khan/in_domain --metadata_root out/aamir_khan/unaligned --original_root out/aamir_khan/frames --aligned_ori_frame_root out/aamir_khan/aligned --checkpoint_path out/aamir_khan/inverted --batch_size 1 --reg_frame 0.2 --weight_cycle 10.0 --weight_tv_flow 0.0 --lr 1e-3 --weight_photo 1.0 --reg_G 100.0 --lr_G 1e-04 --weight_out_mask 0.5 --weight_in_mask 0.0 --tune_w --epochs_w 10 --tune_G --epochs_G 3 --scale_factor 4 --in_domain --exp_name 'temp_consist' --run_name 'aamir_khan_eyeglasses'
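The core idea behind the objective that scripts/temp_consist.py optimizes (weighted by the flags above) is to penalize inconsistency between neighbouring edited frames under estimated optical flow. A simplified sketch of the photometric term is below; the raft_model call signature is an assumption, and occlusion masking plus the other loss terms are omitted for brevity.
```python
# Simplified sketch of a flow-based photometric consistency term.
import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    # img: (N, C, H, W); flow: (N, 2, H, W) in pixels, channel order (x, y) assumed
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=img.device),
                            torch.arange(w, device=img.device), indexing="ij")
    grid = torch.stack((xs, ys), 0).float().unsqueeze(0) + flow
    gx = 2.0 * grid[:, 0] / (w - 1) - 1.0          # normalize to [-1, 1] for grid_sample
    gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(img, torch.stack((gx, gy), dim=-1), align_corners=True)

def temporal_photometric_loss(raft_model, frame_t, frame_t1):
    flow = raft_model(frame_t, frame_t1)           # flow from frame t to t+1 (assumed API)
    warped = backward_warp(frame_t1, flow)         # frame t+1 resampled at t's pixel locations
    return F.l1_loss(warped, frame_t)
```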
- Unalignment
As a final step, we run STIT as a post-processing step to put the edited, aligned face back into the input video.
python video_stitching_tuning_ours.py --input_folder ../out/aamir_khan/in_domain/StyleCLIP/eyeglasses/temp_consist/tune_G/aligned_frames --output_folder ../out/aamir_khan/in_domain/StyleCLIP/eyeglasses/temp_consist/tune_G/aligned_frames/stitched --edit_name 'eyeglasses' --latent_code_path ../out/aamir_khan/in_domain/StyleCLIP/eyeglasses/temp_consist/tune_G/variables.pth --gen_path ../out/aamir_khan/in_domain/StyleCLIP/eyeglasses/temp_consist/tune_G/G.pth --metadata_path ../out/aamir_khan/unaligned --output_frames --num_steps 50
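Conceptually, stitching unaligns the edited crop and blends it into the original frame with a soft face mask rather than a hard paste (STIT additionally fine-tunes the generator around the boundary to hide the seam). A minimal blending sketch is below; the feathering radius is an illustrative assumption.
```python
# Minimal paste-back sketch: blend the unaligned edit into the original frame with a
# feathered face mask instead of a hard paste.
import numpy as np
from PIL import Image, ImageFilter

def blend_back(original, unaligned_edit, face_mask, feather_px=15):
    soft = np.asarray(face_mask.filter(ImageFilter.GaussianBlur(feather_px)),
                      dtype=np.float32)[..., None] / 255.0
    out = soft * np.asarray(unaligned_edit, np.float32) + \
          (1.0 - soft) * np.asarray(original, np.float32)
    return Image.fromarray(out.astype(np.uint8))
```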
If you find the code useful, please consider citing our paper:
@article{xu2022videoeditgan,
author = {Xu, Yiran and AlBahar, Badour and Huang, Jia-Bin},
title = {Temporally consistent semantic video editing},
journal = {arXiv preprint arXiv:2206.10590},
year = {2022},
}
The codebase is heavily built upon prior work. We would like to thank the authors of 3DDFA_V2, PTI, StyleCLIP, RAFT, and STIT.