This is the repo for the ECCV'22 paper, "Temporally Consistent Semantic Video Editing".
- 09/25/2022: added example code.
- 07/16/2022: repo initialized.
- Linux
- Anaconda/Miniconda
- Python 3.6 (tested on Python 3.6.7)
- PyTorch
- CUDA enabled GPU
Install packages:
conda env create -f environment.yml
Let's use examples/aamir_khan_clip.mp4 as an example.
- Split the video into frames:
python scripts/vid2frame.py --pathIn examples/aamir_khan_clip.mp4 --pathOut out/aamir_khan/frames
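For reference, here is a minimal frame-splitting sketch using OpenCV; it is an assumed equivalent of scripts/vid2frame.py, and the zero-padded PNG naming is an assumption, not the script's guaranteed output.
```python
# Minimal frame-splitting sketch (assumed equivalent of scripts/vid2frame.py).
import os
import cv2

def video_to_frames(path_in, path_out):
    os.makedirs(path_out, exist_ok=True)
    cap = cv2.VideoCapture(path_in)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # zero-padded frame names keep the temporal order when sorted lexicographically
        cv2.imwrite(os.path.join(path_out, f"{idx:05d}.png"), frame)
        idx += 1
    cap.release()
    return idx

# video_to_frames("examples/aamir_khan_clip.mp4", "out/aamir_khan/frames")
```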
- Face alignment. We use 3DDFA_V2 for face alignment. First, clone 3DDFA_V2:
git clone https://github.com/cleardusk/3DDFA_V2.git
cd 3DDFA_V2
Then, install the dependencies following the 3DDFA_V2 instructions, and build the Cython modules:
sh ./build.sh
We provide a code snippet single_video_smooth.py to generate facial landmarks for the alignment. Run
cp ../scripts/single_video_smooth.py ./
python single_video_smooth.py -f ../out/aamir_khan/frames
The landmarks.npy will be saved at path-to-video/../landmarks/landmarks.npy (here, out/aamir_khan/landmarks/landmarks.npy).
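To sanity-check the output, you can load the saved landmarks; the (num_frames, 68, 2) layout assumed below is a guess at what single_video_smooth.py stores.
```python
# Quick sanity check on the saved landmarks; the (num_frames, 68, 2) shape is an assumption.
import numpy as np

landmarks = np.load("out/aamir_khan/landmarks/landmarks.npy")
print(landmarks.shape)                                     # expect one 68-point set per frame
print(landmarks[0].min(axis=0), landmarks[0].max(axis=0))  # rough face bounding box in frame 0
```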
Then we can align the faces using the detected landmarks:
cd ../
python scripts/align_faces_parallel.py --num_threads 1 --root_path out/aamir_khan/frames --output_path out/aamir_khan/aligned
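For intuition, a rough sketch of what FFHQ-style alignment does for one frame is below; the template coordinates, landmark indices (68-point convention), and 1024x1024 crop size are illustrative assumptions, not the exact values used by scripts/align_faces_parallel.py.
```python
# Illustrative per-frame alignment sketch: estimate a similarity transform from three
# landmark anchors to a canonical template, then warp/crop the face.
import numpy as np
from PIL import Image
from skimage import transform as trans

def align_frame(image, lm, out_size=1024):
    left_eye, right_eye = lm[36:42].mean(0), lm[42:48].mean(0)
    mouth = lm[48:68].mean(0)
    src = np.stack([left_eye, right_eye, mouth]).astype(np.float32)
    # canonical positions of the same three points in the output crop (assumed template)
    dst = np.array([[0.35, 0.40], [0.65, 0.40], [0.50, 0.72]], np.float32) * out_size
    tform = trans.SimilarityTransform()
    tform.estimate(src, dst)
    warped = trans.warp(np.asarray(image) / 255.0, tform.inverse,
                        output_shape=(out_size, out_size))
    # keep tform.params so the crop can be unaligned later
    return Image.fromarray((warped * 255).astype(np.uint8)), tform.params
```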
We then run a naive unalignment to see if the alignment makes sense. This will also provide the parameters for the post-processing.
python scripts/unalign.py --ori_images_path out/aamir_khan/frames --aligned_images_path out/aamir_khan/aligned --output_path out/aamir_khan/unaligned
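Conceptually, the naive unalignment inverts the per-frame similarity transform saved during alignment and pastes the aligned crop back onto the original frame. A sketch under that assumption follows; the stored-parameter format (a 3x3 matrix) is not guaranteed to match what scripts/unalign.py actually writes.
```python
# Sketch of the naive unalignment: warp the aligned crop (and a validity mask) back into
# original-frame coordinates using the transform saved during alignment.
import numpy as np
from PIL import Image
from skimage import transform as trans

def unalign_frame(original, aligned_crop, tform_params):
    tform = trans.SimilarityTransform(matrix=tform_params)  # original -> aligned mapping
    out_shape = np.asarray(original).shape[:2]
    restored = trans.warp(np.asarray(aligned_crop) / 255.0, tform, output_shape=out_shape)
    mask = trans.warp(np.ones(np.asarray(aligned_crop).shape[:2]), tform,
                      output_shape=out_shape)[..., None]
    out = mask * restored + (1 - mask) * np.asarray(original) / 255.0
    return Image.fromarray((out * 255).astype(np.uint8))
```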
- GAN inversion
For in-domain editing, we use PTI to do the inversion. We have included PTI in this repo. To use it, download the pre-trained models and put them in PTI/pretrained_models/, then start the inversion (this will take a while):
cd PTI
python scripts/run_pti_multi.py --data_root ../out/aamir_khan/aligned --run_name aamir_khan --checkpoint_path ../out/aamir_khan/inverted
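For background, PTI inverts each aligned frame in two stages: it first optimizes a pivot latent with the generator frozen, then fine-tunes the generator weights around that pivot. Below is a conceptual sketch only, not the run_pti_multi.py code; it assumes a StyleGAN2-ADA-style generator G and an LPIPS loss module from the PTI codebase, and the step counts and learning rates are illustrative.
```python
# Conceptual two-stage PTI sketch (not the actual run_pti_multi.py implementation).
import torch
import torch.nn.functional as F

def invert_frame(G, lpips_loss, target, w_steps=450, g_steps=350):
    # start from an averaged w and broadcast it across all style layers
    w = G.mapping(torch.randn(1, G.z_dim, device=target.device), None)
    w = w.mean(1, keepdim=True).repeat(1, G.num_ws, 1).detach().requires_grad_(True)

    opt_w = torch.optim.Adam([w], lr=5e-3)
    for _ in range(w_steps):                      # stage 1: pivot search, G frozen
        synth = G.synthesis(w)
        loss = (lpips_loss(synth, target) + F.mse_loss(synth, target)).mean()
        opt_w.zero_grad(); loss.backward(); opt_w.step()

    opt_g = torch.optim.Adam(G.parameters(), lr=3e-4)
    for _ in range(g_steps):                      # stage 2: tune G around the fixed pivot
        loss = lpips_loss(G.synthesis(w.detach()), target).mean()
        opt_g.zero_grad(); loss.backward(); opt_g.step()
    return w.detach(), G
```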
- Direct editing
Here we use the StyleCLIP mapper as an example. Download the pretrained mapper here, and put it into PTI/pretrained_models/. Then, run
python scripts/pti_styleclip.py --inverted_root ../out/aamir_khan/inverted --run_name aamir_khan_eyeglasses --aligned_frame_path ../out/aamir_khan/aligned --output_root ../out/aamir_khan/in_domain --use_multi_id_G
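Under the hood, a StyleCLIP latent mapper edits each inverted frame by taking a small step along a text-driven direction. A minimal sketch is below; how the mapper and the PTI-tuned generator are loaded is handled by the script above, and the 0.1 strength follows the StyleCLIP default.
```python
# Minimal sketch of a StyleCLIP latent-mapper edit on one inverted frame.
import torch

@torch.no_grad()
def styleclip_edit(G, mapper, w, strength=0.1):
    w_edit = w + strength * mapper(w)   # small step along the text-driven direction
    return G.synthesis(w_edit), w_edit
```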
- Our flow-based method
Now that we have prepared everything, the next step is to run our proposed method.
Our method relies on RAFT, an optical flow estimator. Download the pretrained network here, and put raft-things.pth into VideoEditGAN/pretrained_models/.
Put the pretrained mapper into VideoEditGAN/pretrained_models/, for example:
cd VideoEditGAN/pretrained_models
ln -s ../../PTI/pretrained_models/eyeglasses.pt ./
Run our proposed method:
cd VideoEditGAN/
python -W ignore scripts/temp_consist.py --edit_root out/aamir_khan/in_domain --metadata_root out/aamir_khan/unaligned --original_root out/aamir_khan/frames --aligned_ori_frame_root out/aamir_khan/aligned --checkpoint_path out/aamir_khan/inverted --batch_size 1 --reg_frame 0.2 --weight_cycle 10.0 --weight_tv_flow 0.0 --lr 1e-3 --weight_photo 1.0 --reg_G 100.0 --lr_G 1e-04 --weight_out_mask 0.5 --weight_in_mask 0.0 --tune_w --epochs_w 10 --tune_G --epochs_G 3 --scale_factor 4 --in_domain --exp_name 'temp_consist' --run_name 'aamir_khan_eyeglasses'
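The core idea behind the objective that scripts/temp_consist.py optimizes (weighted by the flags above) is to penalize inconsistency between neighbouring edited frames under estimated optical flow. A simplified sketch of the photometric term is below; the raft_model call signature is an assumption, and occlusion masking plus the other loss terms are omitted for brevity.
```python
# Simplified sketch of a flow-based photometric consistency term.
import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    # img: (N, C, H, W); flow: (N, 2, H, W) in pixels, channel order (x, y) assumed
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=img.device),
                            torch.arange(w, device=img.device), indexing="ij")
    grid = torch.stack((xs, ys), 0).float().unsqueeze(0) + flow
    gx = 2.0 * grid[:, 0] / (w - 1) - 1.0          # normalize to [-1, 1] for grid_sample
    gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(img, torch.stack((gx, gy), dim=-1), align_corners=True)

def temporal_photometric_loss(raft_model, frame_t, frame_t1):
    flow = raft_model(frame_t, frame_t1)           # flow from frame t to t+1 (assumed API)
    warped = backward_warp(frame_t1, flow)         # frame t+1 resampled at t's pixel locations
    return F.l1_loss(warped, frame_t)
```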
- Unalignment
As a final step, we run STIT as a post-processing step to put the edited, aligned face back into the input video.
python video_stitching_tuning_ours.py --input_folder ../out/aamir_khan/in_domain/StyleCLIP/eyeglasses/temp_consist/tune_G/aligned_frames --output_folder ../out/aamir_khan/in_domain/StyleCLIP/eyeglasses/temp_consist/tune_G/aligned_frames/stitched --edit_name 'eyeglasses' --latent_code_path ../out/aamir_khan/in_domain/StyleCLIP/eyeglasses/temp_consist/tune_G/variables.pth --gen_path ../out/aamir_khan/in_domain/StyleCLIP/eyeglasses/temp_consist/tune_G/G.pth --metadata_path ../out/aamir_khan/unaligned --output_frames --num_steps 50
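Conceptually, stitching unaligns the edited crop and blends it into the original frame with a soft face mask rather than a hard paste (STIT additionally fine-tunes the generator around the boundary to hide the seam). A minimal blending sketch is below; the feathering radius is an illustrative assumption.
```python
# Minimal paste-back sketch: blend the unaligned edit into the original frame with a
# feathered face mask instead of a hard paste.
import numpy as np
from PIL import Image, ImageFilter

def blend_back(original, unaligned_edit, face_mask, feather_px=15):
    soft = np.asarray(face_mask.filter(ImageFilter.GaussianBlur(feather_px)),
                      dtype=np.float32)[..., None] / 255.0
    out = soft * np.asarray(unaligned_edit, np.float32) + \
          (1.0 - soft) * np.asarray(original, np.float32)
    return Image.fromarray(out.astype(np.uint8))
```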
If you find the code useful, please consider citing our paper:
@article{xu2022videoeditgan,
author = {Xu, Yiran and AlBahar, Badour and Huang, Jia-Bin},
title = {Temporally consistent semantic video editing},
journal = {arXiv preprint arXiv:2206.10590},
year = {2022},
}
The codebase is heavily built upon prior work. We would like to thank the authors of 3DDFA_V2, PTI, StyleCLIP, RAFT, and STIT.