PyTorch implementation for Head2Head and Head2Head++. It can be used to fully transfer the head pose, facial expression and eye movements from a source video to a target identity.
Head2Head: Video-based Neural Head Synthesis
Mohammad Rami Koujan*, Michail Christos Doukas*, Anastasios Roussos, Stefanos Zafeiriou
In 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)
(* equal contribution)
Paper: https://arxiv.org/abs/2005.10954
Video Demo: https://youtu.be/RCvVMF5cVeY
Head2Head++: Deep Facial Attributes Re-Targeting
Michail Christos Doukas*, Mohammad Rami Koujan*, Anastasios Roussos, Viktoriia Sharmanska, Stefanos Zafeiriou
Submitted to the IEEE Transactions on Biometrics, Behavior, and Identity Science (TBIOM) journal.
(* equal contribution)
Paper: https://arxiv.org/abs/2006.10199
Video Demo: https://youtu.be/BhpRjjCcmJE
Simple face reenactment (facial expression transfer) | Full head reenactment (pose, expression, eyes transfer) |
Clone the repository
git clone https://github.com/michaildoukas/head2head.git
cd head2head
We provide two alternatives for installing the packages required by Head2Head:
- Create a Conda environment (requires Python 3.7, CUDA 9.2 and Vulkan to be already installed)
- Build a Docker image (Recommended, requires sudo privileges)
Create a conda environment, using the provided conda-env.txt file.
conda create --name head2head --file conda-env.txt
Activate the environment.
conda activate head2head
Install dlib, facenet-pytorch, insightface and mxnet with pip (inside the environment):
pip install dlib insightface mxnet-cu92 facenet-pytorch
Install Docker and its dependencies:
sudo ./docker/ubuntu/xenial/vulkan-base/pre_docker_install.sh
Build docker image (Requires about 15 minutes):
sudo ./docker/ubuntu/xenial/vulkan-base/build.sh
Run container over the image:
sudo ./docker/ubuntu/xenial/vulkan-base/run.sh
Change to head2head directory (inside the container):
cd head2head
python scripts/compile_flownet2.py
If you are using docker, run the command above each time you run the container.
Make sure you have downloaded the required models and files for landmark detection, face reconstruction and FlowNet2 checkpoints, with:
python scripts/download_files.py
If you want to use your own source or target videos, you need to acquire the LSFM model files all_all_all.mat and lsfm_exp_30.dat and place them under the preprocessing/files directory. These files are essential for the 3D face reconstruction stage. For full terms and conditions, and to request access to the models, please visit the LSFM website. For more details on the models, see Large Scale Facial Model (LSFM).
We have trained and tested Head2Head on the seven target identities (Turnbull, Obama, Putin, Merkel, Trudeau, Biden, May) shown below:
- Link to the seven original video files, before ROI extraction: [original_videos.zip]. The corresponding YouTube URLs, along with the start and stop timestamps, are listed in the datasets/head2headDataset/urls.txt file.
- Link to the full dataset, with the extracted ROI frames and 3D reconstruction data (NMFCs, landmarks, expression, identity and camera parameters): [dataset.zip]
Alternatively, you can download the Head2Head dataset by running:
python scripts/download_dataset.py
You can download the fine-tuned models (checkpoints) for all seven target identities here, or with:
python scripts/download_checkpoints.py
We split the original video of each identity into one training and one test sequence. We place about one third of the total number of frames in the test split. In this way, we are able to use these frames as ground truth when testing the model in a self reenactment scenario. In the test set, we also provide the conditional input (NMFCs, landmarks70) for performing head reenactment between each pair of identities in the dataset (source_nmfcs, source_landmarks70).
head2headDataset ----- original_videos
|
--- dataset ----- train ----- exp_coeffs (expression vectors)
| |
| --- id_coeffs (identity vector)
| |
| --- images (ROI RGB frames)
| |
| --- landmarks70 (68 + 2 facial landmarks)
| |
| --- misc (camera parameters - pose)
| |
| --- nmfcs (GAN conditional input)
|
|
----- test ----- (same directories as train)
|
--- source_images
|
--- source_landmarks70
|
--- source_nmfcs
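After downloading or creating a dataset, it can be useful to confirm that the folders listed in the tree above are present. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and it assumes that the path you pass in is the folder containing `train` and `test`, with the three `source_*` folders located under `test`.

```python
from pathlib import Path

# Sub-directory names copied from the tree above.
PER_SPLIT_DIRS = ["exp_coeffs", "id_coeffs", "images", "landmarks70", "misc", "nmfcs"]
# Assumed to live under the test split only.
TEST_ONLY_DIRS = ["source_images", "source_landmarks70", "source_nmfcs"]


def missing_dirs(dataset_root):
    """Return the expected sub-directories that are absent under dataset_root.

    Hypothetical helper: dataset_root is assumed to contain 'train' and 'test'.
    Paths are returned relative to dataset_root, using forward slashes.
    """
    root = Path(dataset_root)
    expected = [root / "train" / name for name in PER_SPLIT_DIRS]
    expected += [root / "test" / name for name in PER_SPLIT_DIRS + TEST_ONLY_DIRS]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_dir()]
```

An empty list means every expected folder was found; otherwise the list names what is missing.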
We have added eight new identities, with longer training video footage (> 10 mins) [original_videos.zip]. Please download the complete dataset by running:
python scripts/download_dataset.py --dataset head2headDatasetv2
You can also download the trained models (checkpoints) for all eight target identities here, or with:
python scripts/download_checkpoints.py --dataset head2headDatasetv2
You can create your own dataset from .mp4 video files. For that, first do face detection, which returns a fixed bounding box that is used to extract the ROI, around the face. Then, perform 3D face reconstruction and compute the NMFC images, one for each frame of the video. Finally, run facial landmark localisation to get the eye movements.
In order to perform face detection and crop the facial region from a single .mp4 file or a directory with multiple files, run:
python preprocessing/detect.py --original_videos_path <videos_path> --dataset_name <dataset_name> --split <split>
- <videos_path> is the path to the original .mp4 file, or a directory of .mp4 files. (default: datasets/head2headDataset/original_videos)
- <dataset_name> is the name to be given to the dataset. (default: head2headDataset)
- <split> is the data split in which to place the file(s). If set to train, the videos-identities can be used as target, but the last one third of the frames is placed in the test set, enabling self reenactment experiments. When set to test, the videos-identities can be used only as source and no frames are placed in the training set. (default: train)
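To illustrate what the train split does with the frames of a video, here is a small sketch that keeps the leading frames for training and holds out the last third for testing. It is only an illustration under assumptions: the function name `split_frames` is hypothetical, and the rounding (integer division by three) may differ from what preprocessing/detect.py actually does.

```python
def split_frames(frame_paths):
    """Split an ordered list of frames into (train, test).

    The last third of the frames (rounded down, an assumption) goes to the
    test split; everything before it goes to the training split.
    """
    n_test = len(frame_paths) // 3
    split_at = len(frame_paths) - n_test
    return frame_paths[:split_at], frame_paths[split_at:]


# Example with ten hypothetical frame names: seven go to train, three to test.
frames = [f"{i:06d}.png" for i in range(10)]
train_frames, test_frames = split_frames(frames)
```

Because the held-out frames come from the same video as the training frames, they can serve as ground truth for self reenactment.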
To perform 3D facial reconstruction and compute the NMFC images of all videos/identities in the dataset, run:
python preprocessing/reconstruct.py --dataset_name <dataset_name>
To extract facial landmarks (and eye pupils), run:
python preprocessing/detect_landmarks70.py --dataset_name <dataset_name>
Execute the commands above each time you use the face detection script to add a new identity to the <dataset_name> dataset.
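The three preprocessing steps above can be chained from a small Python wrapper. This is a convenience sketch rather than something shipped with the repository: the function names are hypothetical, only the script paths and flags shown above are used, and it assumes you run it from the repository root.

```python
import subprocess
import sys


def preprocessing_commands(videos_path, dataset_name, split="train"):
    """Build the three preprocessing commands, in the order they should run."""
    python = sys.executable  # use the interpreter of the active environment
    return [
        # 1. Face detection and ROI extraction.
        [python, "preprocessing/detect.py",
         "--original_videos_path", videos_path,
         "--dataset_name", dataset_name,
         "--split", split],
        # 2. 3D face reconstruction and NMFC computation.
        [python, "preprocessing/reconstruct.py", "--dataset_name", dataset_name],
        # 3. Facial landmark (and eye pupil) extraction.
        [python, "preprocessing/detect_landmarks70.py", "--dataset_name", dataset_name],
    ]


def run_preprocessing(videos_path, dataset_name, split="train"):
    """Run each step in turn, stopping at the first one that fails."""
    for cmd in preprocessing_commands(videos_path, dataset_name, split):
        subprocess.run(cmd, check=True)
```

Calling `run_preprocessing("<videos_path>", "<dataset_name>")` runs all three scripts; `check=True` raises an error if any step exits with a non-zero status, so later steps never run on incomplete data.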
In order to train a new person-specific model from scratch, use:
./scripts/train/train_on_target.sh <target_name> <dataset_name>
where <target_name> is the name of the target identity, which should already have been placed in the <dataset_name> dataset after processing the original video file <target_name>.mp4.
In self reenactment, the target person is also used as source. In this way we have access to the ground truth video, which provides a means to evaluate the performance of our model.
The following command generates a video, using as driving input (source video) the held-out test frames of <target_name>:
./scripts/test/test_self_reenactment_on_target.sh <target_name> <dataset_name>
Synthesised videos are saved under the ./results directory.
For transferring the expressions and head pose from a source person, to a target person in our dataset, first we compute the NMFC frames that correspond to the source video, using the 3DMM identity coefficients computed from the target. For better quality, we adapt the mean Scale and Translation camera parameters of the source to the target. Then, we generate the synthetic video, using these NMFC frames as conditional input.
Given a <source_name> and a <target_name> from dataset <dataset_name>, head reenactment results are generated after running:
./scripts/test/test_head_reenactment_from_source_to_target.sh <source_name> <target_name> <dataset_name>
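The camera adaptation mentioned above can be pictured with a short sketch. This README does not spell out the exact formula, so the version below is only one plausible reading and not necessarily what the scripts do: it rescales the source scales so that their mean matches the target mean, and shifts the source translations so that their per-axis mean matches the target mean. The function name and the data layout (one scale and one translation tuple per frame) are assumptions.

```python
from statistics import mean


def adapt_camera(source_scales, source_translations,
                 target_scales, target_translations):
    """Match the mean scale and mean translation of the source to the target.

    Assumed formula: scales are multiplied by a ratio of means, translations
    are shifted by a difference of per-axis means.
    """
    scale_ratio = mean(target_scales) / mean(source_scales)
    adapted_scales = [s * scale_ratio for s in source_scales]

    dims = len(source_translations[0])
    offsets = [
        mean(t[d] for t in target_translations)
        - mean(t[d] for t in source_translations)
        for d in range(dims)
    ]
    adapted_translations = [
        tuple(t[d] + offsets[d] for d in range(dims))
        for t in source_translations
    ]
    return adapted_scales, adapted_translations
```

Under this reading, the frame-to-frame motion of the source is preserved, while the face ends up at roughly the size and position at which the target was seen during training.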
Instead of transferring the head pose from a source video, we can perform simple face reenactment, by keeping the original pose of the target video, and using only the expressions (inner facial movements) of the source.
For a <source_name> and a <target_name> from dataset <dataset_name>, face reenactment results are generated after running:
./scripts/test/test_face_reenactment_from_source_to_target.sh <source_name> <target_name> <dataset_name>
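Conceptually, simple face reenactment mixes two parameter streams: expressions from the source and pose from the target. The sketch below shows that idea on per-frame records. It is illustrative only: the `FrameParams` class and the function name are hypothetical, and stopping at the shorter of the two videos is an assumption, not a documented behaviour of the scripts.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FrameParams:
    expression: Sequence[float]  # expression coefficients for one frame
    pose: Sequence[float]        # camera / head pose parameters for one frame


def face_reenactment_params(source, target):
    """Pair source expressions with target poses, frame by frame.

    zip() stops at the shorter sequence (an assumption about how videos of
    different lengths would be handled).
    """
    return [
        FrameParams(expression=s.expression, pose=t.pose)
        for s, t in zip(source, target)
    ]
```

For full head reenactment, the pose would instead be taken from the source as well.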
Nearly real-time demo using your camera:
./scripts/demo/run_demo_on_target.sh <target_name> <dataset_name>
In order to increase the generative performance of head2head on short target videos, we can pre-train a model on a multi-person dataset, such as FaceForensics++, and then fine-tune it on a new target video-identity. You can download a processed version of the 1000 real videos of FaceForensics++ with complete NMFC annotations (requires ~100 GB of free disk space), with:
python scripts/download_dataset.py --dataset faceforensicspp
and then train head2head on this multi-person dataset:
./scripts/train/train_on_faceforensicspp.sh
Alternatively, download the trained checkpoint here, or by running:
python scripts/download_checkpoints.py --dataset faceforensicspp
Finally, fine-tune a model on <target_name> from the <dataset_name> dataset:
./scripts/train/finetune_on_target.sh <target_name> <dataset_name>
Perform head reenactment:
./scripts/test/test_finetuned_head_reenactment_from_source_to_target.sh <source_name> <target_name> <dataset_name>
If you use this code, please cite our Head2Head paper.
@INPROCEEDINGS {head2head2020,
author = {M. Koujan and M. Doukas and A. Roussos and S. Zafeiriou},
booktitle = {2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020) (FG)},
title = {Head2Head: Video-Based Neural Head Synthesis},
year = {2020},
volume = {},
issn = {},
pages = {319-326},
keywords = {},
doi = {10.1109/FG47880.2020.00048},
url = {https://doi.ieeecomputersociety.org/10.1109/FG47880.2020.00048},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = {may}
}