release the code of video super-resolution #1

Open · wants to merge 1 commit into `master`

README.md: 224 changes (84 additions, 140 deletions)

<h1 align='center'>Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation</h1>

## ⚙️ Installation

- System requirement: Ubuntu 20.04/Ubuntu 22.04, CUDA 12.1
- Tested GPUs: A100

Create a conda environment:

```bash
conda create -n hallo python=3.10
conda activate hallo
```
Install packages with `pip`:

```bash
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
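
To sanity-check the install (an optional step, not part of the original instructions), you can confirm that PyTorch imports and sees your GPU:

```bash
# Expect a 2.2.2 build (e.g. 2.2.2+cu118) and "True" on a working CUDA setup
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```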

In addition, `ffmpeg` is needed:
```bash
apt-get install ffmpeg
```
### 📥 Download Pretrained Models

All pretrained models required for inference are available in our [HuggingFace repo](https://huggingface.co/fudan-generative-ai/hallo2).

Clone the pretrained models into the `${PROJECT_ROOT}/pretrained_models` directory with the command below:

```shell
git lfs install
git clone https://huggingface.co/fudan-generative-ai/hallo2 pretrained_models
```
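
If `git lfs` is not available on your machine, the `huggingface-cli` downloader from the `huggingface_hub` package is a possible alternative (a hedged suggestion; only the `git clone` route above is documented here):

```shell
# Alternative download path, assuming huggingface_hub with its CLI extra is installed
pip install "huggingface_hub[cli]"
huggingface-cli download fudan-generative-ai/hallo2 --local-dir pretrained_models
```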

Or you can download them separately from their source repos:
- [hallo2](https://huggingface.co/fudan-generative-ai/hallo2/blob/main/hallo2/net_g.pth): our checkpoint for video super-resolution.
- [facelib](https://github.com/sczhou/CodeFormer/releases/tag/v0.1.0): pretrained face-parsing models.
- [realesrgan](https://github.com/sczhou/CodeFormer/releases/download/v0.1.0/RealESRGAN_x2plus.pth): background upsampling model.
- [CodeFormer](https://github.com/sczhou/CodeFormer/releases/download/v0.1.0): pretrained [CodeFormer](https://github.com/sczhou/CodeFormer) model; optional, needed only if you want to train our video super-resolution model from scratch.

Finally, these pretrained models should be organized as follows:

```text
./pretrained_models/
|-- CodeFormer/
| |-- codeformer.pth
| `-- vqgan_code1024.pth
|-- facelib
| |-- detection_mobilenet0.25_Final.pth
| |-- detection_Resnet50_Final.pth
| |-- parsing_parsenet.pth
| |-- yolov5l-face.pth
| `-- yolov5n-face.pth
|-- hallo2
| `-- net_g.pth
`-- realesrgan
`-- RealESRGAN_x2plus.pth
```
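
Before running inference, a quick check that the key checkpoints landed where expected can save a confusing failure later (an optional sketch, not from the original instructions):

```shell
# Verify the checkpoints used by video super-resolution inference are present
for f in hallo2/net_g.pth realesrgan/RealESRGAN_x2plus.pth \
         facelib/detection_Resnet50_Final.pth facelib/parsing_parsenet.pth; do
  [ -f "pretrained_models/$f" ] && echo "OK      $f" || echo "MISSING $f"
done
```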

### 🎮 Run Inference
#### High-Resolution Animation
Simply run `scripts/video_sr.py`, passing `input_path` and `output_path`:

```bash
python scripts/video_sr.py --input_path [input_video] --output_path [output_dir] --bg_upsampler realesrgan --face_upsample -w 1 -s 4
```

Animation results will be saved in `output_dir`.

For more options:

```shell
usage: video_sr.py [-h] [-i INPUT_PATH] [-o OUTPUT_PATH] [-w FIDELITY_WEIGHT] [-s UPSCALE] [--has_aligned] [--only_center_face] [--draw_box]
                   [--detection_model DETECTION_MODEL] [--bg_upsampler BG_UPSAMPLER] [--face_upsample] [--bg_tile BG_TILE] [--suffix SUFFIX]

options:
  -h, --help            show this help message and exit
  -i INPUT_PATH, --input_path INPUT_PATH
                        Input video
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Output folder.
  -w FIDELITY_WEIGHT, --fidelity_weight FIDELITY_WEIGHT
                        Balance the quality and fidelity. Default: 0.5
  -s UPSCALE, --upscale UPSCALE
                        The final upsampling scale of the image. Default: 2
  --has_aligned         Input are cropped and aligned faces. Default: False
  --only_center_face    Only restore the center face. Default: False
  --draw_box            Draw the bounding box for the detected faces. Default: False
  --detection_model DETECTION_MODEL
                        Face detector. Optional: retinaface_resnet50, retinaface_mobile0.25, YOLOv5l, YOLOv5n. Default: retinaface_resnet50
  --bg_upsampler BG_UPSAMPLER
                        Background upsampler. Optional: realesrgan
  --face_upsample       Face upsampler after enhancement. Default: False
  --bg_tile BG_TILE     Tile size for background sampler. Default: 400
  --suffix SUFFIX       Suffix of the restored faces. Default: None
```
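
For example, a full invocation with several of these options spelled out might look like this (the input and output paths are placeholders):

```bash
# Hypothetical example: 4x upscaling with Real-ESRGAN background upsampling,
# the YOLOv5l face detector, and a suffix appended to restored faces
python scripts/video_sr.py \
  -i ./examples/talking_head.mp4 \
  -o ./results/talking_head_4x \
  -w 0.5 -s 4 \
  --detection_model YOLOv5l \
  --bg_upsampler realesrgan --face_upsample \
  --suffix hallo2
```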

## Training
#### Prepare data for training
We use the VFHQ dataset for training; you can download it from its [homepage](https://liangbinxie.github.io/projects/vfhq/). Then update `dataroot_gt` in `./configs/train/video_sr.yaml`.
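
For reference, the dataset section of the config might look roughly like this (an illustrative sketch assuming BasicSR-style options; field names other than `dataroot_gt` may differ in the actual file):

```yaml
# ./configs/train/video_sr.yaml (excerpt, illustrative only)
datasets:
  train:
    name: VFHQ
    dataroot_gt: /path/to/VFHQ/train_frames  # point at your extracted VFHQ ground-truth data
```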

#### Training
Start training with the following command:
```shell
python -m torch.distributed.launch --nproc_per_node=8 --master_port=4652 \
    basicsr/train.py -opt ./configs/train/video_sr.yaml \
    --launcher pytorch
```
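
With fewer GPUs, lower `--nproc_per_node` accordingly. For a single-GPU debug run, BasicSR can also be launched without the distributed wrapper (untested here, so treat it as a sketch):

```shell
# Single-GPU run using BasicSR's default (non-distributed) launcher
python basicsr/train.py -opt ./configs/train/video_sr.yaml
```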

Binary file removed: assets/CodeFormer_logo.png
Binary file removed: assets/color_enhancement_result1.png
Binary file removed: assets/color_enhancement_result2.png
Binary file added: assets/framework.png
Binary file added: assets/framework_1.jpg
Binary file added: assets/framework_2.jpg
Binary file removed: assets/imgsli_1.jpg
Binary file removed: assets/imgsli_2.jpg
Binary file removed: assets/imgsli_3.jpg
Binary file removed: assets/inpainting_result1.png
Binary file removed: assets/inpainting_result2.png
Binary file removed: assets/network.jpg
Binary file removed: assets/restoration_result1.png
Binary file removed: assets/restoration_result2.png
Binary file removed: assets/restoration_result3.png
Binary file removed: assets/restoration_result4.png
Binary file added: assets/wechat.jpeg
Mode changed 100644 → 100755 (no content changes): basicsr/VERSION
Mode changed 100644 → 100755 (no content changes): basicsr/__init__.py
Mode changed 100644 → 100755 (no content changes): basicsr/archs/__init__.py
Mode changed 100644 → 100755 (no content changes): basicsr/archs/arcface_arch.py
Mode changed 100644 → 100755 (no content changes): basicsr/archs/arch_util.py