Cédric Rommel, Eduardo Valle, Mickaël Chen, Souhaiel Khalfaoui, Renaud Marlet, Matthieu Cord, Patrick Pérez
We present an innovative approach to 3D Human Pose Estimation (3D-HPE) by integrating cutting-edge diffusion models, which have revolutionized diverse fields but remain relatively unexplored in 3D-HPE. We show that diffusion models enhance the accuracy, robustness, and coherence of human pose estimations. We introduce DiffHPE, a novel strategy for harnessing diffusion models in 3D-HPE, and demonstrate its ability to refine standard supervised 3D-HPE. We also show how diffusion models lead to more robust estimations in the face of occlusions, and improve the time-coherence and the sagittal symmetry of predictions. Using the Human3.6M dataset, we illustrate the effectiveness of our approach and its superiority over existing models, even under adverse situations where the occlusion patterns in training do not match those in inference. Our findings indicate that while standalone diffusion models provide commendable performance, their accuracy is even better in combination with supervised models, opening exciting new avenues for 3D-HPE research.
The code requires Python 3.7 or later. The file requirements.txt contains the full list of required Python modules.
pip install -r requirements.txt
You may also optionally install MLFlow for experiment tracking:
pip install mlflow
The Human3.6M dataset was set up following the AnyGCN repository. Please refer to it for setup instructions.
Consider adding the path where the data is stored to the data.data_dir field in the conf/config.yaml file. Alternatively, this information can also be passed directly to the training/test command line if preferred, as explained below.
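For reference, here is a minimal sketch of how that field could look in conf/config.yaml, assuming the dotted override data.data_dir maps to nested YAML keys (surrounding entries omitted):

data:
  # path to the preprocessed Human3.6M data; same value as the data.data_dir command-line override
  data_dir: /PATH/TO/H36M/DATA/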
You can download checkpoints of pretrained models from the assets of the last code release, and put them inside pre-trained-models, in subfolders diff_model_ckpts (for DiffHPE-2D and DiffHPE-Wrapper checkpoints) and conditioners_ckpts (for all others).
Both pre-trained DiffHPE-2D and DiffHPE-Wrapper checkpoints are available in the pre-trained-models/diff_model_ckpts folder and can be evaluated. For example, to evaluate DiffHPE-Wrapper on 27-frame inputs, just run the command below:
python main_h36m_lifting.py run.mode=test data.data_dir=/PATH/TO/H36M/DATA/ eval.model_l=pre-trained-models/diff_model_ckpts/diffhpe-wrapper
Note that you can omit the data.data_dir part of the command if you filled the corresponding field in conf/config.yaml beforehand.
To evaluate DiffHPE-2D, just change the path passed to eval.model_l as follows:
python main_h36m_lifting.py run.mode=test data.data_dir=/PATH/TO/H36M/DATA/ eval.model_l=pre-trained-models/diff_model_ckpts/diffhpe-2d
Given a pre-trained model checkpoint, you can visualize the predicted poses using the script viz.py. For example:
python viz.py data.data_dir=/PATH/TO/H36M/DATA/ eval.model_l=pre-trained-models/diff_model_ckpts/diffhpe-wrapper viz.viz_limit=600
The visualization configuration can be changed within the viz field, in conf/config.yaml.
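As a rough sketch, and assuming the viz field mirrors the viz.viz_limit override used in the command above (other visualization options may sit alongside it in the actual file), the corresponding entry could look like:

viz:
  # maximum number of frames to render; same option as viz.viz_limit above
  viz_limit: 600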
To train DiffHPE-2D from scratch, run:
python main_h36m_lifting.py data.data_dir=/PATH/TO/H36M/DATA/ +train=diffhpe-2d +diffusion=diffhpe-2d
Likewise, you can train DiffHPE-Wrapper from scratch with this command:
python main_h36m_lifting.py data.data_dir=/PATH/TO/H36M/DATA/
The previous commands train the diffusion models on standard data. If you want to train with simulated occlusions, you can choose a different data config from conf/data. For example, to train a DiffHPE-2D model with consecutive-frame occlusions, run:
python main_h36m_lifting.py +data=lifting_cpn17_test_seq27_frame_miss data.data_dir=/PATH/TO/H36M/DATA/ +train=diffhpe-2d +diffusion=diffhpe-2d
Note that, in the case of DiffHPE-Wrapper, you also need to change the checkpoint of the pre-trained conditioner model to one that was trained with the same type of occlusions:
python main_h36m_lifting.py +data=lifting_cpn17_test_seq27_frame_miss data.data_dir=/PATH/TO/H36M/DATA/ diffusion.cond_ckpt=pre-trained-models/conditioners_ckpts/prt_mixste_h36m_L27_C64_structured_frame_miss.pt
This codebase can also be used to retrain the supervised MixSTE baseline (without training tricks):
python main_h36m_lifting.py data.data_dir=/PATH/TO/H36M/DATA/ +train=sup_mixste_seq27 +diffusion=sup_mixste_seq27
A large part of the diffusion code was copied and modified from A generic diffusion-based approach for 3D human pose prediction in the wild, which is itself heavily inspired by CSDI.
The human pose lifting code, as well as the GCN-related code, was borrowed from AnyGCN, which builds on top of several other repositories, including:
The baseline model MixSTE was modified from its official paper repository.
@INPROCEEDINGS{rommel2023diffhpe,
title={DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion},
author={Rommel, C{\'e}dric and Valle, Eduardo and Chen, Micka{\"e}l and Khalfaoui, Souhaiel and Marlet, Renaud and Cord, Matthieu and P{\'e}rez, Patrick},
booktitle={International Conference on Computer Vision Workshops (ICCVW)},
year={2023}
}