This is the README for the code release of "3D Human Pose Estimation with Spatio-Temporal Criss-Cross Attention" on the PyTorch platform.
Thank you for your interest; the code and checkpoints are being updated.
3D Human Pose Estimation with Spatio-Temporal Criss-cross Attention,
Zhenhua Tang, Zhaofan Qiu, Yanbin Hao, Richang Hong, and Ting Yao,
In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
checkpoint/: the folder for model weights of STCFormer.
dataset/: the folder for data loader.
common/: the folder for basic functions.
model/: the folder for STCFormer network.
run_stc.py: the Python script for training and evaluating the STCFormer network.
Make sure you have the following dependencies installed:
- PyTorch >= 0.4.0
- NumPy
- Matplotlib == 3.1.0
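For reference, a matching environment can be set up with pip (a minimal sketch; any recent PyTorch release satisfying the bound above should work):
pip install torch numpy matplotlib==3.1.0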
Our model is evaluated on Human3.6M and MPI-INF-3DHP datasets.
We set up the Human3.6M dataset in the same way as VideoPose3D.
We set up the MPI-INF-3DHP dataset in the same way as P-STMO.
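After preprocessing, the ./dataset folder is expected to contain the Human3.6M archives in the VideoPose3D naming convention; the exact file list below is our assumption based on that convention, not a guarantee:
./dataset/data_3d_h36m.npz
./dataset/data_2d_h36m_gt.npz
./dataset/data_2d_h36m_cpn_ft_h36m_dbb.npz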
For the training stage, please run:
python run_stc.py -f 27 -b 128 --train 1 --layers 6 -s 3
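Here -f sets the number of input frames and -b the batch size. By analogy (an assumption on our part, since only the 27-frame command is given), the 81-frame model reported below would be trained with:
python run_stc.py -f 81 -b 128 --train 1 --layers 6 -s 3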
For the testing stage, please run:
python run_stc.py -f 27 -b 128 --train 0 --layers 6 -s 1 --reload 1 --previous_dir ./checkpoint/your_best_model.pth
You can download our pre-trained models from Google Drive or Baidu Disk (extraction code: STC1). Put them in the ./checkpoint directory.
To evaluate our STCFormer model on the 2D keypoints obtained by CPN, please run:
python run_stc.py -f 27 -b 128 --train 0 --layers 6 -s 1 -k 'cpn_ft_h36m_dbb' --reload 1 --previous_dir ./checkpoint/model_27_STCFormer/no_refine_6_4406.pth
python run_stc.py -f 81 -b 128 --train 0 --layers 6 -s 1 -k 'cpn_ft_h36m_dbb' --reload 1 --previous_dir ./checkpoint/model_81_STCFormer/no_refine_6_4172.pth
The released models differ in the number of input frames; their results on Human3.6M are as follows.

| Frames | P1 (mm) | P2 (mm) |
| --- | --- | --- |
| 27 | 44.08 | 34.76 |
| 81 | 41.72 | 32.94 |
The model with 243-frame input is proprietary and stored exclusively on the company server, so it cannot be released due to copyright restrictions. If you need results at that input length, we recommend training a comparable model yourself.
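Should you train the 243-frame model yourself, the command would presumably follow the same pattern as the training commands above (an untested sketch; a smaller batch size may be needed to fit memory):
python run_stc.py -f 243 -b 128 --train 1 --layers 6 -s 3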
The pre-trained models and code for STCFormer are still being updated. In the meantime, you can run the following command, which is based on an earlier, less organized version of the code, to reproduce the 81-frame results on MPI-INF-3DHP:
python run_3dhp_stc.py --train 0 --frames 81 -b 128 -s 1 --reload 1 --previous_dir ./checkpoint/model_81_STMO/no_refine_8_2310.pth
Following MHFormer, first download the YOLOv3 and HRNet pretrained models here and put them in the './demo/lib/checkpoint' directory. Then put your in-the-wild videos in the './demo/video' directory.
You can modify the 'get_pose3D' function in the 'vis.py' script to suit your needs, including the checkpoint path and model parameters, and then run:
python demo/vis.py --video sample_video.mp4
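As an illustrative sketch only (the variable names here are hypothetical; the actual contents of demo/vis.py differ), the modification above amounts to pointing get_pose3D at your chosen checkpoint and matching its input length:
# Hypothetical sketch, not the actual code in demo/vis.py.
args.frames = 27  # must match the input length of the checkpoint below
args.previous_dir = './checkpoint/model_27_STCFormer/no_refine_6_4406.pth'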
If you find this repo useful, please consider citing our paper:
@inproceedings{tang20233d,
title={3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention},
author={Tang, Zhenhua and Qiu, Zhaofan and Hao, Yanbin and Hong, Richang and Yao, Ting},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={4790--4799},
year={2023}
}
Our code builds on the following repositories.
VideoPose3D
StridedTransformer-Pose3D
P-STMO
MHFormer
MixSTE
FTCM
We thank the authors for releasing their codes.