InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image

Our new Re:InterHand dataset has been released. It has much more diverse image appearances and more stable 3D GT. Check it out here!

Introduction

This repo is the official PyTorch implementation of InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image (ECCV 2020).
Our InterHand2.6M dataset is the first large-scale real-captured dataset with accurate GT 3D interacting hand poses.
Videos of 3D joint coordinates (from joint_3d.json) from the 30 fps split: [single hand] [two hands].
Videos of MANO fittings from the 30 fps split: [single hand] [two hands].

The demo videos above have low-quality frames because they were compressed for the README upload.

News

2021.06.10. Bounding boxes in the RootNet results have been corrected.
2021.03.22. Finally, InterHand2.6M v1.0, which includes all images of the 5 fps and 30 fps versions, is released! 🎉 This is the dataset used in the InterHand2.6M paper.
2020.11.26. Demo code for a random image is added! Check out the instructions below.
2020.11.26. Fitted MANO parameters are updated to better ones (fitting error is about 5 mm). The file size is also much smaller, because the parameters are now fitted in world coordinates (independent of the camera view).
2020.10.7. Fitted MANO parameters are available! They are obtained by NeuralAnnot.

InterHand2.6M dataset

For the InterHand2.6M dataset download and instructions, go to [HOMEPAGE].
Below are instructions for our baseline model, InterNet, for 3D interacting hand pose estimation from a single RGB image.

Demo on a random image

Download the pre-trained InterNet from here
Put the model in the demo folder
Go to the demo folder and edit bbox here
Run python demo.py --gpu 0 --test_epoch 20
You can see result_2D.jpg and a 3D viewer.
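
The bbox edited in the demo script marks the hand region of your image. Below is a minimal sketch of preparing such a box. It assumes an [xmin, ymin, width, height] layout in pixels, which is an assumption you should check against the comment next to bbox in demo.py. The helper names are hypothetical and not part of this repo.

```python
def bbox_from_corners(x1, y1, x2, y2):
    """Build an [xmin, ymin, width, height] box from two opposite corners.

    Assumption: the demo expects this layout; verify it in demo.py.
    """
    return [min(x1, x2), min(y1, y2), abs(x2 - x1), abs(y2 - y1)]


def clamp_bbox(bbox, img_width, img_height):
    """Clip an [xmin, ymin, width, height] box to the image bounds."""
    xmin, ymin, width, height = (float(v) for v in bbox)
    left = max(0.0, xmin)
    top = max(0.0, ymin)
    right = min(float(img_width), xmin + width)
    bottom = min(float(img_height), ymin + height)
    if right <= left or bottom <= top:
        raise ValueError("bbox does not overlap the image")
    return [left, top, right - left, bottom - top]
```

Clamping matters mostly when the box is drawn by hand and runs past the image border.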

MANO mesh rendering demo

Install SMPLX
cd tool/MANO_render
Set smplx_path in render.py
Run python render.py

MANO parameter conversion from the world coordinate to the camera coordinate system

Install SMPLX
cd tool/MANO_world_to_camera/
Set smplx_path in convert.py
Run python convert.py
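
For intuition, the core of a world-to-camera conversion can be sketched with NumPy. This is not the code in convert.py. It assumes each camera is described by a 3x3 rotation matrix and a camera position given in world coordinates, so that x_cam = R (x_world - t). The function names are hypothetical. The sketch covers only 3D points and the global orientation matrix. Converting the MANO translation needs extra care, because MANO rotates about the root joint rather than the origin, so use convert.py for the full conversion.

```python
import numpy as np


def world_to_camera(points_world, cam_rot, cam_pos):
    """Map (N, 3) world-space points into a camera's coordinate system.

    Assumption: cam_rot is a 3x3 world-to-camera rotation and cam_pos is
    the camera position in world coordinates, so x_cam = R (x_world - t).
    """
    points_world = np.asarray(points_world, dtype=np.float64)
    cam_rot = np.asarray(cam_rot, dtype=np.float64)
    cam_pos = np.asarray(cam_pos, dtype=np.float64)
    # Row-vector form of R @ (x - t), applied to every point at once.
    return (points_world - cam_pos) @ cam_rot.T


def rotate_global_orientation(root_rot_world, cam_rot):
    """Express a 3x3 global (root) rotation matrix in the camera frame."""
    return np.asarray(cam_rot, dtype=np.float64) @ np.asarray(
        root_rot_world, dtype=np.float64
    )
```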

Camera positions visualization demo

cd tool/camera_visualize
Run python camera_visualize.py

Because there are many cameras, you should set subset and split yourself, on lines 9 and 10 of the script, respectively.

Directory

Root

The ${ROOT} directory is structured as below.

${ROOT}
|-- data
|-- common
|-- main
|-- output

data contains the data loading code and soft links to the image and annotation directories.
common contains the core code for 3D interacting hand pose estimation.
main contains the high-level code for training and testing the network.
output contains logs, trained models, visualized outputs, and test results.

Data

You need to follow the directory structure of the data as below.

${ROOT}
|-- data
|   |-- STB
|   |   |-- data
|   |   |-- rootnet_output
|   |   |   |-- rootnet_stb_output.json
|   |-- RHD
|   |   |-- data
|   |   |-- rootnet_output
|   |   |   |-- rootnet_rhd_output.json
|   |-- InterHand2.6M
|   |   |-- annotations
|   |   |   |-- train
|   |   |   |-- test
|   |   |   |-- val
|   |   |-- images
|   |   |   |-- train
|   |   |   |-- test
|   |   |   |-- val
|   |   |-- rootnet_output
|   |   |   |-- rootnet_interhand2.6m_output_test.json
|   |   |   |-- rootnet_interhand2.6m_output_test_30fps.json
|   |   |   |-- rootnet_interhand2.6m_output_val.json
|   |   |   |-- rootnet_interhand2.6m_output_val_30fps.json
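
Before training or testing, a small script can confirm that the layout above is in place. The following is a sketch only: the paths are copied from the tree above, while EXPECTED_PATHS and missing_paths are hypothetical names that do not exist in this repo.

```python
from pathlib import Path

# Paths relative to ${ROOT}, taken from the directory tree above.
EXPECTED_PATHS = (
    "data/STB/data",
    "data/STB/rootnet_output/rootnet_stb_output.json",
    "data/RHD/data",
    "data/RHD/rootnet_output/rootnet_rhd_output.json",
    "data/InterHand2.6M/annotations/train",
    "data/InterHand2.6M/annotations/test",
    "data/InterHand2.6M/annotations/val",
    "data/InterHand2.6M/images/train",
    "data/InterHand2.6M/images/test",
    "data/InterHand2.6M/images/val",
    "data/InterHand2.6M/rootnet_output/rootnet_interhand2.6m_output_test.json",
    "data/InterHand2.6M/rootnet_output/rootnet_interhand2.6m_output_test_30fps.json",
    "data/InterHand2.6M/rootnet_output/rootnet_interhand2.6m_output_val.json",
    "data/InterHand2.6M/rootnet_output/rootnet_interhand2.6m_output_val_30fps.json",
)


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    # Path.exists() follows soft links, so linked image folders count.
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling missing_paths(".") from ${ROOT} returns an empty list when everything is present. If you only use one dataset, pass a shorter expected list.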

Download InterHand2.6M data [HOMEPAGE]
Download STB parsed data [images] [annotations]
Download RHD parsed data [images] [annotations]
All annotation files follow the MS COCO format.
If you want to add your own dataset, you have to convert it to the MS COCO format.
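
As a rough illustration of what converting to the MS COCO format involves, the sketch below builds the two top-level lists that COCO-style files share, images and annotations, and checks that every annotation points at an existing image. It shows only the generic COCO fields (id, image_id, file_name, width, height, bbox). The additional fields this repo's data loaders expect are not shown here, so inspect the provided annotation files for those. The function names are hypothetical.

```python
import json


def make_coco_skeleton(images, annotations):
    """Wrap image and annotation records in a COCO-style dictionary."""
    return {"images": list(images), "annotations": list(annotations)}


def dangling_annotation_ids(dataset):
    """Return ids of annotations whose image_id matches no image record."""
    image_ids = {img["id"] for img in dataset["images"]}
    return [
        ann["id"]
        for ann in dataset["annotations"]
        if ann["image_id"] not in image_ids
    ]


example = make_coco_skeleton(
    images=[{"id": 0, "file_name": "0000.jpg", "width": 640, "height": 480}],
    # COCO boxes are [xmin, ymin, width, height] in pixels.
    annotations=[{"id": 0, "image_id": 0, "bbox": [100, 120, 200, 180]}],
)
serialized = json.dumps(example)  # what would be written to the .json file
```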

Output

You need to follow the directory structure of the output folder as below.

${ROOT}
|-- output
|   |-- log
|   |-- model_dump
|   |-- result
|   |-- vis

log folder contains the training log file.
model_dump folder contains saved checkpoints for each epoch.
result folder contains final estimation files generated in the testing stage.
vis folder contains visualized results.
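
If the output folders do not exist yet, they can be created up front. This is a minimal sketch using the folder names listed above. make_output_dirs is a hypothetical helper, and the training code may already create some of these folders itself.

```python
from pathlib import Path

OUTPUT_SUBDIRS = ("log", "model_dump", "result", "vis")


def make_output_dirs(root):
    """Create ${ROOT}/output and its four subfolders if they are missing."""
    output = Path(root) / "output"
    created = []
    for name in OUTPUT_SUBDIRS:
        path = output / name
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```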

Running InterNet

Start

In main/config.py, you can change the model settings, including which dataset to use and which root joint translation vector to use (from GT or from RootNet).

Train

In the main folder, run

python train.py --gpu 0-3

to train the network on GPUs 0, 1, 2, and 3. --gpu 0,1,2,3 can be used instead of --gpu 0-3. If you want to continue a previous experiment, add --continue.
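
To make the equivalence of the two --gpu forms concrete, here is a sketch of how a range such as 0-3 can be expanded into a comma-separated list. It illustrates the idea and is not necessarily how train.py parses the flag. expand_gpu_ids is a hypothetical name.

```python
def expand_gpu_ids(spec):
    """Turn '0-3' into '0,1,2,3'; pass comma-separated lists through."""
    spec = spec.strip()
    if "-" in spec:
        first, last = (int(part) for part in spec.split("-"))
        if last < first:
            raise ValueError("GPU range must be ascending, e.g. 0-3")
        ids = range(first, last + 1)  # the range is inclusive on both ends
    else:
        ids = [int(part) for part in spec.split(",")]
    return ",".join(str(i) for i in ids)
```

A comma-separated string of this form is also what the CUDA_VISIBLE_DEVICES environment variable accepts.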

Test

Place the trained model in output/model_dump/.

In the main folder, run

python test.py --gpu 0-3 --test_epoch 20 --test_set $DB_SPLIT

to test the network on GPUs 0, 1, 2, and 3 with snapshot_20.pth.tar. --gpu 0,1,2,3 can be used instead of --gpu 0-3.

$DB_SPLIT is one of [val,test].

val: The validation set. Val in the paper.
test: The test set. Test in the paper.
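
The relationship between --test_epoch, the snapshot file name, and the allowed --test_set values can be sketched as follows. The snapshot_N.pth.tar pattern and the output/model_dump location come from the instructions above. The helper names are hypothetical and not part of test.py.

```python
from pathlib import Path

VALID_SPLITS = ("val", "test")


def snapshot_path(root, test_epoch):
    """Path of the checkpoint that --test_epoch N is expected to load."""
    name = f"snapshot_{int(test_epoch)}.pth.tar"
    return Path(root) / "output" / "model_dump" / name


def check_test_args(root, test_epoch, test_set):
    """Return a list of problems found before launching test.py."""
    problems = []
    if test_set not in VALID_SPLITS:
        problems.append(f"--test_set must be one of {VALID_SPLITS}")
    path = snapshot_path(root, test_epoch)
    if not path.is_file():
        problems.append(f"missing checkpoint: {path}")
    return problems
```

An empty list from check_test_args means the checkpoint exists and the split name is valid.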

Results

Here I provide the performance and pre-trained snapshots of InterNet, as well as the output of RootNet.

Pre-trained InterNet

RootNet output

RootNet codes

Codes
See RootNet for the code instructions.

Reference

@InProceedings{Moon_2020_ECCV_InterHand2.6M,  
author = {Moon, Gyeongsik and Yu, Shoou-I and Wen, He and Shiratori, Takaaki and Lee, Kyoung Mu},  
title = {InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image},  
booktitle = {European Conference on Computer Vision (ECCV)},  
year = {2020}  
}

License

InterHand2.6M is CC-BY-NC 4.0 licensed, as found in the LICENSE file.

[Terms of Use] [Privacy Policy]
