SLAM-Supported Semi-Supervised Learning for 6D Object Pose Estimation
Check out our paper!
TLDR: We exploit robust pose graph optimization results to pseudo-label robot-collected RGB images and fine-tune 6D object pose estimators during object-based navigation.
The two most important features of this work:

- A SLAM-aided self-training procedure for 6D object pose estimation.
- Automatic covariance tuning (ACT), a robust pose graph optimization method, enabling flexible uncertainty modeling for learning-based measurements (a toy sketch of the idea follows below).
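To give a feel for what covariance tuning does, here is a toy 1D sketch on synthetic data: a small pose graph is optimized while the per-detection measurement covariances are re-estimated from the current residuals, so grossly inconsistent learned detections get down-weighted. All variable names and the simple residual-based update rule are illustrative assumptions, not the paper's ACT formulation or this repo's implementation.

```python
import numpy as np

# Toy 1D illustration of residual-driven covariance tuning in a pose graph.
# States: robot positions x_0..x_{N-1} and one landmark (object) position l.
# Odometry factors: u_k ~ x_{k+1} - x_k (trusted, fixed covariance).
# Detection factors: z_k ~ l - x_k (learned, covariances re-tuned each iteration).

rng = np.random.default_rng(0)
N = 20
x_true = np.cumsum(rng.normal(1.0, 0.05, N))         # ground-truth trajectory
l_true = 25.0                                        # ground-truth object position

odom = np.diff(x_true) + rng.normal(0, 0.02, N - 1)  # odometry measurements
dets = l_true - x_true + rng.normal(0, 0.1, N)       # object detections
dets[5] += 4.0                                       # one gross outlier

sigma_odom = 0.02
sigma_det = np.full(N, 0.1)                          # initial detection sigmas

for it in range(10):
    # Build the whitened linear system A [x; l] = b (rows scaled by 1/sigma).
    rows, b, w = [], [], []
    for k in range(N - 1):                           # odometry factors
        r = np.zeros(N + 1); r[k + 1], r[k] = 1, -1
        rows.append(r); b.append(odom[k]); w.append(1 / sigma_odom)
    for k in range(N):                               # detection factors
        r = np.zeros(N + 1); r[N], r[k] = 1, -1
        rows.append(r); b.append(dets[k]); w.append(1 / sigma_det[k])
    rows.append(np.eye(N + 1)[0]); b.append(x_true[0]); w.append(1e3)  # prior on x_0

    A = np.array(rows) * np.array(w)[:, None]
    est, *_ = np.linalg.lstsq(A, np.array(b) * np.array(w), rcond=None)

    # Covariance tuning: scale each detection sigma with its current residual.
    res = np.abs((est[N] - est[:N]) - dets)
    sigma_det = np.maximum(res, 0.05)

print("estimated object position:", est[N])          # close to 25 despite the outlier
print("outlier sigma after tuning:", sigma_det[5])   # much larger than the inlier sigmas
```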
We combine object pose prediction with camera odometry to infer object-level 3D scene geometry. We leverage the consistent state estimates to pseudo-label training images and fine-tune the pose estimator.
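Concretely, each per-frame detection is an object pose in the camera frame; composed with the camera pose from odometry, it becomes evidence about the object's pose in the world frame, and after optimization the consistent world-frame object pose is projected back into each camera frame to form that frame's pseudo label. A minimal sketch of these transform compositions is below, using SciPy; the helper and variable names are illustrative, not this repo's API.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def tum_pose_to_matrix(tx, ty, tz, qx, qy, qz, qw):
    """Build a 4x4 homogeneous transform from a TUM-style pose (translation + xyzw quaternion)."""
    T = np.eye(4)
    T[:3, :3] = R.from_quat([qx, qy, qz, qw]).as_matrix()
    T[:3, 3] = [tx, ty, tz]
    return T

# Per-frame inputs (example values):
T_world_cam = tum_pose_to_matrix(0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0)     # camera pose from odometry
T_cam_obj_pred = tum_pose_to_matrix(0.0, 0.2, 0.8, 0.0, 0.0, 0.0, 1.0)  # detection in the camera frame

# Detection composed with odometry -> evidence about the object pose in the world frame.
T_world_obj_evidence = T_world_cam @ T_cam_obj_pred

# After pose graph optimization yields a consistent world-frame object pose,
# it is projected back into the camera frame to obtain that frame's pseudo label.
T_world_obj_optimized = T_world_obj_evidence  # placeholder for the optimized estimate
T_cam_obj_pseudo_label = np.linalg.inv(T_world_cam) @ T_world_obj_optimized
```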
Create a conda environment to install the dependencies.

```bash
cd /path/to/slam-super-6d
conda env create -f environment.yml
```
- Download a DOPE weights file from here to your favorite folder. (Initial: before self-training; Self-trained: after self-training; Supervised: after supervised training.)
- Change this line to point to the weights file.
- Save the test images to `/path/to/image/folder/`.
- Run
  ```bash
  cd /path/to/slam-super-6d
  python3 experiments/ycbv/inference/inference.py --data /path/to/image/folder/ --outf /output/folder/
  ```
- Get the object pose predictions saved in TUM format at `/output/folder/0000.txt` (see the parsing sketch after this list).
- Please check out the DOPE GitHub repo for more details on how to train and run DOPE networks.
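The TUM trajectory format used throughout this repo has one line per frame: a timestamp followed by the translation and an xyzw quaternion. Below is a minimal parsing sketch for such files; the helper name `load_tum_poses` is an illustrative assumption, not part of this repo.

```python
import numpy as np

def load_tum_poses(path):
    """Parse a TUM-format file where each line is `timestamp tx ty tz qx qy qz qw`."""
    timestamps, poses = [], []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank and comment lines
            values = [float(v) for v in line.split()]
            timestamps.append(values[0])
            poses.append(values[1:8])  # tx, ty, tz, qx, qy, qz, qw
    return np.array(timestamps), np.array(poses)

# Example usage on the inference output:
# stamps, poses = load_tum_poses("/output/folder/0000.txt")
```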
Given a sequence of unlabeled images, how do we generate pseudo labels (pseudo ground truth poses)?
- Step 1: Choose your favorite pose estimator and camera odometry pipeline.
- Step 2: Predict the object poses in the images and save them to `${obj_1}.txt`, `${obj_2}.txt`, ..., `${obj_n}.txt` in TUM format. (If the estimator failed for a certain frame, write the timestamp followed by seven 0s on the corresponding line; see the sketch after this list.)
- Step 3: Estimate the camera motion and save the noisy pose measurements to `${odom}.txt` in TUM format.
- Step 4: Generate the pseudo ground truth poses:
  - If the objects are from the YCB video dataset, download their PoseRBPF auto-encoder model weights and codebooks to this and this folder, and use the Hybrid mode for pseudo-labeling:
    ```bash
    cd /path/to/slam-super-6d
    python3 src/pseudo_labeler.py --joint --optim 1 --mode 2 --dets ${obj_1}.txt ${obj_2}.txt ... ${obj_n}.txt --odom ${odom}.txt --obj ${obj_1_name} ${obj_2_name} ... ${obj_n_name} --imgs "/path/to/unlabeled/images/*.png" --intrinsics ${fx} ${fy} ${cx} ${cy} ${s} --out ${output}
    ```
  - Otherwise, use the Inlier labeling mode, which disables the rendered-to-real RoI comparison:
    ```bash
    cd /path/to/slam-super-6d
    python3 src/pseudo_labeler.py --joint --optim 1 --mode 1 --dets ${obj_1}.txt ${obj_2}.txt ... ${obj_n}.txt --odom ${odom}.txt --out ${output}
    ```
- Step 5: Get the pseudo ground truth poses at `${output}/obj1.txt`, `${output}/obj2.txt`, ..., `${output}/objn.txt` in TUM format.
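Below is a minimal sketch of writing per-frame detections in TUM format, including the failure convention above (timestamp plus seven 0s); the function name and the use of `None` as a failure flag are illustrative assumptions.

```python
def save_detections_tum(path, detections):
    """Write per-frame detections to a TUM-format file.

    `detections` maps timestamp -> (tx, ty, tz, qx, qy, qz, qw), or -> None
    for frames where the estimator failed.
    """
    with open(path, "w") as f:
        for stamp in sorted(detections):
            pose = detections[stamp]
            if pose is None:
                pose = (0, 0, 0, 0, 0, 0, 0)  # failure convention: seven 0s
            f.write(f"{stamp:.6f} " + " ".join(f"{v:.6f}" for v in pose) + "\n")

# Example usage (values are placeholders):
# save_detections_tum("obj_1.txt", {0.0: (0.1, 0.2, 0.8, 0.0, 0.0, 0.0, 1.0), 0.1: None})
```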
As an example, you should be able to get this file (up to floating-point discrepancies; see the comparison sketch after these commands) if you run (Hybrid pseudo-labeling):

```bash
cd /path/to/slam-super-6d
python3 src/pseudo_labeler.py --joint --optim 1 --mode 2 --dets ./experiments/ycbv/inference/010_potted_meat_can_16k/Initial/0002.txt --odom ./experiments/ycbv/odom/results/0002.txt --obj 010_potted_meat_can --imgs "/path/to/YCB-V/data/0002/*-color.png" --out /output/folder/
```
And get this file if you run (Inlier pseudo-labeling):

```bash
cd /path/to/slam-super-6d
python3 src/pseudo_labeler.py --joint --optim 1 --mode 1 --dets ./experiments/ycbv/inference/010_potted_meat_can_16k/Initial/0002.txt --odom ./experiments/ycbv/odom/results/0002.txt --out /output/folder/
```
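A quick way to check your output against the reference file up to small numerical differences is a sketch like the one below; the function name and file paths are placeholders.

```python
import numpy as np

def tum_files_close(path_a, path_b, atol=1e-5):
    """Return True if two TUM-format files agree up to small numerical differences."""
    a = np.loadtxt(path_a)
    b = np.loadtxt(path_b)
    return a.shape == b.shape and np.allclose(a, b, atol=atol)

# Example usage (paths are placeholders):
# print(tum_files_close("/output/folder/obj1.txt", "/path/to/reference/file.txt"))
```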
We're using pre-commit for automatic linting. To install pre-commit, run:

```bash
pip3 install pre-commit
```

You can verify your installation went through by running `pre-commit --version`; you should see something like `pre-commit 2.14.1`.

To get started using pre-commit with this codebase, run the following from the project repo:

```bash
pre-commit install
```

Now, each time you `git add` new files and try to `git commit`, your code will automatically be run through a variety of linters. You won't be able to commit anything until the linters are happy with your code.
Thanks to Jonathan Tremblay for suggestions on DOPE network training and synthetic data generation.