SfMLearner applied to the Canadian Planetary Emulation Terrain Energy-Aware Rover Navigation Dataset (CPET) [1]. The report for this project can be found here.
This code base is largely built off the SfMLearner PyTorch implementation by Clément Pinard. The original project page of "Unsupervised Learning of Depth and Ego-Motion from Video" [2] by Tinghui Zhou, Matthew Brown, Noah Snavely, and David Lowe can be found here.
Sample Disparity Predictions on CPET:
The goal of this project is to investigate the feasibility of SfMLearner for tracking in low-textured, Martian-like environments from monochrome image sequences. The Canadian Planetary Emulation Terrain Energy-Aware Rover Navigation Dataset provides the necessary data to explore this idea. At a high level, here is what has been done:
- A supervised depth pre-training pipeline that leverages ground-truth pose, should you wish to accelerate the joint learning of pose and depth
- Unsupervised learning of motion and depth on the CPET dataset (with the option to use pre-trained depth weights)
- Methods for generating, aligning, and plotting (2D & 3D) absolute trajectories from relative pose estimates during online training (see the sketch after this list)
- An independent evaluation pipeline that generates quantitative metrics (ATE) on a specified sequence with trained depth and pose CNN weights
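As a rough illustration of the trajectory-generation step, here is a minimal sketch of how relative pose estimates can be chained into an absolute trajectory. The function name and the pose convention (each relative transform taken as the pose of camera t+1 expressed in the frame of camera t) are assumptions for illustration, not the repository's exact API.

```python
import numpy as np

def accumulate_poses(rel_poses):
    """Chain relative SE(3) transforms into an absolute trajectory.

    rel_poses: iterable of 4x4 arrays, each assumed to be T_{t, t+1}, the pose
    of camera t+1 expressed in the frame of camera t.
    Returns a list of 4x4 world-from-camera matrices, with frame 0 at identity.
    """
    traj = [np.eye(4)]
    for T in rel_poses:
        traj.append(traj[-1] @ T)  # T_{w, t+1} = T_{w, t} @ T_{t, t+1}
    return traj

# The translation components can then be plotted in 2D (x-y) or 3D:
positions = np.array([T[:3, 3] for T in accumulate_poses([np.eye(4)] * 10)])
```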
- OpenCV (4.5.0)
- PyTorch (1.4.0)
- MatPlotLib (3.3.3)
- NumPy (1.19.4)
Training and evaluation require no pre-processing of data. Simply download the "Human-readable (base) data download" runs of interest from the CPET webpage. You'll also need rover transform and camera intrinsics files. Unpack these into the following directory structure:
/run1_base_hr
/run2_base_hr
/run3_base_hr
/run4_base_hr
/run5_base_hr
/run6_base_hr
/cameras_intrinsics.txt
/rover_transforms.txt
Any missing files (e.g. global-pose-utm for run5) can be manually downloaded from the dataset drive.
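If you want a quick sanity check that the layout above is complete before training, a small script along these lines can help. It only checks for the folders and files listed above; the dataset root path is a placeholder.

```python
from pathlib import Path

def check_cpet_layout(root):
    """Verify the expected CPET directory layout before training."""
    root = Path(root)
    expected = [f"run{i}_base_hr" for i in range(1, 7)]
    expected += ["cameras_intrinsics.txt", "rover_transforms.txt"]
    missing = [name for name in expected if not (root / name).exists()]
    if missing:
        print("Missing from dataset root:", ", ".join(missing))
    else:
        print("CPET dataset layout looks complete.")

check_cpet_layout("/path/to/data/root")
```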
There are potentially many factors that inhibit SfMLearner in Martian-like environments. For instance, pixel regions across images are extremely similar due to the scene's homogeneous nature, and operating on monochrome images offers lower pixel variance. As a result, unsupervised learning from scratch may take much longer to converge than expected, or may not converge at all.
After downloading the CPET data, you can train the depth network with the command below. Additional flags are documented in train_depth.py.
python train_depth.py --exp-name <exp-name> --dataset-dir <path/to/data/root>
Alternatively, you can download the pre-trained depth network weights (epoch 5) from this link.
Run train_joint.py to jointly train the pose and depth networks with:
python train_joint.py --exp-name <exp-name> --dataset-dir <path/to/data/root> --disp-net <path/to/pre-trained/weights>
The --disp-net flag is optional; if omitted, the script defaults to training the depth and pose networks from scratch in a fully unsupervised fashion. The program saves plots of the estimated trajectory on the validation sequence at each epoch. A sample plot on the run2_base_hr sequence is given below. Quantitative metrics and model checkpoints are saved in the experiment directory. Trained model weights for the depth and pose networks can be found here and here.
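For reference, here is a sketch of how pre-trained depth weights might be restored before joint training. The DispNetS class comes from the SfMLearner PyTorch code base this project builds on, but the checkpoint layout (a 'state_dict' key) is an assumption; check train_joint.py for the exact loading logic.

```python
import torch
from models import DispNetS  # depth network from the SfMLearner PyTorch code base

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
disp_net = DispNetS().to(device)

# The 'state_dict' key mirrors common PyTorch checkpoint layouts; the actual
# format is defined by the training scripts.
checkpoint = torch.load("path/to/pre-trained/weights", map_location=device)
disp_net.load_state_dict(checkpoint.get("state_dict", checkpoint))
disp_net.train()  # the depth network keeps being refined during joint training
```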
Sample Pose Estimation in Bird's Eye View (Run2 Trajectory):
The evaluate_joint.py script evaluates the trained models on the test sequence (run6), but it also works on any of the training and validation sequences. You can run evaluation on any sequence by setting --run-sequence to one of 'run1' through 'run6':
python evaluate_joint.py --exp-name <exp-name> --run-sequence <seq_name> --dataset-dir <path/to/data/root> --disp-net <path/to/depth/weights> --pose-net <path/to/pose/weights>
Sample Pose Estimation in 3D (Run2 Trajectory):
Here are the results on all runs of the CPET dataset. Note that these results were obtained by pre-training the depth network prior to the joint learning of pose and depth. ATE Easy is the Absolute Trajectory Error (ATE) computed over the Umeyama-aligned (similarity transform) trajectories. ATE Hard is the ATE computed over the trajectories aligned with Horn's closed-form method, with the start points of the estimated and ground-truth trajectories constrained to be identical. These metrics, among others, are generated by the evaluation script; a sketch of the ATE Easy computation is given after the table.
Sequence | ATE Easy | ATE Hard | Loss | Time (hh:mm:ss) |
---|---|---|---|---|
Run 1 (train) | 3.364 | 7.976 | 5.27e-02 | 0:12:24 |
Run 2 (train) | 3.154 | 6.896 | 4.54e-02 | 0:12:23 |
Run 3 (train) | 2.816 | 3.882 | 5.62e-02 | 0:11:32 |
Run 4 (train) | 3.354 | 5.263 | 4.18e-02 | 0:14:56 |
Run 5 (val) | 5.601 | 10.696 | 4.20e-02 | 0:21:37 |
Run 6 (test) | 8.206 | 24.010 | 4.51e-02 | 0:22:27 |
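The sketch below illustrates the ATE Easy metric under the definition above: Umeyama similarity alignment of the estimated positions onto ground truth, followed by the RMSE of the residual position error. The function names are illustrative and may differ from what evaluate_joint.py actually implements.

```python
import numpy as np

def umeyama_alignment(est, gt):
    """Similarity (sim(3)) alignment of est onto gt via Umeyama's method.

    est, gt: (N, 3) arrays of trajectory positions.
    Returns scale s, rotation R (3x3), and translation t (3,).
    """
    mu_est, mu_gt = est.mean(axis=0), gt.mean(axis=0)
    x, y = est - mu_est, gt - mu_gt
    n = est.shape[0]
    cov = y.T @ x / n                              # cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # guard against reflections
        S[2, 2] = -1
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / ((x ** 2).sum() / n)
    t = mu_gt - s * R @ mu_est
    return s, R, t

def ate_rmse(est, gt):
    """ATE 'Easy': RMSE of position error after similarity alignment."""
    s, R, t = umeyama_alignment(est, gt)
    est_aligned = (s * (R @ est.T)).T + t
    return np.sqrt(np.mean(np.sum((est_aligned - gt) ** 2, axis=1)))
```

ATE Hard follows the same RMSE computation, but with the rigid (Horn's closed-form) alignment described above instead of the similarity alignment.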
- Lamarre, O., Limoyo, O., Marić, F., & Kelly, J. (2020). The Canadian Planetary Emulation Terrain Energy-Aware Rover Navigation Dataset. The International Journal of Robotics Research, 39(6), 641-650. doi:10.1177/0278364920908922
- Zhou, T., Brown, M., Snavely, N., & Lowe, D. G. (2017). Unsupervised Learning of Depth and Ego-Motion from Video. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/cvpr.2017.700