🚨 This repository contains the code and trained models of our work "GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction", ICCV 2023
by Youmin Zhang, Fabio Tosi, Stefano Mattoccia and Matteo Poggi
Department of Computer Science and Engineering (DISI), University of Bologna
Note: 🚧 This repository is still under active development.
3D Reconstruction and Trajectory Error. From left to right: RGB-D methods (iMAP, NICE-SLAM, DROID-SLAM, and ours), ground truth scan, and monocular methods (DROID-SLAM and ours).
We introduce GO-SLAM, a deep-learning-based dense visual SLAM framework that achieves real-time global optimization of poses and 3D reconstruction. By integrating robust pose estimation, efficient loop closing, and continuous surface representation updates, GO-SLAM effectively addresses the error accumulation and distortion challenges associated with neural implicit representations. By exploiting global geometry learned from the input history, GO-SLAM sets new benchmarks in tracking robustness and reconstruction accuracy across synthetic and real-world datasets. Notably, it supports monocular, stereo, and RGB-D inputs.
Contributions:
- A novel deep-learning-based, real-time global pose optimization system that considers the complete history of input frames and continuously aligns all poses.
- An efficient alignment strategy that enables instantaneous loop closures and correction of global structure, being both memory and time efficient.
- An instant 3D implicit reconstruction approach, enabling on-the-fly and continuous 3D model update with the latest global pose estimates. This strategy facilitates real-time 3D reconstructions.
- The first deep-learning architecture for joint robust pose estimation and dense 3D reconstruction suited for any setup: monocular, stereo, or RGB-D cameras.
Architecture Overview
GO-SLAM consists of three parallel threads: front-end tracking, back-end tracking, and instant mapping. It can run with monocular, stereo, and RGB-D input.
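As a rough illustration of the three-thread layout described above, the following minimal sketch wires up a front end, a back end, and a mapper as Python threads that exchange messages through queues. Every name in it (`frontend`, `backend`, `mapper`, `run`, the placeholder poses) is hypothetical; the actual GO-SLAM code may organize and synchronize its threads quite differently.

```python
import queue
import threading

STOP = None  # sentinel that tells a consumer thread to shut down


def frontend(frames, to_backend, to_mapper):
    """Hypothetical front end: turn every incoming frame into a keyframe."""
    for idx in frames:
        keyframe = {"id": idx, "pose": float(idx)}  # placeholder pose
        # Notify the mapper first so it always sees the initial pose
        # of a keyframe before any refined pose for the same keyframe.
        to_mapper.put(("new", keyframe))
        to_backend.put(keyframe)
    to_backend.put(STOP)


def backend(to_backend, to_mapper):
    """Hypothetical back end: pretend to refine each pose globally."""
    while True:
        keyframe = to_backend.get()
        if keyframe is STOP:
            break
        refined = dict(keyframe, pose=keyframe["pose"] + 0.5)  # fake correction
        to_mapper.put(("refined", refined))
    to_mapper.put(STOP)


def mapper(to_mapper, model):
    """Hypothetical mapper: keep the latest pose received for every keyframe."""
    while True:
        message = to_mapper.get()
        if message is STOP:
            break
        _, keyframe = message
        model[keyframe["id"]] = keyframe["pose"]


def run(num_frames):
    """Start the three threads, wait for them, and return the final model."""
    to_backend, to_mapper = queue.Queue(), queue.Queue()
    model = {}
    threads = [
        threading.Thread(target=frontend, args=(range(num_frames), to_backend, to_mapper)),
        threading.Thread(target=backend, args=(to_backend, to_mapper)),
        threading.Thread(target=mapper, args=(to_mapper, model)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return model
```

The point of the sketch is only the data flow: the mapper keeps consuming pose updates while tracking continues, so the model always reflects the most recent estimate for each keyframe.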
🖋️ If you find this code useful in your research, please cite:
@inproceedings{zhang2023goslam,
author = {Zhang, Youmin and Tosi, Fabio and Mattoccia, Stefano and Poggi, Matteo},
title = {GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
}
You can create an anaconda environment called go-slam. On Linux, you need to install libopenexr-dev before creating the environment.
git clone --recursive https://github.com/youmi-zym/GO-SLAM
sudo apt-get install libopenexr-dev
conda env create -f environment.yaml
conda activate go-slam
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install evo --upgrade --no-binary evo
python setup.py install
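After the installation steps above, a quick check that the main packages are visible in the active environment can save a failed run later. The sketch below is an optional helper, not part of the repository; the import names `torch`, `tinycudann`, and `evo` are assumptions based on the packages installed above, so adjust them if your environment differs.

```python
import importlib.util

# Import names assumed for the packages installed above; adjust if they differ.
REQUIRED_MODULES = ("torch", "tinycudann", "evo")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All modules found." if not missing else f"Missing: {', '.join(missing)}")
```

Note that `find_spec` only locates a module without importing it, so this does not verify that CUDA extensions actually load.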
Download the data from Google Drive, and then you can run:
# first set OUT_DIR in the script and DATA_ROOT in the config file
# MODE is one of [rgbd, mono]; EXP_NAME is a name of your choice for the experiment
./evaluate_on_replica.sh MODE EXP_NAME
# for example
./evaluate_on_replica.sh rgbd first_try
The mesh and the corresponding evaluation metrics are written to OUT_DIR.
We have also uploaded our predicted meshes to Google Drive. Enjoy!
Please follow the data downloading procedure on the ScanNet website, and extract color/depth frames from the .sens file using this code.
Directory structure of ScanNet
DATAROOT is ./Datasets by default. If a sequence (sceneXXXX_XX) is stored elsewhere, please change the input_folder path in the config file or on the command line.
DATAROOT
└── ScanNet
    └── scans
        └── scene0000_00
            └── frames
                ├── color
                │   ├── 0.jpg
                │   ├── 1.jpg
                │   ├── ...
                │   └── ...
                ├── depth
                │   ├── 0.png
                │   ├── 1.png
                │   ├── ...
                │   └── ...
                ├── intrinsic
                └── pose
                    ├── 0.txt
                    ├── 1.txt
                    ├── ...
                    └── ...
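Before launching an evaluation, it can help to confirm that an extracted scene matches the layout above. The following sketch is a hypothetical helper (the names `frame_ids` and `check_scannet_scene` are not part of the repository); it assumes color frames are `.jpg`, depth frames are `.png`, and poses are `.txt` files named by frame index, as in the tree.

```python
from pathlib import Path


def frame_ids(folder, suffix):
    """Collect the integer frame indices of files such as 0.jpg or 12.png."""
    return {int(p.stem) for p in Path(folder).glob(f"*{suffix}") if p.stem.isdigit()}


def check_scannet_scene(input_folder):
    """Return a list of problems found in one extracted scene folder."""
    frames = Path(input_folder) / "frames"
    problems = [
        f"missing folder: {name}"
        for name in ("color", "depth", "intrinsic", "pose")
        if not (frames / name).is_dir()
    ]
    if problems:
        return problems
    color = frame_ids(frames / "color", ".jpg")
    depth = frame_ids(frames / "depth", ".png")
    pose = frame_ids(frames / "pose", ".txt")
    if not color:
        problems.append("no color frames found")
    if color != depth:
        problems.append(f"{len(color ^ depth)} frame(s) lack a color/depth partner")
    if color != pose:
        problems.append(f"{len(color ^ pose)} frame(s) lack a color/pose partner")
    return problems
```

For example, `check_scannet_scene("Datasets/ScanNet/scans/scene0000_00")` returns an empty list when every color frame has a matching depth frame and pose file.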
Once the data is downloaded and set up properly, you can run:
# first set OUT_DIR in the script and DATA_ROOT in the config file
# MODE is one of [rgbd, mono]; EXP_NAME is a name of your choice for the experiment
./evaluate_on_scannet.sh MODE EXP_NAME
# for example
./evaluate_on_scannet.sh rgbd first_try
# you can also generate a video like the one on our project page with:
./generate_video_on_scannet.sh rgbd first_try_on_video
We have also uploaded our predicted meshes to Google Drive. Enjoy!
Please use the following script to download the EuRoC dataset. The GT trajectory can be downloaded from Google Drive.
Please put the GT trajectory of each scene in the corresponding folder, as shown below:
Directory structure of EuRoC
DATAROOT is ./Datasets by default. If a sequence (e.g., MH_01_easy) is stored elsewhere, please change the input_folder path in the config file or on the command line.
DATAROOT
└── EuRoC
    └── MH_01_easy
        ├── mav0
        │   ├── cam0
        │   ├── cam1
        │   ├── imu0
        │   ├── leica0
        │   ├── state_groundtruth_estimate0
        │   └── body.yaml
        └── MH_01_easy.txt
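Since the GT trajectory files are downloaded separately, it is easy to forget one for a sequence. The sketch below lists what is missing per sequence. It is a hypothetical helper, and it assumes that the GT file is named after the sequence and sits next to `mav0` inside the sequence folder; only the camera folders and the GT file are checked.

```python
from pathlib import Path

# Entries checked for every sequence, relative to the sequence folder.
REQUIRED = ("mav0/cam0", "mav0/cam1")


def missing_euroc_entries(dataroot):
    """Map each sequence under DATAROOT/EuRoC to the entries it is missing."""
    report = {}
    sequences = sorted(p for p in (Path(dataroot) / "EuRoC").iterdir() if p.is_dir())
    for seq in sequences:
        # Assumed location of the GT trajectory: <sequence>/<sequence>.txt
        expected = [seq / rel for rel in REQUIRED] + [seq / f"{seq.name}.txt"]
        missing = [p.relative_to(seq).as_posix() for p in expected if not p.exists()]
        if missing:
            report[seq.name] = missing
    return report
```

An empty dictionary means every sequence has both camera folders and its GT trajectory. The function raises `FileNotFoundError` if `DATAROOT/EuRoC` itself does not exist.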
Then you can run:
# for data downloading:
DATA_ROOT=path/to/folder
mkdir -p "$DATA_ROOT"
./scripts/download_euroc.sh "$DATA_ROOT"
# first set OUT_DIR in the script and DATA_ROOT in the config file
# MODE is one of [stereo, mono]; EXP_NAME is a name of your choice for the experiment
./evaluate_on_euroc.sh MODE EXP_NAME
# for example
./evaluate_on_euroc.sh stereo first_try
In this section, we present illustrative examples that demonstrate the effectiveness of our proposal.
Qualitative results on the ScanNet dataset. We evaluate our RGB-D mode SLAM on the ScanNet dataset and benchmark it against state-of-the-art techniques. Our method shows improved global consistency in its reconstruction results.
Qualitative results on the Replica dataset. Supporting both monocular and RGB-D modes, GO-SLAM is evaluated on the Replica dataset. It achieves real-time, high-quality 3D reconstruction from monocular or RGB-D input. This stands in contrast to NICE-SLAM, designed solely for depth input, which operates at less than 1 frame per second and requires hours to achieve comparable results.
Qualitative examples of loop closing (LC) and full bundle adjustment (BA) on scene0054_00 (ScanNet) with a total of 6629 frames. In (a), a significant error accumulates when no global optimization is available. With loop closing (b), the system is able to eliminate the trajectory error using global geometry. Additionally, online full BA (c) optimizes the poses of all existing keyframes. The final model (d), which integrates both loop closing and full BA, achieves a more complete and accurate 3D model prediction.
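The trajectory error discussed above is commonly reported as the RMSE of the absolute trajectory error (ATE). The sketch below computes a simplified version over estimated and ground-truth camera positions, assuming the two lists are already time-associated and expressed in the same coordinate frame. Full evaluations, for instance with the evo package installed earlier, also align the two trajectories first; that step is omitted here, and the function name is illustrative.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of the Euclidean distance between matched 3D positions (no alignment)."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

Because no alignment is performed, the value is only meaningful when both trajectories share an origin and scale, which in general does not hold for monocular input.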
For questions, please send an email to [email protected], [email protected] or [email protected]
We gratefully acknowledge the scholarship provided by the China Scholarship Council (CSC).
We adapted code from several excellent repositories, including NICE-SLAM, NeuS, and DROID-SLAM.