FM-Fusion: Instance-aware Semantic Mapping
Boosted by Vision-Language Foundation Models

IEEE RA-L 2024
Chuhao Liu¹, Ke Wang²,*, Jieqi Shi¹, Zhijian Qiao¹, and Shaojie Shen¹

¹HKUST Aerial Robotics Group ²Chang'an University, China
*Corresponding Author

News

[25th Oct 2024] Published code and RGB-D sequences from ScanNet and SgSlam.
[5th Jan 2024] Paper accepted by RA-L.

FM-Fusion uses RAM, GroundingDINO and SAM to reconstruct an instance-aware semantic map. Boosted by these vision foundation models, FM-Fusion can reconstruct semantic instances in cluttered real-world indoor environments.

The following instructions explain how to run FM-Fusion on our RGB-D sequences or on ScanNet sequences. If you find it useful, please cite our paper.

@article{liu2024fmfusion,
  title={{FM-Fusion}: Instance-aware Semantic Mapping Boosted by Vision-Language Foundation Models}, 
  author={Liu, Chuhao and Wang, Ke and Shi, Jieqi and Qiao, Zhijian and Shen, Shaojie},
  journal={IEEE Robotics and Automation Letters(RA-L)}, 
  year={2024}, 
  volume={9}, 
  number={3},
  pages={2232-2239}
}

Table of Contents

Install
Download Data
Run Instance-aware Semantic Mapping
Use it in Your RGB-D Camera
Acknowledgement
License

1. Install

Install the dependency packages from the Ubuntu package sources:

sudo apt-get install libboost-dev libomp-dev libeigen3-dev

Install Open3D from its source code (Install Tutorial):

git clone https://github.com/isl-org/Open3D
cd Open3D
mkdir build && cd build
cmake -DBUILD_SHARED_LIBS=ON ..
make -j12
sudo make install

Follow the official tutorials to install OpenCV, GLOG, and jsoncpp. For compatibility with ROS, please install OpenCV 3.4.xx.

Clone and compile FM-Fusion:

git clone git@github.com:HKUST-Aerial-Robotics/FM-Fusion.git
cd FM-Fusion
mkdir build && cd build
cmake .. -DINSTALL_FMFUSION=ON
make -j12
make install

Install the ROS node program, which renders the semantic instance map in Rviz. First install ROS following its official guidance. Then, build the ROS node we provide:

git submodule update --init --recursive
cd catkin_ws && catkin_make
source devel/setup.bash

2. Download Data

We provide two datasets for evaluation: SgSlam (captured with an Intel RealSense L515) and ScanNet. Their sequences can be downloaded:

Please check the data format for a description of the data in each sequence. After downloading the scans folder of each dataset, open uncompress_data.py and set the data directories to your local directories. Then, uncompress the sequence data:

python scripts/uncompress_data.py
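For readers who want a picture of what this step amounts to, here is a minimal sketch. It is not the repository's uncompress_data.py. It assumes (this is not confirmed by the instructions above) that each sequence is shipped as a standard archive such as .zip or .tar.gz placed directly inside the scans folder. The function name unpack_scans is hypothetical.

```python
import shutil
from pathlib import Path


def unpack_scans(scans_dir):
    """Unpack every archive found directly inside scans_dir.

    Assumption: sequences are stored as .zip / .tar / .tar.gz archives.
    Returns the names of the archives that were unpacked.
    """
    scans_dir = Path(scans_dir)
    # Collect the file suffixes of all archive formats shutil can unpack here.
    known_suffixes = tuple(
        ext for _, exts, _ in shutil.get_unpack_formats() for ext in exts
    )
    unpacked = []
    # The directory listing is taken once, before any files are extracted.
    for archive in sorted(scans_dir.iterdir()):
        if archive.is_file() and archive.name.endswith(known_suffixes):
            shutil.unpack_archive(str(archive), extract_dir=str(scans_dir))
            unpacked.append(archive.name)
    return unpacked
```

Calling unpack_scans on your local scans directory would extract each archive in place. Prefer the provided script whenever it works for your setup.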

3. Run Instance-aware Semantic Mapping

a. Run with Rviz.

In launch/semantic_mapping.launch, set the directories to your local dataset directories. Then, launch Rviz and the ROS node:

roslaunch sgloop_ros visualize.launch
roslaunch sgloop_ros semantic_mapping.launch

It should incrementally reconstruct the semantic map and render the results in Rviz. At the end of the sequence, the program saves the output results; the output format is described in the data format. To visualize previously reconstructed results, open launch/render_semantic_map.launch and set the result_folder directory accordingly. Then run:

roslaunch sgloop_ros render_semantic_map.launch

Tip: If you are running the program on a remote server, you can use the ROS multi-machine function. After setting the ROS master following the ROS tutorial, you can launch visualize.launch on your local machine and semantic_mapping.launch on the server, so you can still visualize the result on your local machine.

b. Run without visualization.

If you do not need the ROS node for visualization, you can skip its installation in the instructions above. Then, simply run the C++ executable; the results will be saved to ${SGSLAM_DATAROOT}/output. The output directory can be set before running the program.

./build/src/IntegrateInstanceMap --config config/realsense.yaml --root ${SGSLAM_DATAROOT}/scans/ab0201_03a --output ${SGSLAM_DATAROOT}/output
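To process several sequences in one go, the command above can be wrapped in a small script. The following is a sketch under stated assumptions: every sub-folder of ${SGSLAM_DATAROOT}/scans is taken to be one sequence, and the helpers build_command and run_all are hypothetical names. Only the --config, --root and --output flags shown above are used.

```python
import subprocess
from pathlib import Path


def build_command(dataroot, sequence, config="config/realsense.yaml",
                  exe="./build/src/IntegrateInstanceMap"):
    """Return the argument list for one sequence, mirroring the command above."""
    dataroot = Path(dataroot)
    return [
        exe,
        "--config", config,
        "--root", str(dataroot / "scans" / sequence),
        "--output", str(dataroot / "output"),
    ]


def run_all(dataroot):
    """Run the mapping executable on every sequence folder under scans/.

    Assumption: each directory inside scans/ is a valid sequence.
    """
    scans = Path(dataroot) / "scans"
    for seq_dir in sorted(p for p in scans.iterdir() if p.is_dir()):
        # check=True stops the batch at the first failing sequence.
        subprocess.run(build_command(dataroot, seq_dir.name), check=True)
```

Run from the repository root, run_all(os.environ["SGSLAM_DATAROOT"]) (after import os) would process each sequence in turn. build_command("/data/sgslam", "ab0201_03a") reproduces the single-sequence command with a placeholder data root.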

4. Use it in Your RGB-D Camera

In the SgSlam dataset, we use an Intel RealSense L515 camera and a DJI A3 flight controller to collect the data sequences. Details of the hardware suite can be found in this paper. You can also collect your own dataset using a similar hardware suite.

a. Prepare RGB-D and Camera poses.

We use VINS-Mono to compute visual-inertial odometry (VIO). We save the camera poses of its keyframes in a pose folder.
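A VIO system such as VINS-Mono reports each pose as a translation plus an orientation quaternion. The sketch below shows one way such a pose could be turned into a 4x4 homogeneous matrix and written to a pose folder. The one-file-per-keyframe layout, the file naming, and the text format are assumptions made for illustration; the layout the mapping program actually expects is the one described in the data format. pose_to_matrix and save_pose are hypothetical helper names.

```python
import os


def pose_to_matrix(t, q):
    """Build a 4x4 homogeneous transform from translation t=(x, y, z)
    and quaternion q=(qw, qx, qy, qz)."""
    qw, qx, qy, qz = q
    # Normalise so a slightly non-unit quaternion still gives a valid rotation.
    n = (qw * qw + qx * qx + qy * qy + qz * qz) ** 0.5
    qw, qx, qy, qz = qw / n, qx / n, qy / n, qz / n
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw),
         2 * (qx * qz + qy * qw), t[0]],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz),
         2 * (qy * qz - qx * qw), t[1]],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw),
         1 - 2 * (qx * qx + qy * qy), t[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


def save_pose(pose_dir, frame_name, t, q):
    """Write one keyframe pose as four whitespace-separated rows.

    Assumption: the file is named '<frame_name>.txt'; adapt to the data format.
    """
    os.makedirs(pose_dir, exist_ok=True)
    path = os.path.join(pose_dir, frame_name + ".txt")
    with open(path, "w") as f:
        for row in pose_to_matrix(t, q):
            f.write(" ".join(f"{v:.6f}" for v in row) + "\n")
    return path
```

Whether the matrix should express camera-to-world or world-to-camera, and which quaternion ordering your odometry output uses, should be checked against the data format before relying on this.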

b. Run RAM, GroundingDINO and SAM.

The three models are combined and run in Grounded-SAM. Please find our adapted Grounded-SAM here. It should generate a prediction folder as explained in the data format. Then, you can run the semantic mapping on your dataset.

5. Acknowledgement

The hardware is supported by Luqi Wang. We use Open3D to reconstruct instance sub-volumes. The vision foundation models RAM, GroundingDINO, and SAM provide instance segmentation on images.

6. License

The source code is released under the GPLv3 license. For technical issues, please contact Chuhao LIU ([email protected]).
