stereo-depth

GPU-accelerated single-view passive stereo depth estimation pipeline. The architecture of the pipeline is shown in the image below.

Github Readme Diagram

Features

  • Real-time DNN based right view generation
  • Multiple depth estimation backends
    • Real-time CUDA stereo matching algorithm
    • Group-wise Correlation Stereo Network (GwcNet)
    • MobileStereoNet (MSNet2D & MSNet3D)
  • REST API for the entire depth estimation pipeline

Right View Synthesis module

The architecture and data flow of the Right View Synthesis module are shown in the image below.

RVS

Depth Estimation module

Multiple depth estimation backends are implemented: the CUDA stereo matching algorithm, GwcNet and MobileStereoNet. The backend is selected when creating an instance of the DepthEstimationPipeline class.

CUDA stereo matching algorithm

The algorithm consists of 9 steps, each of which can be efficiently implemented for execution on GPUs:

  1. Input images are converted to grayscale
  2. Input images are scaled down by a factor of $K$ using mean pooling
  3. Matching cost volume construction using the sum of absolute differences (SAD) as the dissimilarity measure
  4. Multi-block cost function aggregation
  5. Winner-take-all disparity selection
  6. Secondary matching based on 1D disparity optimization using cost space parabola fit
  7. Disparity map upscaling by a factor of $K$
  8. Vertical disparity fill using bilateral estimation
  9. Horizontal disparity fill using bilateral estimation
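
To make a few of these steps concrete, here is a minimal pure-Python sketch of steps 1, 2, 3, 5 and 6 on tiny images. It is an illustration only, not the repository's CUDA code: all function names are hypothetical, grayscale conversion is assumed to be a plain channel average, image borders are handled by clamping, and the parabola fit is interpreted as the usual sub-pixel refinement around the winning disparity. Multi-block aggregation (step 4) and the bilateral fill steps (8 and 9) are left out.

```python
def to_gray(rgb):
    """Step 1 (assumed form): average the three channels of each pixel."""
    return [[sum(px) / 3.0 for px in row] for row in rgb]


def mean_pool(img, k):
    """Step 2: downscale by a factor of k with k x k mean pooling."""
    h, w = len(img) // k, len(img[0]) // k
    return [
        [sum(img[y * k + dy][x * k + dx] for dy in range(k) for dx in range(k)) / (k * k)
         for x in range(w)]
        for y in range(h)
    ]


def sad_cost(left, right, y, x, d, radius=1):
    """Step 3: SAD between a window around left (x, y) and right (x - d, y)."""
    h, w = len(left), len(left[0])

    def at(img, yy, xx):
        # Clamp coordinates at the image border (an assumption of this sketch).
        return img[min(max(yy, 0), h - 1)][min(max(xx, 0), w - 1)]

    return sum(
        abs(at(left, y + dy, x + dx) - at(right, y + dy, x + dx - d))
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
    )


def disparity_at(left, right, y, x, max_disp, radius=1):
    """Steps 5 and 6: winner-take-all, then a parabola fit over neighbouring costs."""
    costs = [sad_cost(left, right, y, x, d, radius) for d in range(max_disp + 1)]
    best = min(range(len(costs)), key=costs.__getitem__)  # winner-take-all
    if 0 < best < max_disp:
        c_prev, c_0, c_next = costs[best - 1], costs[best], costs[best + 1]
        denom = c_prev - 2 * c_0 + c_next
        if denom > 0:
            # Vertex of the parabola through the three cost samples.
            return best + (c_prev - c_next) / (2.0 * denom)
    return float(best)


# Synthetic pair: the left view is the right view shifted by 2 pixels.
right = [[float(x * x + 10 * y) for x in range(12)] for y in range(4)]
left = [[row[max(x - 2, 0)] for x in range(12)] for row in right]
print(disparity_at(left, right, y=1, x=6, max_disp=4))  # prints 2.0625
```

The result is slightly above the true shift of 2 because the costs on either side of the minimum are not symmetric for this texture, which is exactly the situation the parabola fit responds to.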

GwcNet

The architecture of the GwcNet model is shown in the image below.

GwcNet Architecture

MobileStereoNet

The architecture of the MobileStereoNet model is very similar to that of GwcNet. The main differences are the use of depth-wise separable convolutions instead of regular 3D convolutions, and a different method for constructing the combined cost volume from the feature maps of the left and right input images (shown in the image below).

MobileStereoNet3d
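
To give a rough sense of why depth-wise separable convolutions make the model lighter, the sketch below compares weight counts for a single layer. The channel count (32) and kernel size (3) are illustrative assumptions, not values taken from the MobileStereoNet code, and biases are ignored.

```python
def conv3d_params(c_in, c_out, k=3):
    """Weights in a regular 3D convolution with a k x k x k kernel."""
    return c_in * c_out * k ** 3


def separable_conv3d_params(c_in, c_out, k=3):
    """One k x k x k depth-wise filter per input channel, then a 1 x 1 x 1 point-wise mix."""
    return c_in * k ** 3 + c_in * c_out


regular = conv3d_params(32, 32)
separable = separable_conv3d_params(32, 32)
print(regular, separable, round(regular / separable, 1))  # prints 27648 1888 14.6
```

Under these assumed sizes the separable layer needs roughly 15 times fewer weights than the regular one.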

Demo pipeline runs

All videos are saved at 10 FPS. The actual frame rate differs between the depth estimation backends: the CUDA stereo matching algorithm runs at 30 FPS, GwcNet at 6 FPS and MobileStereoNet at 4 FPS.
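
As a quick way to read these numbers, the frame rates above can be converted to per-frame processing time (1000 / FPS, in milliseconds). The snippet is plain arithmetic on the figures quoted in this section; the dictionary name is only for illustration.

```python
# Frame rates quoted above, per depth estimation backend.
measured_fps = {"CUDA stereo matching": 30, "GwcNet": 6, "MobileStereoNet": 4}

# Per-frame processing time in milliseconds.
frame_time_ms = {name: round(1000 / fps, 1) for name, fps in measured_fps.items()}
print(frame_time_ms)
# prints {'CUDA stereo matching': 33.3, 'GwcNet': 166.7, 'MobileStereoNet': 250.0}
```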

Right View Synthesis + CUDA stereo matching algorithm

cuda_2022-09-19_18-28-13.mp4

Right View Synthesis + GwcNet

gwcnet_2022-09-19_18-28-51.mp4

Right View Synthesis + MobileStereoNet

msnet3d_2022-09-19_18-30-28.mp4

Main references

  • Right view synthesis

    • Xie, Junyuan, Ross Girshick, and Ali Farhadi. "Deep3D: Fully automatic 2D-to-3D video conversion with deep convolutional neural networks." European Conference on Computer Vision. Springer, Cham, 2016.

    • Luo, Yue, et al. "Single view stereo matching." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

  • CUDA stereo matching

    • Chang, Qiong, and Tsutomu Maruyama. "Real-time stereo vision system: a multi-block matching on GPU." IEEE Access 6 (2018): 42030-42046.
  • GwcNet

    • Guo, Xiaoyang, et al. "Group-wise correlation stereo network." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
  • MobileStereoNet

    • Shamsafar, Faranak, et al. "MobileStereoNet: Towards lightweight deep networks for stereo matching." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2022.