From fa37e392517052d9afd0f31af96ef033d4426666 Mon Sep 17 00:00:00 2001
From: Florian Echtler
Date: Mon, 4 Jan 2021 11:20:12 +0100
Subject: [PATCH] v0.2.0

---
 README.md                       | 75 +++++++++++++++++++++++----------
 deepseg.cc                      |  2 +-
 retrain.md => models/retrain.md |  0
 3 files changed, 54 insertions(+), 23 deletions(-)
 rename retrain.md => models/retrain.md (100%)

diff --git a/README.md b/README.md
index 0553c17..b74b288 100644
--- a/README.md
+++ b/README.md
@@ -34,6 +34,10 @@ I've heard good things about this deep learning stuff, so let's try that. I firs
 I had a look at the corresponding [Python example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/python/label_image.py), [C++ example](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/examples/label_image), and [Android example](https://github.com/tensorflow/examples/tree/master/lite/examples/image_segmentation/android), and based on those, I first cobbled together a [Python demo](https://github.com/floe/deepbacksub/blob/master/deepseg.py). That was running at about 2.5 FPS, which is really excruciatingly slow, so I built a [C++ version](https://github.com/floe/deepbacksub/blob/master/deepseg.cc) which manages 10 FPS without too much hand optimization. Good enough.
 
+I've also tested a TFLite-converted version of the [Body-Pix model](https://blog.tensorflow.org/2019/11/updated-bodypix-2.html), but the results weren't much different from DeepLab for this use case.
+
+More recently, Google has released a model specifically trained for [person segmentation that's used in Google Meet](https://ai.googleblog.com/2020/10/background-features-in-google-meet.html). This has way better performance than DeepLab, both in terms of speed and accuracy, so it is now the default. It needs one custom op from the MediaPipe framework, but that was quite easy to integrate. Thanks to @jiangjianping for pointing this out in the [corresponding issue](https://github.com/floe/deepbacksub/issues/28).
+
 ## Replace Background
 
 This is basically one line of code with OpenCV: `bg.copyTo(raw,mask);` Told you that's the easy part.
 
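As an aside, the `bg.copyTo(raw,mask);` one-liner quoted above can be tried in isolation. Below is a minimal sketch of that step, assuming plain BGR images loaded from placeholder files and a mask that is non-zero wherever the background should replace the camera pixel; the real deepseg.cc works on YUYV frames grabbed from V4L2, so this is an illustration rather than the program's actual code.

```cpp
// Illustrative sketch only (not code from this patch): background replacement
// with a binary mask, using BGR images instead of the YUYV frames deepseg.cc
// actually handles. File names are placeholders.
#include <opencv2/opencv.hpp>

int main() {
    cv::Mat raw  = cv::imread("frame.png");                       // stand-in for a grabbed camera frame
    cv::Mat bg   = cv::imread("background.png");                  // replacement background
    cv::Mat mask = cv::imread("mask.png", cv::IMREAD_GRAYSCALE);  // assumed: non-zero = background region

    // Background and mask must match the camera frame size (cf. issue #1).
    cv::resize(bg, bg, raw.size());
    cv::resize(mask, mask, raw.size(), 0, 0, cv::INTER_NEAREST);

    // The one-liner from the README: copy background pixels wherever the mask
    // is non-zero, leaving the person (mask == 0) untouched.
    bg.copyTo(raw, mask);

    cv::imwrite("output.png", raw);
    return 0;
}
```

In the real program the mask comes from the segmentation model rather than from disk; a sketch of that step follows after the patch.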
@@ -48,46 +52,73 @@ The dataflow through the whole program is roughly as follows:
 
   - init
     - load background.png, convert to YUYV
-    - load DeepLab v3+ network, initialize TFLite
+    - initialize TFLite, register custom op
+    - load Google Meet segmentation model
     - setup V4L2 Loopback device (w,h,YUYV)
   - loop
     - grab raw YUYV image from camera
-    - extract square ROI in center
-    - downscale ROI to 257 x 257 (*)
+    - extract portrait ROI in center
+    - downscale ROI to 144 x 256 (*)
     - convert to RGB (*)
-    - run DeepLab v3+
-    - convert result to binary mask for class "person"
+    - run Google Meet segmentation model
+    - convert result to binary mask using softmax
     - denoise mask using erode/dilate
     - upscale mask to raw image size
     - copy background over raw image with mask (see above)
     - `write()` data to virtual video device
 
-(*) these are required input parameters for DeepLab v3+
+(*) these are required input parameters for this model
 
 ## Requirements
 
 Tested with the following dependencies:
+
+  - Ubuntu 20.04, x86-64
+    - Linux kernel 5.6 (stock package)
+    - OpenCV 4.2.0 (stock package)
+    - V4L2-Loopback 0.12.5 (stock package)
+    - Tensorflow Lite 2.4.0 (from [repo](https://github.com/tensorflow/tensorflow/tree/v2.4.0/tensorflow/lite))
   - Ubuntu 18.04.5, x86-64
-  - Linux kernel 4.15 (stock package)
-  - OpenCV 3.2.0 (stock package)
-  - V4L2-Loopback 0.10.0 (stock package)
-  - Tensorflow Lite 2.1.0 (from [repo](https://github.com/tensorflow/tensorflow/tree/v2.1.0/tensorflow/lite))
-  - Ultra-short build guide for Tensorflow Lite C++ library: clone repo above, then...
-    - run `./tensorflow/lite/tools/make/download_dependencies.sh`
-    - run `./tensorflow/lite/tools/make/build_lib.sh`
+    - Linux kernel 4.15 (stock package)
+    - OpenCV 3.2.0 (stock package)
+    - V4L2-Loopback 0.10.0 (stock package)
+    - Tensorflow Lite 2.1.0 (from [repo](https://github.com/tensorflow/tensorflow/tree/v2.1.0/tensorflow/lite))
 
 Tested with the following software:
+
   - Firefox
-    - 74.0.1 (works)
+    - 84.0 (works)
     - 76.0.1 (works)
+    - 74.0.1 (works)
   - Skype
-    - 8.58.0.93 (works)
+    - 8.67.0.96 (works)
     - 8.60.0.76 (works)
-  - guvcview 2.0.5 (works with parameter `-c read`)
-  - Microsoft Teams 1.3.00.5153 (works)
-  - Chrome 81.0.4044.138 (works)
-  - Zoom 5.0.403652.0509 (works - yes, I'm a hypocrite, I tested it with Zoom after all :-)
-
+    - 8.58.0.93 (works)
+  - guvcview
+    - 2.0.6 (works with parameter `-c read`)
+    - 2.0.5 (works with parameter `-c read`)
+  - Microsoft Teams
+    - 1.3.00.30857 (works)
+    - 1.3.00.5153 (works)
+  - Chrome
+    - 87.0.4280.88 (works)
+    - 81.0.4044.138 (works)
+  - Zoom - yes, I'm a hypocrite, I tested it with Zoom after all :-)
+    - 5.4.54779.1115 (works)
+    - 5.0.403652.0509 (works)
+
+## Building
+
+Install dependencies (`sudo apt install libopencv-dev build-essential v4l2loopback-dkms`).
+
+Run `make` to build everything (should also clone and build Tensorflow Lite).
+
+If the first part doesn't work:
+  - Clone https://github.com/tensorflow/tensorflow/ repo into tensorflow/ folder
+  - Checkout tag v2.4.0
+  - run ./tensorflow/lite/tools/make/download_dependencies.sh
+  - run ./tensorflow/lite/tools/make/build_lib.sh
+
 ## Usage
 
 First, load the v4l2loopback module (extra settings needed to make Chrome work):
@@ -106,13 +137,13 @@ As usual: pull requests welcome.
   - Resolution is currently hardcoded to 640x480 (lowest common denominator).
   - Only works with Linux, because that's what I use.
  - Needs a webcam that can produce raw YUYV data (but extending to the common YUV420 format should be trivial)
-  - CPU hog: maxes out two cores on my 2.7 GHz i5 machine for just VGA @ 10 FPS.
-  - Uses stock Deeplab v3+ network. Maybe re-training with only "person" and "background" classes could improve performance?
 
 ## Fixed
 
   - Should probably do a erosion (+ dilation?) operation on the mask.
   - Background image size needs to match camera resolution (see issue #1).
+  - CPU hog: maxes out two cores on my 2.7 GHz i5 machine for just VGA @ 10 FPS. Fixed via Google Meet segmentation model.
+  - Uses stock Deeplab v3+ network. Maybe re-training with only "person" and "background" classes could improve performance? Fixed via Google Meet segmentation model.
 
 ## Other links
 
diff --git a/deepseg.cc b/deepseg.cc
index 2b2c211..7c88b71 100644
--- a/deepseg.cc
+++ b/deepseg.cc
@@ -112,7 +112,7 @@ void *grab_thread(void *arg) {
 
 int main(int argc, char* argv[]) {
 
-	printf("deepseg v0.1.0\n");
+	printf("deepseg v0.2.0\n");
 	printf("(c) 2020 by floe@butterbrot.org\n");
 	printf("https://github.com/floe/deepseg\n");
 
diff --git a/retrain.md b/models/retrain.md
similarity index 100%
rename from retrain.md
rename to models/retrain.md
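As an aside, the dataflow list in the README hunk above mentions converting the model result to a binary mask using softmax and then denoising it with erode/dilate. A rough sketch of those two steps follows; the [144][256][2] output layout, the channel order, and the 0.5 threshold are assumptions made for the example, not details taken from the Google Meet model or from deepseg.cc.

```cpp
// Illustrative sketch only: turn a two-channel segmentation output into a
// denoised binary mask, roughly following the "softmax" and "erode/dilate"
// steps of the dataflow. The [144][256][2] layout and the channel order
// (background first, person second) are assumptions for this example.
#include <opencv2/opencv.hpp>
#include <cmath>

cv::Mat scores_to_mask(const float* scores, int height = 144, int width = 256) {
    cv::Mat mask(height, width, CV_8UC1);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            const float* px = scores + (y * width + x) * 2;   // [background, person] scores
            // Two-class softmax: probability that this pixel is background.
            float p_bg = std::exp(px[0]) / (std::exp(px[0]) + std::exp(px[1]));
            mask.at<uchar>(y, x) = (p_bg > 0.5f) ? 255 : 0;   // 255 = background
        }
    }
    // Denoise: erode then dilate (morphological opening) to drop small speckles.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));
    cv::erode(mask, mask, kernel);
    cv::dilate(mask, mask, kernel);
    return mask;
}
```

The resulting mask would then be upscaled to the raw frame size with `cv::resize` and handed to the `bg.copyTo(raw, mask)` call shown earlier.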