Windows Camera post process(DMFT) with DirectML(Tensorflow) #385

Open
MarkHung00 opened this issue Aug 29, 2022 · 10 comments

@MarkHung00

We are developing camera post-processing in a DMFT (a Windows user-space DLL),
and we currently hope to run DirectML (with TensorFlow) inside the DMFT.
We have tried porting the DirectML C++ sample (DirectMLSuperResolution) to the DMFT and it works,
but we don't know how to proceed with the TensorFlow part.

Best regards

@MarkHung00
Author

MarkHung00 commented Aug 29, 2022

From the DirectML sample, it looks like every operator has to be implemented by hand in C++. Is there anything similar to Intel OpenVINO, where you can convert a TensorFlow model into an IR through the Model Optimizer and then have the C++ program load the IR?
https://miro.medium.com/max/1230/1*c83JJoHVHOXNGapF1JT86Q.png
Because the Windows camera post-process DMFT runs in a C++ environment, it would be much more convenient if we could load the TensorFlow model directly in C++.

@PatriceVignola
Contributor

PatriceVignola commented Aug 29, 2022

If you already have a trained Python TensorFlow model, you could freeze it into a .pb file and use the tensorflow-directml C API to load it at runtime. Would that be a good solution for you?

Once your model has been converted to a frozen .pb file, you can use the C API to load it with TF_LoadSessionFromSavedModel and then call TF_SessionRun.

I believe this is the most straightforward and fastest way to get your model working with DirectML. If you need help navigating the TensorFlow C API, please let us know!

Edit: Alternatively, if you are familiar with ONNX and can convert your model to an ONNX model, you could even use onnxruntime instead of TensorFlow, which can use DirectML underneath.
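
For reference, that flow looks roughly like the minimal sketch below. The model directory, the "serve" tag, the op names, and the input shape are all placeholders for whatever your exported model actually uses:

```cpp
// Minimal sketch: load a SavedModel with the TensorFlow C API and run it once.
// "saved_model_dir", the "serve" tag, the op names, and the input shape are
// placeholders -- substitute the values from your own exported model.
#include <stdio.h>
#include <tensorflow/c/c_api.h>

int main() {
  TF_Status* status = TF_NewStatus();
  TF_Graph* graph = TF_NewGraph();
  TF_SessionOptions* opts = TF_NewSessionOptions();

  const char* tags[] = {"serve"};  // tag set the model was exported with
  TF_Session* session = TF_LoadSessionFromSavedModel(
      opts, /*run_options=*/nullptr, "saved_model_dir", tags, 1, graph,
      /*meta_graph_def=*/nullptr, status);
  if (TF_GetCode(status) != TF_OK) {
    fprintf(stderr, "Load failed: %s\n", TF_Message(status));
    return 1;
  }

  // Look up the input/output ops by name (placeholder names).
  TF_Output input = {TF_GraphOperationByName(graph, "input"), 0};
  TF_Output output = {TF_GraphOperationByName(graph, "output"), 0};

  // Allocate a float input tensor (placeholder shape) and fill it.
  int64_t dims[] = {1, 224, 224, 3};
  TF_Tensor* in_tensor =
      TF_AllocateTensor(TF_FLOAT, dims, 4, 1 * 224 * 224 * 3 * sizeof(float));
  // ... copy your frame data into TF_TensorData(in_tensor) ...

  TF_Tensor* out_tensor = nullptr;
  TF_SessionRun(session, /*run_options=*/nullptr,
                &input, &in_tensor, 1,    // feeds
                &output, &out_tensor, 1,  // fetches
                nullptr, 0, /*run_metadata=*/nullptr, status);

  // ... consume TF_TensorData(out_tensor), then clean up ...
  TF_DeleteTensor(in_tensor);
  if (out_tensor) TF_DeleteTensor(out_tensor);
  TF_DeleteSession(session, status);
  TF_DeleteSessionOptions(opts);
  TF_DeleteGraph(graph);
  TF_DeleteStatus(status);
  return 0;
}
```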

@MarkHung00
Author

MarkHung00 commented Sep 7, 2022

Hi PatriceVignola,
Thank you very much for the information; it's very helpful to us. There doesn't seem to be any complete sample code for TF_LoadSessionFromSavedModel on the Internet.

Do you have relevant documentation or sample/demo code we could try? Thanks!

@PatriceVignola
Contributor

Hi @MarkHung00,

I created a basic sample over here. The sample goes through the process of loading a frozen squeezenet.pb model, creating a graph from it and finally creating a session. Feel free to extract the parts that are relevant for you and let me know if you run into any issues or have other questions!
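
In rough strokes, the frozen-graph flow the sample walks through looks like this (a simplified sketch, not the exact sample code, with file reading and error handling trimmed down):

```cpp
// Simplified sketch of the sample's flow: read the frozen .pb into a buffer,
// import it as a GraphDef, then create a session on the resulting graph.
#include <stdio.h>
#include <stdlib.h>
#include <tensorflow/c/c_api.h>

static void free_buffer(void* data, size_t length) { (void)length; free(data); }

int main() {
  // Read the frozen graph from disk.
  FILE* f = fopen("squeezenet.pb", "rb");
  if (!f) { fprintf(stderr, "Could not open squeezenet.pb\n"); return 1; }
  fseek(f, 0, SEEK_END);
  long size = ftell(f);
  fseek(f, 0, SEEK_SET);
  void* data = malloc(size);
  fread(data, 1, size, f);
  fclose(f);

  TF_Buffer* graph_def = TF_NewBuffer();
  graph_def->data = data;
  graph_def->length = size;
  graph_def->data_deallocator = free_buffer;

  // Import the GraphDef into a fresh graph.
  TF_Status* status = TF_NewStatus();
  TF_Graph* graph = TF_NewGraph();
  TF_ImportGraphDefOptions* import_opts = TF_NewImportGraphDefOptions();
  TF_GraphImportGraphDef(graph, graph_def, import_opts, status);
  if (TF_GetCode(status) != TF_OK) {
    fprintf(stderr, "Import failed: %s\n", TF_Message(status));
    return 1;
  }

  // Create a session; from here TF_SessionRun works as in the SavedModel case.
  TF_SessionOptions* sess_opts = TF_NewSessionOptions();
  TF_Session* session = TF_NewSession(graph, sess_opts, status);
  // ... feed inputs / fetch outputs with TF_SessionRun ...

  TF_DeleteSession(session, status);
  TF_DeleteSessionOptions(sess_opts);
  TF_DeleteImportGraphDefOptions(import_opts);
  TF_DeleteGraph(graph);
  TF_DeleteBuffer(graph_def);
  TF_DeleteStatus(status);
  return 0;
}
```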

@MarkHung00
Author

Hi PatriceVignola,

Thank you very much for your assistance. We have successfully run the sample with Visual Studio and will move on to other models in the next stage.
Beyond the recommended operators mentioned in the link below, are there any restrictions on data types (e.g. does DirectML TensorFlow support FP32/FP16/INT16/INT8)?
https://docs.microsoft.com/en-us/windows/ai/directml/dml-intro

Thanks a lot

@MarkHung00 changed the title from "Windows Camera post process(DMFT) with DirectML(Tensortflow)" to "Windows Camera post process(DMFT) with DirectML(Tensorflow)" on Sep 13, 2022
@PatriceVignola
Contributor

PatriceVignola commented Sep 14, 2022

@MarkHung00 tensorflow-directml supports a subset of the operators supported by the default GPU (CUDA) device. To see which data types each operator supports, you can look at the source. For example, for Gather:

TF_CALL_float(DML_REGISTER_KERNELS);

In general, FP32 and FP16 are the most commonly supported data types across DML operators, while int32 is kept on the CPU instead.
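
To make that registration line concrete, the pattern looks roughly like the sketch below. The macro body and the DmlGatherOp name are assumptions for illustration; check the actual source for the exact kernel class and type constraints:

```cpp
// Illustrative sketch (not the exact tensorflow-directml source): each
// TF_CALL_<dtype> invocation expands the registration macro for that dtype,
// so the list of TF_CALL_* lines is the list of data types the DML kernel
// supports. DmlGatherOp and the constraint names are assumed for illustration.
#define DML_REGISTER_KERNELS(type)                              \
  REGISTER_KERNEL_BUILDER(Name("Gather")                        \
                              .Device(DEVICE_DML)               \
                              .TypeConstraint<type>("Tparams"), \
                          DmlGatherOp)
TF_CALL_float(DML_REGISTER_KERNELS);  // FP32
TF_CALL_half(DML_REGISTER_KERNELS);   // FP16
#undef DML_REGISTER_KERNELS
```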

@MarkHung00
Author

MarkHung00 commented Sep 15, 2022

Hi PatriceVignola,

Thanks a lot. We are currently developing some real-time scenarios, and the inference time on an Nvidia GPU is still satisfactory. We have some questions.

We hope that the inference framework can flexibly use various GPUs (Intel/AMD/Nvidia); DirectML is a good choice, but in the future there may be models that require more computing power and have longer inference times.

@PatriceVignola
Contributor

  1. For TensorFlow performance tuning, most of the tools that work for CUDA also work for DirectML. For example, we like to use the chrome tracing format outlined in the post since it shows a good timeline of all operators that are being executed and is easy to read. (A rough C API sketch for capturing the trace data follows at the end of this comment.)
  2. Yes, each operator has a different list of data types that it supports. For example, if you look at the bottom of the Convolution page, you can see that it supports float16 and float32.
  3. DirectML performance on Intel heavily depends on the device, but we're working with them to make sure that DirectML becomes a competitive framework on their platform.

Also, take note that this repository (tensorflow-directml 1.15) is mostly in maintenance mode. We're still doing bug fixes and improving performance, but we're now more focused on the preview of our plugin for TF 2. We don't have a C API for the plugin yet, but it's coming soon!
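
For reference, requesting that trace data through the C API looks roughly like the sketch below. The hand-encoded RunOptions bytes and the helper signature are illustrative assumptions, not taken from our docs:

```cpp
// Hedged sketch: request a full trace via TF_SessionRun's run_options buffer.
// run_options carries a serialized RunOptions proto; the two bytes {0x08, 0x03}
// encode trace_level = FULL_TRACE (field 1, varint 3). run_metadata comes back
// as a serialized RunMetadata proto holding the per-op step stats that the
// chrome-trace tooling consumes.
#include <tensorflow/c/c_api.h>

void run_with_trace(TF_Session* session,
                    const TF_Output* inputs, TF_Tensor* const* input_values,
                    int ninputs, const TF_Output* outputs,
                    TF_Tensor** output_values, int noutputs,
                    TF_Status* status) {
  static const unsigned char kFullTrace[] = {0x08, 0x03};
  TF_Buffer run_options = {kFullTrace, sizeof(kFullTrace), nullptr};
  TF_Buffer* run_metadata = TF_NewBuffer();

  TF_SessionRun(session, &run_options,
                inputs, input_values, ninputs,
                outputs, output_values, noutputs,
                nullptr, 0, run_metadata, status);

  // run_metadata->data / ->length now hold the serialized RunMetadata;
  // dump it to disk and convert it to a chrome trace offline.
  TF_DeleteBuffer(run_metadata);
}
```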

@MarkHung00
Author

Hi PatriceVignola,

We highly appreciate your help. In addition, we have two questions:

  • We may have scenarios where multiple models run inference at the same time. With the DirectML TensorFlow C library we can select the CPU or a DirectML-supported GPU via TF_ImportGraphDefOptionsSetDefaultDevice, but if there are two models and only the Nvidia GPU can achieve real-time performance, we'd like to know: if we call TF_SessionRun from multiple threads on the user side, how does DirectML schedule the work? Is the execution order FIFO, or are there other scheduling mechanisms or priorities we can use to optimize the execution order?

  • Since the CONV operator only supports floating point (FP16/FP32), we'd like to know whether DirectML supports quantization-aware training (QAT).
    Thanks again for your great help!

@PatriceVignola
Contributor

  1. Before trying multithreading, I would suggest looking at the GPU usage for a single model. Ideally, most models should run near 100% GPU usage, in which case multithreading won't help and may even make things slower due to context switching and higher memory usage. If your model has poor GPU usage, we'd like to help you investigate what the problem is in order to make it better!
  2. DirectML has quantization support, but tensorflow-directml doesn't have it at this time.
