Windows Camera post process(DMFT) with DirectML(Tensorflow) #385

Open
MarkHung00 opened this issue Aug 29, 2022 · 10 comments

@MarkHung00

We are developing camera post-processing in a DMFT (a Windows user-space DLL),
and we currently hope to run DirectML (with TensorFlow) inside the DMFT.
We have tried porting the DirectML C++ sample (DirectMLSuperResolution) to the DMFT and it works,
but we don't know how to proceed with the TensorFlow part.

Best regards

@MarkHung00
Author

MarkHung00 commented Aug 29, 2022

From the DirectML sample, it looks like every operator has to be implemented by hand in C++. Is there anything similar to Intel OpenVINO, where you can convert a TensorFlow model into an IR through the Model Optimizer and then have the C++ program load the IR?
https://miro.medium.com/max/1230/1*c83JJoHVHOXNGapF1JT86Q.png
Because the Windows camera post-process DMFT runs in a C++ environment, it would be much more convenient if we could load the TensorFlow model directly in C++.

@PatriceVignola
Contributor

PatriceVignola commented Aug 29, 2022

If you already have a trained Python TensorFlow model, you could freeze it into a .pb file and use the tensorflow-directml C API to load it at runtime. Would that be a good solution for you?

Once your model has been converted to a frozen .pb file, you can use the C API to load it with TF_LoadSessionFromSavedModel and then call TF_SessionRun.

I believe this is the most straightforward and fastest way to get your model working with DirectML. If you need help navigating the TensorFlow C API, please let us know!

Edit: Alternatively, if you are familiar with ONNX and can convert your model to an ONNX model, you could even use onnxruntime instead of TensorFlow, which can use DirectML underneath.
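
For reference, that flow looks roughly like the minimal sketch below. The model directory, the "serve" tag, the op names, and the input shape are all placeholders for whatever your exported model actually uses:

```cpp
// Minimal sketch: load a SavedModel with the TensorFlow C API and run it once.
// "saved_model_dir", the "serve" tag, the op names, and the input shape are
// placeholders -- substitute the values from your own exported model.
#include <stdio.h>
#include <tensorflow/c/c_api.h>

int main() {
  TF_Status* status = TF_NewStatus();
  TF_Graph* graph = TF_NewGraph();
  TF_SessionOptions* opts = TF_NewSessionOptions();

  const char* tags[] = {"serve"};  // tag set the model was exported with
  TF_Session* session = TF_LoadSessionFromSavedModel(
      opts, /*run_options=*/nullptr, "saved_model_dir", tags, 1, graph,
      /*meta_graph_def=*/nullptr, status);
  if (TF_GetCode(status) != TF_OK) {
    fprintf(stderr, "Load failed: %s\n", TF_Message(status));
    return 1;
  }

  // Look up the input/output ops by name (placeholder names).
  TF_Output input = {TF_GraphOperationByName(graph, "input"), 0};
  TF_Output output = {TF_GraphOperationByName(graph, "output"), 0};

  // Allocate a float input tensor (placeholder shape) and fill it.
  int64_t dims[] = {1, 224, 224, 3};
  TF_Tensor* in_tensor =
      TF_AllocateTensor(TF_FLOAT, dims, 4, 1 * 224 * 224 * 3 * sizeof(float));
  // ... copy your frame data into TF_TensorData(in_tensor) ...

  TF_Tensor* out_tensor = nullptr;
  TF_SessionRun(session, /*run_options=*/nullptr,
                &input, &in_tensor, 1,    // feeds
                &output, &out_tensor, 1,  // fetches
                nullptr, 0, /*run_metadata=*/nullptr, status);

  // ... consume TF_TensorData(out_tensor), then clean up ...
  TF_DeleteTensor(in_tensor);
  if (out_tensor) TF_DeleteTensor(out_tensor);
  TF_DeleteSession(session, status);
  TF_DeleteSessionOptions(opts);
  TF_DeleteGraph(graph);
  TF_DeleteStatus(status);
  return 0;
}
```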

@MarkHung00
Author

MarkHung00 commented Sep 7, 2022

Hi PatriceVignola,
Thank you very much for the information; it's very helpful to us. There doesn't seem to be any complete sample code for TF_LoadSessionFromSavedModel on the Internet.

Do you have relevant documentation or sample/demo code we could try? Thanks!

@PatriceVignola
Contributor

Hi @MarkHung00,

I created a basic sample over here. The sample goes through the process of loading a frozen squeezenet.pb model, creating a graph from it and finally creating a session. Feel free to extract the parts that are relevant for you and let me know if you run into any issues or have other questions!
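
In rough strokes, the frozen-graph flow the sample walks through looks like this (a simplified sketch, not the exact sample code, with file reading and error handling trimmed down):

```cpp
// Simplified sketch of the sample's flow: read the frozen .pb into a buffer,
// import it as a GraphDef, then create a session on the resulting graph.
#include <stdio.h>
#include <stdlib.h>
#include <tensorflow/c/c_api.h>

static void free_buffer(void* data, size_t length) { (void)length; free(data); }

int main() {
  // Read the frozen graph from disk.
  FILE* f = fopen("squeezenet.pb", "rb");
  if (!f) { fprintf(stderr, "Could not open squeezenet.pb\n"); return 1; }
  fseek(f, 0, SEEK_END);
  long size = ftell(f);
  fseek(f, 0, SEEK_SET);
  void* data = malloc(size);
  fread(data, 1, size, f);
  fclose(f);

  TF_Buffer* graph_def = TF_NewBuffer();
  graph_def->data = data;
  graph_def->length = size;
  graph_def->data_deallocator = free_buffer;

  // Import the GraphDef into a fresh graph.
  TF_Status* status = TF_NewStatus();
  TF_Graph* graph = TF_NewGraph();
  TF_ImportGraphDefOptions* import_opts = TF_NewImportGraphDefOptions();
  TF_GraphImportGraphDef(graph, graph_def, import_opts, status);
  if (TF_GetCode(status) != TF_OK) {
    fprintf(stderr, "Import failed: %s\n", TF_Message(status));
    return 1;
  }

  // Create a session; from here TF_SessionRun works as in the SavedModel case.
  TF_SessionOptions* sess_opts = TF_NewSessionOptions();
  TF_Session* session = TF_NewSession(graph, sess_opts, status);
  // ... feed inputs / fetch outputs with TF_SessionRun ...

  TF_DeleteSession(session, status);
  TF_DeleteSessionOptions(sess_opts);
  TF_DeleteImportGraphDefOptions(import_opts);
  TF_DeleteGraph(graph);
  TF_DeleteBuffer(graph_def);
  TF_DeleteStatus(status);
  return 0;
}
```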

@MarkHung00
Author

Hi PatriceVignola,

Thank you very much for your assistance. We have successfully run the sample with Visual Studio and will move on to other models in the next stage.
Beyond the recommended operators mentioned in the link below, are there any restrictions on data types (e.g. does DirectML TensorFlow support FP32/FP16/INT16/INT8)?
https://docs.microsoft.com/en-us/windows/ai/directml/dml-intro

Thanks a lot

@MarkHung00 changed the title from "Windows Camera post process(DMFT) with DirectML(Tensortflow)" to "Windows Camera post process(DMFT) with DirectML(Tensorflow)" on Sep 13, 2022
@PatriceVignola
Contributor

PatriceVignola commented Sep 14, 2022

@MarkHung00 tensorflow-directml supports a subset of the operators supported by the default GPU (CUDA) device. To see which data types each operator supports, you can look at the source. For example, for Gather:

TF_CALL_float(DML_REGISTER_KERNELS);

In general, FP32 and FP16 are the most commonly supported data types across DML operators, while int32 is kept on the CPU instead.
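
To make that registration line concrete, the pattern looks roughly like the sketch below. The macro body and the DmlGatherOp name are assumptions for illustration; check the actual source for the exact kernel class and type constraints:

```cpp
// Illustrative sketch (not the exact tensorflow-directml source): each
// TF_CALL_<dtype> invocation expands the registration macro for that dtype,
// so the list of TF_CALL_* lines is the list of data types the DML kernel
// supports. DmlGatherOp and the constraint names are assumed for illustration.
#define DML_REGISTER_KERNELS(type)                              \
  REGISTER_KERNEL_BUILDER(Name("Gather")                        \
                              .Device(DEVICE_DML)               \
                              .TypeConstraint<type>("Tparams"), \
                          DmlGatherOp)
TF_CALL_float(DML_REGISTER_KERNELS);  // FP32
TF_CALL_half(DML_REGISTER_KERNELS);   // FP16
#undef DML_REGISTER_KERNELS
```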

@MarkHung00
Author

MarkHung00 commented Sep 15, 2022

Hi PatriceVignola,

Thanks a lot. We are currently developing some real-time scenarios, and the inference time on an Nvidia GPU is still satisfactory. We have some questions.

We hope that the inference framework can flexibly use various GPUs (Intel/AMD/Nvidia); DirectML is a good choice, but in the future there may be models that require more computing power and have longer inference times.

@PatriceVignola
Contributor

  1. For TensorFlow performance tuning, most of the tools that work for CUDA also work for DirectML. For example, we like to use the chrome tracing format outlined in the post since it shows a good timeline of all operators that are being executed and is easy to read. (A rough C API sketch for capturing the trace data follows at the end of this comment.)
  2. Yes, each operator has a different list of data types that it supports. For example, if you look at the bottom of the Convolution page, you can see that it supports float16 and float32.
  3. DirectML performance on Intel heavily depends on the device, but we're working with them to make sure that DirectML becomes a competitive framework on their platform.

Also, take note that this repository (tensorflow-directml 1.15) is mostly in maintenance mode. We're still doing bug fixes and improving performance, but we're now more focused on the preview of our plugin for TF 2. We don't have a C API for the plugin yet, but it's coming soon!
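
For reference, requesting that trace data through the C API looks roughly like the sketch below. The hand-encoded RunOptions bytes and the helper signature are illustrative assumptions, not taken from our docs:

```cpp
// Hedged sketch: request a full trace via TF_SessionRun's run_options buffer.
// run_options carries a serialized RunOptions proto; the two bytes {0x08, 0x03}
// encode trace_level = FULL_TRACE (field 1, varint 3). run_metadata comes back
// as a serialized RunMetadata proto holding the per-op step stats that the
// chrome-trace tooling consumes.
#include <tensorflow/c/c_api.h>

void run_with_trace(TF_Session* session,
                    const TF_Output* inputs, TF_Tensor* const* input_values,
                    int ninputs, const TF_Output* outputs,
                    TF_Tensor** output_values, int noutputs,
                    TF_Status* status) {
  static const unsigned char kFullTrace[] = {0x08, 0x03};
  TF_Buffer run_options = {kFullTrace, sizeof(kFullTrace), nullptr};
  TF_Buffer* run_metadata = TF_NewBuffer();

  TF_SessionRun(session, &run_options,
                inputs, input_values, ninputs,
                outputs, output_values, noutputs,
                nullptr, 0, run_metadata, status);

  // run_metadata->data / ->length now hold the serialized RunMetadata;
  // dump it to disk and convert it to a chrome trace offline.
  TF_DeleteBuffer(run_metadata);
}
```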

@MarkHung00
Author

Hi PatriceVignola,

We highly appreciate your help. In addition, we have two questions:

  • We may have scenarios where multiple models run inference at the same time. With the DirectML TensorFlow C library we can select the CPU or a DirectML-supported GPU via TF_ImportGraphDefOptionsSetDefaultDevice, but if there are two models and only the Nvidia GPU can achieve real-time performance, we'd like to know: if we call TF_SessionRun from multiple threads on the user side, how does DirectML schedule the work? Is the execution order FIFO, or are there other scheduling mechanisms or priorities we can use to optimize the execution order?

  • Since the CONV operator only supports floating point (FP16/FP32), we'd like to know whether DirectML supports quantization-aware training (QAT).
    Thanks again for your great help!

@PatriceVignola
Contributor

  1. Before trying multithreading, I would suggest looking at the GPU usage for a single model. Ideally, most models should run near 100% GPU usage, in which case multithreading won't help and may even make things slower due to context switching and higher memory usage. If your model has poor GPU usage, we'd like to help you investigate what the problem is in order to make it better!
  2. DirectML has quantization support, but tensorflow-directml doesn't have it at this time.
