Modifying ORT to load 3rd Party Model #7931

RaqHer3 · 2021-06-03T15:41:41Z

I have a class called MyModelProto that contains all the same fields that an ONNX ModelProto structure would have (but without any Protobuf dependencies). My goal is to run ORT with my 3rd party MyModelProto (already loaded in RAM) rather than with the Protobuf-based onnx::ModelProto (which ORT reads from the onnx file path).

Looking at the code, I wanted to find the best approach to implement this load. I believe that using/modifying an existing load function should be the easiest approach? (And if so, which one(s)?)

In inference_session.h, I see there are many load functions for the ONNX and ORT format models. From what I understand, the onnx model is loaded by passing a protobuf object corresponding to the model file and then having this “model_proto” be copied by the API.

#if !defined(ORT_MINIMAL_BUILD)
  /**
    * Load an ONNX model.
    * @param protobuf object corresponding to the model file. model_proto will be copied by the API.
    * @return OK if success.
    */
  common::Status Load(const ONNX_NAMESPACE::ModelProto& model_proto) ORT_MUST_USE_RESULT;

Then, it seems that several additional files call onnx functions directly in order to get the data stored inside AttributeProto, TensorProto, etc…

Files/Functions:

model.cc: 
- const std::string DocString();
- Version Model::IrVersion();
- ONNX_NAMESPACE::ModelProto ToProto();
- etc...

tensorprotoutils.cc:
- bool HasRawData(const ONNX_NAMESPACE::TensorProto& ten_proto);
- bool HasTensor(const ONNX_NAMESPACE::AttributeProto& at_proto);
- Status UnpackTensor(const ONNX_NAMESPACE::TensorProto& tensor, ...);
- etc...

graph.cc:
- TypeProto TypeProtoFromTensorProto(const TensorProto& tensor);
- void RemoveInvalidValues(ONNX_NAMESPACE::TypeProto& type);
- common::Status NodeArg::UpdateTypeAndShape(const ONNX_NAMESPACE::TypeProto& input_type, bool strict, bool override_types, const logging::Logger& logger);
- etc...

I guess I will need to modify several of the functions in the above files in order to read directly from my MyModelProto? I see that ORT has a class Model, which contains the fields for the graph and basic information, such as model version, producer version, ir version, doc_string, etc.

namespace fbs {
struct Model;
}  // namespace fbs
}  // namespace experimental

// A machine learning model representation class.
// Besides a main <Graph>, it also holds basic information, say,
// model version, model domain, model author, license etc.
class Model {

For simplicity, an overview of my questions with some additional ones:

1. Would using/modifying an existing load function be the easiest approach? If so, which one(s)?
2. I guess I will need to modify several of the functions in the above files in order to read directly from my MyModelProto?
3. Which load function should be the starting point of my modification? (I believe it should be one in inference_session.h?)
4. I want this to work both with and without ORT_MINIMAL_BUILD. Are there any differences I should pay attention to for each one?

pranavsharma · 2021-06-08T18:54:14Z

You've identified most of the functions that are used internally. This is going to be a lot of work, plus you'll have to do this in a fork and maintain the fork since it wouldn't make sense for us to accept this contribution as ORT is very tightly coupled with the ONNX model format which is based on protobuf. If you don't care about the size of the build you can skip the ORT_MINIMAL_BUILD related ifdefs.

gineshidalgo99 · 2021-07-13T14:14:14Z

Thanks @pranavsharma for the advice! We dropped that idea and are instead targeting directly using ORT model files. We have 2 questions about these:

1. We could not find any tutorial/information about how to create the ORT file from C++ (only from Python in [1]). How would this be possible from C++? We were able to generate ORT files by running a normal session and adding SessionOptions->SetOptimizedModelFilePath("SomeModelPath.ort"). Is this the right way or would this have some side-effect or issues?
2. In addition, we could not find much information about how the session options are used with ORT files and whether we need to use them twice (i.e., when generating the ORT file and when using it). E.g., the pipeline we are following is:
   - Optimize model (with session options) + save as ort file by adding SetOptimizedModelFilePath
   - Read ORT file and run it. But do we need to run the same session options here or are they just hard-coded when the ORT file was generated so we do not need them here again? (i.e., flags like SetGraphOptimizationLevel, OrtSessionOptionsAppendExecutionProvider_DML or SetIntraOpNumThreads)

This is the pseudo-code of what we tried for this:

bool SetUpNet(const std::string& FullModelFilePath) {
	// Set session options for GPU/CPU
	if (GetDeviceType() == GPU) {
		OrtSessionOptionsAppendExecutionProvider_DML(*SessionOptions, 0);
		SessionOptions->SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);
	} else {
		SessionOptions->SetIntraOpNumThreads(2);
		SessionOptions->SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
	}

	// If onnx file --> Create ORT file and load it. If ORT file (i.e., already created previously), load it
	if (FileExtension == "onnx") {
		SessionOptions->SetOptimizedModelFilePath(OutputORTOptimizedModelPath);
		Session = std::make_unique<Ort::Session>(Environment, FullModelFilePath, *SessionOptions);
		return SetUpNet(OutputORTOptimizedModelPath);
	} else if (FileExtension == "ort") {
		Session = std::make_unique<Ort::Session>(Environment, FullModelFilePath, *SessionOptions);
		return true;
	}
	// Neither an onnx nor an ort file: no session was created
	return false;
}
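
The pseudo-code above compares a `FileExtension` value that is never defined. A minimal sketch of one way to derive it, assuming C++17 and `std::filesystem`; the names `ModelFileKind` and `ClassifyModelFile` are hypothetical and not part of ONNX Runtime:

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper type: which loading path a model file should take.
enum class ModelFileKind { Onnx, Ort, Unknown };

// Classifies a model path by its extension. Note that
// std::filesystem::path::extension() includes the leading dot and that the
// comparison below is case-sensitive.
ModelFileKind ClassifyModelFile(const std::filesystem::path& model_path) {
  assert(!model_path.empty());  // precondition: a path was actually supplied
  const std::string extension = model_path.extension().string();
  if (extension == ".onnx") return ModelFileKind::Onnx;
  if (extension == ".ort") return ModelFileKind::Ort;
  return ModelFileKind::Unknown;
}
```

Returning an explicit `Unknown` value lets the caller fail cleanly instead of falling through without creating a session.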

That code works as long as we keep everything on the CPU (i.e., if we do not use OrtSessionOptionsAppendExecutionProvider_DML). However, we are facing weird errors when trying to use the GPU version:

Exception code 6
e.what(): "Non-zero status code returned while running Conv node. Name:'heatmap/conv1/separable_conv2d/depthwise' Status Message: D:\P4\ue5_main_pitt64\Engine\Source\../Restricted/NotForLicensees/Plugins/NeuralNetworkInference/Source/ThirdParty/ONNXRuntime/Public/core/providers/dml/DmlExecutionProvider/src/MLOperatorAuthorImpl.cpp(1694)\UnrealEditor-ONNXRuntime.dll!00007FFC4F80073C: (caller: 00007FFC4F95F88F) Exception(2) tid(d538) 8007023E {Application Error}
The exception %s (0x".

Is the ORT file tied to the hardware (e.g., GPU brand/model) or OS it was generated on? E.g., if I generate an ORT file on a machine with Windows 10 and an Nvidia 1080, can I use that same ORT file on a Mac with an AMD card?
All the online documentation we could find for ORT files is [1], [2], and [3]. Please let me know if there is other important documentation we might be missing!
[1] https://www.onnxruntime.ai/docs/how-to/mobile/model-conversion.html
[2] https://www.onnxruntime.ai/docs/resources/graph-optimizations.html
[3] https://www.onnxruntime.ai/docs/resources/mobile-performance-tuning.html

Thanks!

pranavsharma Jul 16, 2021

> We could not find any tutorial/information about how to create the ORT file from C++ (only from Python in [1]). How would this be possible from C++? We were able to generate ORT files by running a normal session and adding SessionOptions->SetOptimizedModelFilePath("SomeModelPath.ort"). Is this the right way or would this have some side-effect or issues?

You need to set 'session.save_model_format' to 'ORT' in session options to ensure an ORT format model is generated. This is the C++ way to do it.
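
A minimal sketch of what that could look like. To my knowledge `Ort::SessionOptions` in the C++ API exposes `AddConfigEntry(key, value)` and `SetOptimizedModelFilePath(path)`; the `FakeSessionOptions` type and the `RequestOrtFormatExport` helper below are hypothetical, and the stand-in exists only so the example compiles without the ONNX Runtime headers:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in that mirrors the two Ort::SessionOptions methods used
// below, so this sketch compiles without the ONNX Runtime headers.
struct FakeSessionOptions {
  std::string optimized_path;
  std::map<std::string, std::string> config;

  void SetOptimizedModelFilePath(const char* path) { optimized_path = path; }
  void AddConfigEntry(const char* key, const char* value) { config[key] = value; }
};

// Requests that the optimized model be written to output_path in ORT format.
// In real code Options would be Ort::SessionOptions and PathChar would be
// ORTCHAR_T (wchar_t on Windows, char elsewhere).
template <typename Options, typename PathChar>
void RequestOrtFormatExport(Options& options, const PathChar* output_path) {
  assert(output_path != nullptr);
  options.SetOptimizedModelFilePath(output_path);
  // Config key and value as named in the reply above.
  options.AddConfigEntry("session.save_model_format", "ORT");
}
```

With the real API, the same two calls would be made on the `Ort::SessionOptions` instance before constructing the `Ort::Session` from the .onnx file.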

> In addition, we could not find much information about how the session options are used with ORT files and whether we need to use them twice (i.e., when generating the ORT file and when using it). E.g., the pipeline we are following is:
>
> - Optimize model (with session options) + save as ort file by adding SetOptimizedModelFilePath
> - Read ORT file and run it. But do we need to run the same session options here or are they just hard-coded when the ORT file was generated so we do not need them here again? (i.e., flags like SetGraphOptimizationLevel, OrtSessionOptionsAppendExecutionProvider_DML or SetIntraOpNumThreads)

Except for the graph optimization level, you'll need to set the session options again since you're creating a fresh session. The graph optimization level you set before was used to generate the new file with fusions, etc., and hence setting it again is optional.
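
One way to act on this is to separate the options that every session needs from the ones that only matter when generating the ORT file. The sketch below is an illustration under assumptions: `FakeSessionOptions`, `ApplyRuntimeOptions` and `ApplyExportOptions` are hypothetical names, and the stand-in only mirrors the `Ort::SessionOptions` method names so the example compiles without ONNX Runtime:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in mirroring the Ort::SessionOptions methods used below.
struct FakeSessionOptions {
  int intra_op_threads = 0;
  int optimization_level = 0;
  std::string optimized_path;
  std::map<std::string, std::string> config;

  void SetIntraOpNumThreads(int n) { intra_op_threads = n; }
  void SetGraphOptimizationLevel(int level) { optimization_level = level; }
  void SetOptimizedModelFilePath(const char* path) { optimized_path = path; }
  void AddConfigEntry(const char* key, const char* value) { config[key] = value; }
};

// Per-session settings: apply these to every session, including the one that
// later loads the generated ORT file.
template <typename Options>
void ApplyRuntimeOptions(Options& options, int intra_op_threads) {
  assert(intra_op_threads >= 0);
  options.SetIntraOpNumThreads(intra_op_threads);
  // An execution provider such as DML would also be appended here.
}

// Export-only settings: apply these only to the session that converts the
// .onnx model. Level would be GraphOptimizationLevel with the real API.
template <typename Options, typename Level, typename PathChar>
void ApplyExportOptions(Options& options, Level level, const PathChar* ort_path) {
  assert(ort_path != nullptr);
  options.SetGraphOptimizationLevel(level);
  options.SetOptimizedModelFilePath(ort_path);
  options.AddConfigEntry("session.save_model_format", "ORT");
}
```

Using a fresh options object for the loading session also avoids carrying the optimized-model output path over from the export step, which the recursive pseudo-code above would do.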

> That code works as long as we keep everything on the CPU (i.e., if we do not use OrtSessionOptionsAppendExecutionProvider_DML). However, we are facing weird errors when trying to use the GPU version:
>
> Exception code 6
> e.what(): "Non-zero status code returned while running Conv node. Name:'heatmap/conv1/separable_conv2d/depthwise' Status Message: D:\P4\ue5_main_pitt64\Engine\Source\../Restricted/NotForLicensees/Plugins/NeuralNetworkInference/Source/ThirdParty/ONNXRuntime/Public/core/providers/dml/DmlExecutionProvider/src/MLOperatorAuthorImpl.cpp(1694)\UnrealEditor-ONNXRuntime.dll!00007FFC4F80073C: (caller: 00007FFC4F95F88F) Exception(2) tid(d538) 8007023E {Application Error}
> The exception %s (0x".

ORT format models should support DML. The error must be due to an unsupported operator or type. @fdwr

gineshidalgo99 · 2021-07-20T16:52:24Z

Then I believe there is a bug in the ONNX2ORT conversion when setting Device=DirectML. I opened a GitHub issue with all the information here: #8440
