[NNPA] Use device attribute to control device placement for ONNX operations #2510
Conversation
Signed-off-by: Tung D. Le <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>
How can users specify which operations run on CPU or NNPA?
```cpp
if (device && device.getValue().equals_insensitive(CPU_DEVICE))
  return true;
// If device is NNPA, force to run the op on NNPA.
if (device && device.getValue().equals_insensitive(NNPA_DEVICE))
```
I would suggest adding a legality check here.
Do you mean something like `isNNPA() && isLegality()`? `device=NNPA` can mean "forcing to NNPA" or "maybe good for NNPA". I am OK going with either of them.

"Forcing to NNPA" is convenient when we annotate an op with `device=NNPA` directly and we really want that op to go to NNPA despite compiler optimizations. "Maybe good for NNPA" is safe when we use a cost model, since the cost model may make a mistake in assigning an op to NNPA (e.g. the op is not suitable for NNPA).
"Forcing to NNPA" is also useful when we have dynamic shapes and we want an op to run on NNPA because the compiler is not able to know whether it is suitable for CPU or NNPA.
I should have been clearer:
`assert(isLegal(xxx) && "trying to force an op to NNPA that is not perceived as legal for NNPA");`
We will later provide a configuration file (e.g. a JSON file) so that users can specify which ops to run on CPU or NNPA. In the configuration file, users will be able to use operation types (e.g. ONNXConv) or onnx_node_name to identify an op.
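Purely to illustrate the idea (the format of that configuration file is not defined in this PR, so the key names below are guesses), such a file could pair a device with an operation type or an onnx_node_name; the sketch embeds it as a C++ raw string so it stays in the same language as the other snippets here.

```cpp
#include <string>

// Hypothetical placement configuration, embedded as a raw string literal.
// The keys "device_placement", "device", and "node_type" are illustrative
// guesses, and the node name is made up; only onnx_node_name and the
// cpu/nnpa values come from the discussion above.
const std::string exampleConfig = R"json({
  "device_placement": [
    { "device": "cpu",  "node_type": "ONNXConv" },
    { "device": "nnpa", "onnx_node_name": "model/add_1" }
  ]
})json";
```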
I see. Since I sometimes use the execNodeOnCpu option, could you keep the option for a while until the configuration file is added? It's OK if you plan to add it soon.
Signed-off-by: Tung D. Le <[email protected]>
…NPA ops Signed-off-by: Tung D. Le <[email protected]>
It would be good if we could have the following flow.
Conversely, if using the JSON file is easier, that could also be done. There is a certain simplicity in having the info directly in the MLIR. I would venture that it would also be convenient if a user knew (either via JSON or another annotation on the ops) that an op is legal on NNPA, regardless of whether it is assigned to it or not.
@AlexandreEichenberger yes, I am going with the flow you have in mind. Will ping you when it's available.
Signed-off-by: Tung D. Le <[email protected]>
LGTM, I would just clarify the warning a bit:

"Warning: though the following operation was specified to run on NNPA, the compiler found that NNPA did not support that operation. It's potentially that the compiler was not able to check broadcasting in case of dynamic shape so that it thought the operation was not legal for NNPA."
This does not tell the user what the result is. I would change the wording to make sure the user understands that the op will still go to NNPA.
Maybe:

"Warning: the following operation will run on the NNPA device even though the compiler believes that it is not legal to do so for this operation. The compiler may not have the full information necessary to accurately determine legality, for example in the presence of dynamic shapes where broadcasting cannot be excluded, or for some other reason. If the model does not work properly, you may want to double-check the validity of mapping that operation to the NNPA device."
Signed-off-by: Tung D. Le <[email protected]>
@AlexandreEichenberger now with this patch we can do the following things:
For example, this is an output of using it. From this, we can see which operations will run on NNPA by looking at the `device` attribute.
Signed-off-by: Tung D. Le <[email protected]>
@imaihal I am sorry that I will have another PR for users of `--execNodesOnCPU`.
@tungld outstanding. In the meantime, I am running tests on z16, generating CSV files of data for each op, which are then processed by a Python script that generates this code for me:

which I will then use to evaluate the benefits of NNPA vs CPU. I was going to integrate this into your pass with a flag that indicates whether to use benefits or not. This seems less redundant than writing a new pass that does the same thing but with benefits. Feel free to comment here if you prefer one over the other.
Yes, I expected that we could use the --device-placement pass for that purpose, and we can have a flag for benefits. I put a comment in the pass to mark the position we can use for the cost model. Basically, you just walk through all ops of interest and set `device`, e.g. as sketched below.
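A minimal sketch of such a walk, assuming placeholder helpers `isOpOfInterest` and `nnpaIsFasterThanCPU` that stand in for the real cost-model queries (they are not existing onnx-mlir functions):

```cpp
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/Operation.h"

// Placeholders for illustration only.
bool isOpOfInterest(mlir::Operation *op);
bool nnpaIsFasterThanCPU(mlir::Operation *op);

// Walk the ops of interest and annotate each one with a "device" attribute
// reflecting the cost-model decision.
void annotateDevices(mlir::ModuleOp module) {
  module.walk([](mlir::Operation *op) {
    if (!isOpOfInterest(op))
      return;
    const char *device = nnpaIsFasterThanCPU(op) ? "nnpa" : "cpu";
    op->setAttr("device",
        mlir::StringAttr::get(op->getContext(), device));
  });
}
```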
Jenkins Linux amd64 Build #12756 [push] [NNPA] Use device attrib... started at 09:52
Jenkins Linux s390x Build #12779 [push] [NNPA] Use device attrib... started at 10:52
Jenkins Linux ppc64le Build #11772 [push] [NNPA] Use device attrib... started at 11:01
Jenkins Linux amd64 Build #12756 [push] [NNPA] Use device attrib... failed after 1 hr 8 min
Jenkins Linux s390x Build #12779 [push] [NNPA] Use device attrib... passed after 1 hr 35 min
Jenkins Linux ppc64le Build #11772 [push] [NNPA] Use device attrib... passed after 1 hr 50 min
To set the device for an ONNX operation (say, an op in the output of --EmitONNXIR, where all onnx-to-onnx transformations have been applied), we set the `device` attribute on the ONNX operation. For example, setting the `device` attribute of an `onnx.Add` to CPU will place that op on CPU, and setting it to NNPA will place it on NNPA. If there is no `device` attribute, the compiler will make the decision.

The `device` attribute facilitates our next steps, in particular using the cost model proposed in #2507, or using a user-provided configuration file to specify where to place an ONNX op.

Next step: create a pass, e.g. `device-placement`, to place ONNX operations by using a cost model or a configuration file.

By using `device`, the current way of forcing an op to CPU with `--execNodesOnCPU` can be removed.
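The placement rules above can be summarized in a short sketch. This is illustrative only: `isLegalForNNPA` and `compilerPrefersNNPA` are placeholder names for the compiler's legality query and cost-model preference, not existing onnx-mlir functions.

```cpp
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/Operation.h"

// Placeholders for illustration only: a legality query and the compiler's
// own preference (e.g. from a cost model) when no attribute is present.
bool isLegalForNNPA(mlir::Operation *op);
bool compilerPrefersNNPA(mlir::Operation *op);

// Decide whether an op goes to NNPA, driven by the "device" attribute.
bool placeOnNNPA(mlir::Operation *op) {
  auto device = op->getAttrOfType<mlir::StringAttr>("device");
  // Explicit "cpu" keeps the op on CPU.
  if (device && device.getValue().equals_insensitive("cpu"))
    return false;
  // Explicit "nnpa" forces the op to NNPA; warn when the compiler cannot
  // prove legality (see the warning-wording discussion above).
  if (device && device.getValue().equals_insensitive("nnpa")) {
    if (!isLegalForNNPA(op))
      op->emitWarning("operation forced to NNPA although the compiler could "
                      "not verify that it is legal for NNPA");
    return true;
  }
  // No annotation: the compiler decides (legality plus cost model).
  return isLegalForNNPA(op) && compilerPrefersNNPA(op);
}
```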