MLP benchmarks #152
Conversation
Works fine on the cluster; I was able to gather initial numbers for f32.
@adam-smnk, have you unlocked non-f32 types in the MLIR conversion? I noticed accuracy issues in the PyTorch layer tests when running the min_max tests, reproduced only when OV_MLIR is enabled.
I think it was primarily a mistake in my testing setup. Otherwise, I just need to relax the matchers to accept any type.
Force-pushed from 8776247 to b5be3c6.
```python
inputs = [(ov.PartialShape(shapes), ov_type) for shapes in input_shapes]

ov_model = ov.convert_model(torch_seq, input=inputs)
ov.save_model(ov_model, f"{file_name}")
```
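For context, a self-contained sketch of this export path might look like the following. The variable names (`torch_seq`, `input_shapes`, `ov_type`, `file_name`) come from the diff; the MLP definition, shapes, and output path are hypothetical stand-ins:

```python
import torch
import openvino as ov

# Hypothetical stand-ins for the variables referenced in the diff above.
torch_seq = torch.nn.Sequential(
    torch.nn.Linear(256, 256),
    torch.nn.ReLU(),
)
input_shapes = [[1, 256]]
ov_type = ov.Type.f32
file_name = "mlp.xml"

# The snippet under review: build a (shape, type) pair per input,
# convert the PyTorch module, and serialize the resulting IR.
inputs = [(ov.PartialShape(shapes), ov_type) for shapes in input_shapes]
ov_model = ov.convert_model(torch_seq, input=inputs)
ov.save_model(ov_model, file_name)
```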
Just FYI: `ov.save_model` will partially compress weights from f32 to f16. "Partially" because the decision is made individually for each constant, based on the range of values in that constant. In the IR this is represented as two operations: `Constant(f16) -> Convert(f32)`. This is the default behavior of `ov.save_model` for all models, done to save disk space, and it shouldn't affect final inference because this pair of operations is constant-folded during model compilation.
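If the f32 weights need to be preserved verbatim on disk, the compression can be disabled. A minimal sketch, assuming the current `openvino` Python API where `save_model` exposes a `compress_to_fp16` flag (the source model path is a placeholder):

```python
import openvino as ov

ov_model = ov.convert_model("model.onnx")  # placeholder source model

# Default: eligible f32 constants are stored as f16 and wrapped in a
# Constant(f16) -> Convert(f32) pair that is constant-folded at compile time.
ov.save_model(ov_model, "model_compressed.xml")

# Opt out: keep all weights as f32 on disk (larger IR, identical inference).
ov.save_model(ov_model, "model_fp32.xml", compress_to_fp16=False)
```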
It doesn't affect inference precision; it is just a form of weight compression.
Usage:

```sh
./tools/mlir_bench/mlp_bench.sh
```
TODO:
- f32