MLP benchmarks #152

Merged (18 commits into slyalin:mlir, Jul 30, 2024)

Conversation

@adam-smnk (Collaborator) commented Jul 25, 2024

Usage: ./tools/mlir_bench/mlp_bench.sh

TODO:

  • test on cluster
  • add support for matmul without transpose (see the sketch after this list)
  • investigate broadcast error when type is not f32
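
For context, here is a minimal sketch, not taken from this PR and with all names hypothetical, of the kind of MLP workload such a benchmark would exercise and of why the "matmul without transpose" item matters: torch.nn.Linear stores its weight as (out_features, in_features) and computes x @ weight.T, so the lowered graph contains a matmul with a transposed operand, whereas an explicit matmul against a weight already laid out as (in_features, out_features) does not.

# Hypothetical sketch, not part of this PR: a small MLP and an equivalent
# layer written with a plain (non-transposed) matmul.
import torch

hidden = 256

# nn.Linear computes x @ weight.T with weight shaped (out_features, in_features),
# so its lowering carries a transposed matmul operand.
mlp = torch.nn.Sequential(
    torch.nn.Linear(hidden, hidden),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden, hidden),
)

# The same layer with the weight stored as (in_features, out_features)
# needs no transpose in the matmul.
class PlainMatmulLayer(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(in_features, out_features))

    def forward(self, x):
        return torch.relu(x @ self.weight)

x = torch.randn(1, hidden)
print(mlp(x).shape, PlainMatmulLayer(hidden, hidden)(x).shape)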

@adam-smnk (Collaborator, Author):

Works fine on the cluster; I was able to gather initial numbers for f32.
There's some lowering issue with the other data types; I'll look into it next.

@slyalin (Owner) commented Jul 29, 2024:

investigate broadcast error when type is not f32

@adam-smnk, have you unlocked non-f32 types in the MLIR conversion?

I noticed accuracy issues in the PyTorch layer tests when running the min_max tests. They reproduce only when OV_MLIR is enabled.

@adam-smnk (Collaborator, Author):

have you unlocked non-f32 types in the MLIR conversion?

I think it was primarily a mistake in my testing setup. Otherwise, I just need to relax the matchers to accept any type.

# Build a (shape, type) pair for every model input.
inputs = [(ov.PartialShape(shapes), ov_type) for shapes in input_shapes]
# Convert the torch module to an OpenVINO model and serialize it to IR.
ov_model = ov.convert_model(torch_seq, input=inputs)
ov.save_model(ov_model, f"{file_name}")
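
The names input_shapes, ov_type, torch_seq, and file_name come from code not shown in this excerpt. A self-contained, hypothetical completion (the exact values below are assumptions, not what the benchmark uses) could look like the following; switching ov_type away from f32 is where the broadcast/lowering issues discussed above were observed:

import torch
import openvino as ov

# Hypothetical stand-ins for the values defined elsewhere in the script.
torch_seq = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU())
input_shapes = [[1, 256]]
ov_type = ov.Type.f32   # e.g. ov.Type.bf16 to exercise the non-f32 path
file_name = "mlp_f32.xml"

# Same three lines as the excerpt above, now runnable on their own.
inputs = [(ov.PartialShape(shapes), ov_type) for shapes in input_shapes]
ov_model = ov.convert_model(torch_seq, input=inputs)
ov.save_model(ov_model, file_name)
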
@slyalin (Owner) commented:

Just FYI: ov.save_model will partially compress weights from f32 to f16. "Partially" because the decision is made for each constant individually, based on the range of values in that constant. In the IR this is represented as two operations: Constant(f16) -> Convert(f32). This is the default behavior of ov.save_model for all models, to save disk space, and it shouldn't affect final inference because this pair of operations is constant-folded during model compilation.
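
If uncompressed f32 weights are ever wanted in the saved IR, for example to rule the compression out while debugging accuracy, ov.save_model exposes a compress_to_fp16 flag (defaulting to True in recent OpenVINO releases). A minimal sketch with a throwaway model, exact signatures may vary by OpenVINO version:

import torch
import openvino as ov

# Convert a trivial module and save it without the default f32 -> f16
# weight compression, so constants stay as plain f32 in the IR.
model = ov.convert_model(torch.nn.Linear(8, 8),
                         input=[(ov.PartialShape([1, 8]), ov.Type.f32)])
ov.save_model(model, "linear_uncompressed.xml", compress_to_fp16=False)
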

@slyalin (Owner) commented:

It doesn't affect inference precision; it is just a form of weight compression.

@adam-smnk merged commit 705477e into slyalin:mlir on Jul 30, 2024
13 of 29 checks passed