Fixing xsmm runner dynamic load #146

Merged: 2 commits merged into mlir on Jul 23, 2024

Conversation

@slyalin (Owner) commented Jul 22, 2024

TODO: Still doesn't work in Python.

@rengolin (Collaborator) commented Jul 22, 2024

Not sure what the problem with Python is. This is what fixed it for us in tpp-mlir, but that was a C++ application, not a shared object. I have copied the libtpp_xsmm_runner_utils.so libraries to the install directory and I still get the same problem:

Created MLIR op: extension::MLIROp MLIROp_2179 (opset1::Parameter a[0]:f32[?,?], opset1::Constant self.linear.weight[0]:f32[128,1024]) -> (f32[?,128])
JIT session error: Symbols not found: [ xsmm_unary_invoke, xsmm_unary_dispatch, xsmm_brgemm_invoke, xsmm_brgemm_dispatch ]
JIT invocation failed

Note: the library path is set correctly by running . ./install/setupvars.sh.

@rengolin (Collaborator):

Looking more at this, Python can load the library:

openat(AT_FDCWD, "/home/rengolin/devel/intel/openvino/build/install/runtime/lib/intel64/libtpp_xsmm_runner_utils.so.19.0git", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832

But later on the JIT fails to find the symbols.

JIT session error: Symbols not found: [ xsmm_unary_invoke, xsmm_unary_dispatch, xsmm_brgemm_invoke, xsmm_brgemm_dispatch ]
JIT invocation failed
Program aborted due to an unhandled Error:
Failed to materialize symbols: { (main, { entry, _mlir_entry }) }

The way we fixed this in C++ was to pre-load the library into the tpp-run binary, while mlir-cpu-runner has the --shared-libs option, which makes the libraries available during execution.

Unfortunately, looking at Orc (LLVM's JIT compiler), the error messages are triggered by helper classes, emitted by some other loader. I imagine the openvino binary is the one that needs to load that library and tell the JIT where it is.
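For illustration only (this is not the actual OpenVINO plugin code), here is a minimal C++ sketch of the pre-loading approach, assuming the JIT searches the host process for symbols (MLIR's ExecutionEngine attaches Orc's DynamicLibrarySearchGenerator for the current process by default). The function name and library path below are made up:

#include <llvm/Support/DynamicLibrary.h>
#include <iostream>
#include <string>

// Load the runner library into the current process before any JIT invocation,
// so its exported xsmm_* symbols become visible when Orc searches the host
// process for definitions.
bool preloadXsmmRunner(const std::string &path) {
  std::string err;
  // LoadLibraryPermanently returns true on failure.
  if (llvm::sys::DynamicLibrary::LoadLibraryPermanently(path.c_str(), &err)) {
    std::cerr << "Failed to load " << path << ": " << err << "\n";
    return false;
  }
  return true;
}

// Hypothetical call site, e.g. during plugin initialization:
// preloadXsmmRunner("/path/to/libtpp_xsmm_runner_utils.so");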

Comment on lines +41 to +42
#FIXME: Provide platform-independent way of doing that:
install(FILES ${TPP_MLIR_DIR}/lib/libtpp_xsmm_runner_utils.so ${TPP_MLIR_DIR}/lib/libtpp_xsmm_runner_utils.so.19.0git DESTINATION ${OV_CPACK_RUNTIMEDIR})
@slyalin (Owner, Author):

@rengolin, please suggest a proper alternative.

@rengolin (Collaborator):

That's actually not a bad idea, tbh. An alternative is to change the setupvars.sh to add the TPP build directory to the LD_LIBRARY_PATH.

Unless TPP can be installed as a proper library (on system path), there's not much else we can do.

@slyalin (Owner, Author):

The idea is to have a self-contained openvino package, as it is now, to model a final product without any extra dependencies. This is how the binary size of the package will be calculated, and that is one of the important product-level metrics.

@slyalin (Owner, Author):

I initially intended to provide a proper CMake statement without all these hard-coded .so names. Do we have a normal way to include TPP-MLIR with find_package, similar to what we have for LLVM/MLIR?

@slyalin marked this pull request as ready for review on July 23, 2024, 10:46
@slyalin (Owner, Author) commented Jul 23, 2024

Now it works for Linux and C++ only. Needed libraries are installed in the target ov directory.

@rengolin (Collaborator):

> Now it works for Linux and C++ only. Needed libraries are installed in the target ov directory.

How can I test this in C++?

@slyalin (Owner, Author) commented Jul 23, 2024

> Now it works for Linux and C++ only. Needed libraries are installed in the target ov directory.
>
> How can I test this in C++?

To test it in C++ you need two programs: one to emit the OpenVINO IR for the desired model (the Python part that uses PyTorch), and a second to run that IR in a C++ application. It is not very convenient, but you cannot convert a PyTorch model in a C++ app; Python is a requirement in this case. And now C++ is a requirement for the xsmm runner part, so we need two programs. I would like to see a PR that shows how a library can be registered for JIT in the MLIR/LLVM world and makes it functional for both Python and C++.

The first program:

import torch
import torch.nn as nn
import openvino as ov

# Define a synthetic model
class LinearModel(nn.Module):
    def __init__(self, input_size, output_size):
        super(LinearModel, self).__init__()
        self.linear = nn.Linear(input_size, output_size)
    def forward(self, a):
        # some random element-wise stuff first just to see how it can be combined with MatMul
        b = a*a + 2.0
        x = ((a+a) * (a-b)) / a
        out = self.linear(x)
        return out

# Create an instance of the model
input_size = 1024
output_size = 128
model = LinearModel(input_size, output_size)
# Generate random weights
model.linear.weight.data.normal_(0, 0.01)
model.linear.bias.data.fill_(0.01)

input_data = torch.tensor(range(1, input_size*output_size+1)).to(torch.float32).view(output_size, input_size)

with torch.no_grad():
    reference = model(input_data)
    print('Reference:\n', reference)

ov_model = ov.convert_model(model, example_input=input_data)
ov.save_model(ov_model, "simple_model.matmul.1024x128.xml")

The second program:

#include <openvino/openvino.hpp>
#include <iostream>

int main () {
  ov::Core core;
  auto compiled_model = core.compile_model("simple_model.matmul.1024x128.xml");
  auto infer_request = compiled_model.create_infer_request();

  auto input_tensor_1 = infer_request.get_input_tensor(0);
  size_t size1 = 128;
  size_t size2 = 1024;
  input_tensor_1.set_shape({size1, size2});
  auto data_1 = input_tensor_1.data<float>();
  for(size_t i = 0; i < size1*size2; ++i)
    data_1[i] = i+1;

  infer_request.infer();

  auto output_tensor = infer_request.get_output_tensor(0);
  auto output_data = output_tensor.data<float>();
  for(size_t i = 0; i < output_tensor.get_size(); ++i) {
      std::cout << "[" << i << "]: " << output_data[i] << "\n";
  }
}

You can build it with

g++ example.cpp -I/where/openvino/installed/runtime/include -lopenvino -L/where/openvino/installed/runtime/lib/intel64

Source setupvars.sh before that, then run the resulting binary.
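As a side note on registering a library for JIT in the MLIR/LLVM world: if the extension constructs an mlir::ExecutionEngine itself, the programmatic counterpart of mlir-cpu-runner's --shared-libs flag is the sharedLibPaths field of mlir::ExecutionEngineOptions. Below is a minimal sketch under that assumption; the module is expected to be lowered to the LLVM dialect already, and the .so path is a placeholder rather than the real install location.

#include <mlir/ExecutionEngine/ExecutionEngine.h>
#include <llvm/ADT/SmallVector.h>
#include <llvm/Support/Error.h>
#include <memory>

// Sketch only: create a JIT engine that resolves xsmm_* symbols from the
// runner library, the same way --shared-libs does for mlir-cpu-runner.
llvm::Expected<std::unique_ptr<mlir::ExecutionEngine>>
createEngineWithXsmmRunner(mlir::ModuleOp module) {
  llvm::SmallVector<llvm::StringRef, 1> libs{
      "/path/to/libtpp_xsmm_runner_utils.so"};
  mlir::ExecutionEngineOptions options;
  options.sharedLibPaths = libs;
  return mlir::ExecutionEngine::create(module, options);
}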

@slyalin merged commit a7f652e into mlir on Jul 23, 2024
14 of 30 checks passed
@rengolin (Collaborator):

> To test it in C++ you need two programs: one to emit the OpenVINO IR for the desired model (the Python part that uses PyTorch), and a second to run that IR in a C++ application. It is not very convenient, but you cannot convert a PyTorch model in a C++ app; Python is a requirement in this case. And now C++ is a requirement for the xsmm runner part, so we need two programs. I would like to see a PR that shows how a library can be registered for JIT in the MLIR/LLVM world and makes it functional for both Python and C++.

OK, I think we can go with that for now. The important points of the process are:

  • It must come from PyTorch, and not a "made-up" graph. Importing through Python and converting to XML is fine.
  • It must pass through tpp-mlir and emit calls to XSMM. The default pipeline is doing that.
  • It must be able to load libxsmm and wrappers at runtime. The C++ program can do that.

Now we need a set of benchmarks:

  1. Roofline: Static shape, matmul (no transpose), bias Add (no broadcast), ReLU. This should achieve performance similar to libxsmm-dnn.
  2. Baseline: Matmul transposed, bias Add with broadcast, ReLU. This should be the slowest of the bunch, but still >50% of peak.
  3. Fixups for (2) above: Correctly tile a transposed matmul, use linalg.generic for broadcast element-wise ops.

Aiming for these performance targets:

  • Roofline: ~90% peak AMX for BF16
  • Baseline: >50% of the Roofline
  • Fixups: >80% of the Roofline

Later on (or in parallel) we can work on the Python issues, but these are not critical to demonstrate impact.
