GC-GPU integration #169
Conversation
```cpp
      shape(module_input_shape.begin(), module_input_shape.end()) {
    if (shape.size() != tensor.get_shape().size()) {
        // validate that the shape difference is due to trailing '1's
        for (size_t i = 0; i < shape.size(); ++i) {
            if (shape[i] != tensor.get_shape()[i]) {
                OPENVINO_THROW("Mismatch in shape sizes");
            }
        }
        for (size_t i = shape.size(); i < tensor.get_shape().size(); ++i) {
            if (tensor.get_shape()[i] != 1) {
                OPENVINO_THROW("Mismatch in shape sizes");
            }
        }
    }
    strides.resize(shape.size());
```
This is needed because GPU memory formats hold at least 4 dimensions, which causes trailing extra dims (`<64x128x1x1>` instead of `<64x128>`). This code compares the input tensors' dimensions with the input dimensions of the MLIR module and trims the extra dims.
```cpp
@@ -38,21 +51,71 @@ void CreateMLIRSubgraphOp(ProgramBuilder& p, const std::shared_ptr<ov::op::mlir:
```
We probably don't need this synchronization anymore since we pass the same queue to GC and submit our kernels to it.
```diff
@@ -11,7 +11,8 @@ namespace ov {

 namespace pass {

-void TRANSFORMATIONS_API transformMLIR(std::shared_ptr<ov::Model> model);
+void TRANSFORMATIONS_API transformMLIR(std::shared_ptr<ov::Model> model,
+                                       std::shared_ptr<ov::EvaluationContext> loweringContext);
```
`loweringContext` stores the `ocl_context` for `mlir_op::gpu`.
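For illustration, the new overload might be fed like this (a minimal sketch; the map key names and handle variables are assumptions, not taken from the PR):

```cpp
// ov::EvaluationContext is a string-keyed map of ov::Any, so raw CL handles
// can be carried through as void*.
auto loweringContext = std::make_shared<ov::EvaluationContext>();
(*loweringContext)["ocl_context"] = static_cast<void*>(ocl_ctx);   // a cl_context handle
(*loweringContext)["cl_device_id"] = static_cast<void*>(ocl_dev);  // a cl_device_id handle
ov::pass::transformMLIR(model, loweringContext);
```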
```diff
-OpenCL(bool out_of_order_queue = true)
-{
+OpenCL(bool out_of_order_queue = true) {
```
This was fixed by OpenVINO's linter.
```cpp
@@ -23,8 +23,7 @@ struct OpenCL {
    bool _supports_usm;
```
Moved this class from `tests/unit_tests/utils` to `tests/common/utils` in order to reuse it in the sanity tests for the GPU integration.
```cpp
@@ -95,6 +93,20 @@ struct OpenCL {
        _queue = cl::CommandQueue(_context, _device, props);
    }

    OpenCL(cl_context context, bool out_of_order_queue = true)
```
It's more convenient to construct this object from a `cl_context` in the sanity tests for the GPU integration, since we can simply request the context from the compiled model and construct this class.
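A minimal sketch of what such a constructor could look like, assuming the member names visible in the snippet above (`_context`, `_device`, `_queue`); the actual implementation may differ:

```cpp
OpenCL(cl_context context, bool out_of_order_queue = true) {
    // Wrap the raw handle without taking ownership away from the caller
    // (retain it so the release on destruction stays balanced).
    _context = cl::Context(context, /*retainObject=*/true);
    _device = _context.getInfo<CL_CONTEXT_DEVICES>().front();
    cl_command_queue_properties props =
        out_of_order_queue ? CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE : 0;
    _queue = cl::CommandQueue(_context, _device, props);
}
```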
@vladimir-paramuzov @slyalin @kurapov-peter @AndreyPavlenko I think this PR is now in a state where it can be reviewed.
Should we merge #167 before merging this PR? Are both PRs ready to be merged? If they don't have obvious breaking changes, it is more convenient to continue development in the main mlir branch. I have a merged version of the mlir branch and the master branch from the main openvino repository. So, to avoid you fighting merge conflicts on your side, I would recommend merging the two PRs mentioned, and then I'll redo the merge with the master openvino branch on my side. @kurapov-peter, @AndreyPavlenko, @dchigarev?
I think this PR is already in a state where it can be merged. There's one more question, though, regarding gpu-runtime headers exposure that I would like to discuss. The question is whether it's okay that we include GPU runtime headers in the …
Looks good to me. Would TPP need anything from evaluation context btw?
@vladimir-paramuzov, please approve explicitly and we will merge.
@slyalin I believe we've got all the approvals we needed.
This PR adds an integration with graph-compiler's GPU pipeline. The integration means passing GPU buffers (USM pointers/`cl_mem`), the CL queue, and CL events as-is to GC for execution.
A set of sanity tests was also added to test the integration.
How to build and run tests

1. Build LLVM with IMEX patches.
2. Build OV from this branch.
3. Run the sanity tests.
4. Run benchmark_app.
What was changed and how it works
1. Common `MLIREvaluate` class was split into two

There are now two classes: `MLIREvaluate` (generic evaluation) and `MLIREvaluateGcGPU`. Both implement the `MLIREvaluateBase` interface, and the actual instance is created based on the `mlir_mode` parameter in `MLIREvaluateBase::create()`.

This was done because the two evaluation classes operate with different objects to lower and invoke the received MLIR module. The generic `MLIREvaluate` operates on `mlir::ExecutionEngine` and `mlir::Module`, while `MLIREvaluateGcGPU` operates with GC-specific runtime objects (`mlir::gc::OclModuleBuilder`, `mlir::gc::OclModule`, etc.).
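For illustration, the dispatch in `MLIREvaluateBase::create()` might look roughly like this (a sketch only; the enum values and constructor signatures are assumptions, not the PR's exact code):

```cpp
std::shared_ptr<MLIREvaluateBase> MLIREvaluateBase::create(mlir::OwningOpRef<mlir::ModuleOp> module,
                                                           MlirMode mode,
                                                           const ov::EvaluationContext& lowering_ctx) {
    if (mode == MlirMode::GC_GPU) {
        // The GPU path needs the OpenCL context/device from lowering_ctx
        // to build an mlir::gc::OclModule.
        return std::make_shared<MLIREvaluateGcGPU>(std::move(module), lowering_ctx);
    }
    // Generic path: lower and run through mlir::ExecutionEngine.
    return std::make_shared<MLIREvaluate>(std::move(module), mode);
}
```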
2. Context/device information is now forwarded to `MLIREvaluateBase::create()`

We need the context + device information for the gc-gpu-runtime in order to build a module. That's why we now extract the `ocl_context` and `cl_device_id` from `RemoteContextImpl` in `TransformationsPipeline` and forward them all the way to `MLIREvaluateBase::create()` using the `ov::EvaluationContext` map.
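On the receiving side, the handles could be read back out of the map like this (a sketch; the key names mirror the earlier sketch and are assumptions, not necessarily the PR's):

```cpp
auto it = lowering_ctx.find("ocl_context");
OPENVINO_ASSERT(it != lowering_ctx.end(), "ocl_context is required for the GC GPU path");
auto ctx = static_cast<cl_context>(it->second.as<void*>());
auto dev = static_cast<cl_device_id>(lowering_ctx.at("cl_device_id").as<void*>());
```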
3. Separation between `MLIREvaluate::invoke` and `MLIREvaluate::invoke_packed`

A new invocation method (`::invoke()`) was added to the `MLIREvaluateBase` interface. In comparison with `::invoke_packed()`, which accepts memref arguments in the `MemrefDescriptor` format, `::invoke()` takes the tensor vectors as-is.

The GC-GPU runtime expects arguments in a non-packed format (pointers only) if all memrefs in the compiled MLIR module have static shapes; otherwise it expects the "packed" format (MemrefDescriptors).

A query method was added to determine which method of `MLIREvaluate` to call.

(@AndreyPavlenko may provide more info on why we need this separation)
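For reference, the "packed" path marshals full strided-memref descriptors rather than bare pointers. A sketch of the classic layout from MLIR's C runner interface (shown only to illustrate the difference; the PR's own `MemrefDescriptor` may differ):

```cpp
#include <cstdint>

// With static shapes, sizes/strides are baked into the compiled module, so
// the non-packed path can pass just the data pointers instead of all this.
template <typename T, int Rank>
struct MemrefDescriptor {
    T* allocated;           // pointer returned by the allocator
    T* aligned;             // aligned pointer actually used for access
    int64_t offset;         // element offset from 'aligned'
    int64_t sizes[Rank];    // shape
    int64_t strides[Rank];  // strides, in elements
};
```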
4. Actual OCL implementations of `cldnn::stream/buffer/event` are now exposed to `intel_gpu/src/plugin/ops/mlir_op.cpp`

The base classes of stream/buffer/event do not have a method to get a handle of the actual underlying object (`cl_queue`/`cl_mem`/`cl_event`). In order to obtain these handles and pass them to the gc-gpu runtime, the instances of these abstract objects are dynamic-casted to their presumable implementations (`ocl::gpu_buffer` / `ocl_stream` / `ocl_base_event`). To do that, we have to expose the declarations of these ocl-specific implementations to `ops/mlir_op.cpp` by modifying its include directories. Are we okay with this?

^--- this was replaced with the item below
4. `cldnn::stream/buffer/event/device` are now able to return an underlying OCL handle

In order to get an actual CL object and pass it to the graph compiler's GPU runtime, a `void* get_handle()` method was added to the `cldnn::stream/buffer/device/event` interfaces. The method is supposed to return a pointer to an OpenCL handle from the C API (`cl_mem`, `cl_command_queue`, `cl_device_id`, ...), since the gc-gpu runtime takes these instead of the C++ wrappers.
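A minimal sketch of the idea for the stream case (assuming a `cl::CommandQueue` member; the PR's actual code may differ):

```cpp
// The OCL stream wraps a cl::CommandQueue; the raw cl_command_queue it holds
// is what the GC GPU runtime wants, so hand it out via the type-erased base.
void* ocl_stream::get_handle() const {
    return _command_queue.get();  // cl::CommandQueue::get() returns the cl_command_queue
}
```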
5. `cldnn::stream::create_base_event(...)` can now take a pointer to `cl_event`

This is needed in order to propagate the `cl_event` returned from the gc-gpu runtime to the `cldnn::event` that is returned from `MLIROp::evaluate()`.
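A hedged usage sketch (names assumed from the description above, not the PR's exact code):

```cpp
// The GC GPU runtime hands back a cl_event after enqueuing its kernels;
// wrapping it lets MLIROp::evaluate() return a proper cldnn::event.
cl_event gc_done = nullptr;  // filled in by the GC GPU runtime invocation
auto ev = stream.create_base_event(&gc_done);
```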