This repository has been archived by the owner on Apr 23, 2021. It is now read-only.

[WIP] [spirv] Add Vulkan runtime. #118

Open
wants to merge 1 commit into
base: master

@denis0x0D denis0x0D commented Sep 2, 2019

This patch addresses #60.
Implements an initial version of a Vulkan runtime to test spirv::ModuleOp.
Creates and runs a Vulkan compute pipeline.

Provides one unit test that multiplies two spv.arrays of float type.

Requirements:

  • Vulkan capable device and drivers installed.
  • Vulkan SDK installed and VULKAN_SDK environment variable is set.

How to build:

  • Pass -DMLIR_VULKAN_RUNNER_ENABLED=ON as a CMake variable.

Note:

@antiagainst @MaheshRavishankar can you please take a look?
Thanks!

@denis0x0D denis0x0D changed the title [spirv] Add Vulkan runtime. [WIP] [spirv] Add Vulkan runtime. Sep 3, 2019

@antiagainst antiagainst left a comment

Awesome, thanks @denis0x0D!

I've quite a few inline comments. But generally, I think we should add more documentation and improve how errors are reported. :)

@@ -0,0 +1,59 @@
//===- VulkanRuntime.h - MLIR Vulkan runtime ------------------------------===//

Nit: -*- C++ -*-===// at the end.

// limitations under the License.
// =============================================================================
//
// This file specifies VulkanDeviceMemoryBuffer,

"This file provides a library for running a module on a Vulkan device." ?


#include <vulkan/vulkan.h>

using Descriptor = uint32_t;

Can you add more comments to this? Normally a descriptor includes two numbers. We only have one here. Would be nice to be clear on what it means.


using Descriptor = uint32_t;

/// Represents device memory buffer.

" Struct containing information regarding a ..." ?

VkDeviceMemory deviceMemory;
};

/// Represents host memory buffer.

"Struct containing information regarding a ..."?

}

LogicalResult VulkanRuntime::run() {
if (failed(vulkanCreateInstance())) {

if (... || ... || ...) 
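
A minimal sketch of the chained form suggested above. LogicalResult, success(), failure() and failed() are stand-ins for the MLIR helpers so that the snippet builds on its own, and vulkanCreateDevice and vulkanCreatePipeline are hypothetical step names (only vulkanCreateInstance appears in the patch). Because `||` short-circuits, the steps still run in order and stop at the first failure.

```cpp
// Stand-ins for mlir::LogicalResult and its helpers so the sketch compiles
// on its own; real code would use the MLIR definitions instead.
struct LogicalResult {
  bool ok;
};
inline LogicalResult success() { return LogicalResult{true}; }
inline LogicalResult failure() { return LogicalResult{false}; }
inline bool failed(LogicalResult result) { return !result.ok; }

// Counts how many setup steps actually ran (for illustration only).
static int stepsRun = 0;

// Hypothetical setup steps. Only vulkanCreateInstance appears in the patch;
// the other two names are made up for this sketch.
inline LogicalResult vulkanCreateInstance() { ++stepsRun; return success(); }
inline LogicalResult vulkanCreateDevice() { ++stepsRun; return failure(); }
inline LogicalResult vulkanCreatePipeline() { ++stepsRun; return success(); }

// One condition instead of three separate `if (failed(...))` blocks.
inline LogicalResult run() {
  if (failed(vulkanCreateInstance()) || failed(vulkanCreateDevice()) ||
      failed(vulkanCreatePipeline()))
    return failure();
  return success();
}
```

In this sketch the third step never runs, because the second one reports failure first.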

VkApplicationInfo applicationInfo;
applicationInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
applicationInfo.pNext = nullptr;
applicationInfo.pApplicationName = "Vulkan MLIR runtime";

MLIR Vulkan runtime

vkEnumeratePhysicalDevices(instance, &physicalDeviceCount, 0),
"vkEnumeratePhysicalDevices");

llvm::SmallVector<VkPhysicalDevice, 0> physicalDevices(physicalDeviceCount);

1 ? Most hosts should just have one physical device I think.

vkCreateDevice(physicalDevice, &deviceCreateInfo, 0, &device),
"vkCreateDevice");

VkPhysicalDeviceMemoryProperties properties;

Let's initialize these local variables with = {} otherwise MSAN might be unhappy.
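
To illustrate the `= {}` suggestion, here is a small sketch. FakeMemoryProperties and fakeQuery are hypothetical stand-ins for a C-style Vulkan struct and its query function, so the snippet builds without the Vulkan SDK. The point is only that value-initialization zeroes every field before the query writes to the struct, so any field the query leaves alone still holds a defined value.

```cpp
#include <cstdint>

// Hypothetical stand-in for a C-style Vulkan struct such as
// VkPhysicalDeviceMemoryProperties (the real type lives in <vulkan/vulkan.h>).
struct FakeMemoryProperties {
  uint32_t memoryTypeCount;
  uint32_t memoryHeapCount;
};

// Hypothetical stand-in for a query that fills only part of the struct.
inline void fakeQuery(FakeMemoryProperties *out) { out->memoryTypeCount = 3; }

inline FakeMemoryProperties queryProperties() {
  FakeMemoryProperties properties = {}; // Every field starts at zero.
  fakeQuery(&properties);
  return properties; // memoryHeapCount is 0 here, not indeterminate.
}
```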

VkPhysicalDeviceMemoryProperties properties;
vkGetPhysicalDeviceMemoryProperties(physicalDevice, &properties);

for (uint32_t i = 0, e = properties.memoryTypeCount; i < e; ++i) {

Can you put comments here on what we are doing here so it's easier for others to follow?
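
For reference, this is roughly what a commented version of such a loop could look like. It is a sketch under assumptions: the loop body is not shown in this excerpt, so the selection criteria here (host-visible, host-coherent, heap large enough) are the usual ones for a compute test and not necessarily the patch's. The structs are stand-ins that mirror the Vulkan field names so the snippet builds without the SDK, and findHostVisibleMemoryType is a hypothetical name.

```cpp
#include <cstdint>

// Stand-ins mirroring the Vulkan field names; real code would use
// VkPhysicalDeviceMemoryProperties from <vulkan/vulkan.h>.
constexpr uint32_t kHostVisibleBit = 0x2;  // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr uint32_t kHostCoherentBit = 0x4; // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

struct MemoryType {
  uint32_t propertyFlags;
  uint32_t heapIndex;
};
struct MemoryHeap {
  uint64_t size;
};
struct MemoryProperties {
  uint32_t memoryTypeCount;
  MemoryType memoryTypes[32];
  uint32_t memoryHeapCount;
  MemoryHeap memoryHeaps[16];
};

// Finds the first memory type that the host can map and that needs no
// explicit flushes, and whose backing heap can hold `requiredSize` bytes.
// Returns true and writes the index on success.
inline bool findHostVisibleMemoryType(const MemoryProperties &properties,
                                      uint64_t requiredSize,
                                      uint32_t &memoryTypeIndex) {
  constexpr uint32_t wanted = kHostVisibleBit | kHostCoherentBit;
  for (uint32_t i = 0, e = properties.memoryTypeCount; i < e; ++i) {
    const MemoryType &type = properties.memoryTypes[i];
    // Skip types the host cannot map coherently.
    if ((type.propertyFlags & wanted) != wanted)
      continue;
    // Skip types whose heap is too small for the buffer.
    if (properties.memoryHeaps[type.heapIndex].size < requiredSize)
      continue;
    memoryTypeIndex = i;
    return true;
  }
  return false;
}
```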

@denis0x0D

@antiagainst thanks a lot for the review! I will update the patch according to the comments.

@@ -0,0 +1,59 @@
//===- VulkanRuntime.h - MLIR Vulkan runtime ------------------------------===//

@River707 River707 Sep 4, 2019

This file should not be in /Support.

@River707 thanks, I'll delete it. Since it's only for testing purposes, I think I can declare the function inside the test file.

Implements initial version of Vulkan runtime to test spirv::ModuleOp.
Creates and runs Vulkan computation pipeline.

Requirements:
- Vulkan capable device and drivers installed.
- Vulkan SDK installed and VULKAN_SDK environment variable is set.

How to build:
- Provide -DMLIR_VULKAN_RUNNER_ENABLED=ON as cmake variable.
@denis0x0D denis0x0D force-pushed the sandbox/vulkan-runtime branch from e1220ee to 6ab4a57 Compare September 7, 2019 17:29
@denis0x0D

@antiagainst I've updated the patch according to your comments, and I have some thoughts about the error reporting machinery in particular and about the vulkan-runner in general.
The following may repeat what has already been agreed, sorry about that, but I want to make sure that I understand everything correctly. Can you please comment on the following, and correct me if I'm wrong?

  1. I looked at the current implementation of other runtime libs inside MLIR and they use llvm::errs(), for example the CUDA runtime wrappers: https://github.com/tensorflow/mlir/blob/master/tools/mlir-cuda-runner/cuda-runtime-wrappers.cpp#L34

  2. It could be better for the Vulkan runtime to be independent of the MLIR context. For example, the full pipeline for vulkan-runner could look like this:

2.1. Compile-time passes:

a. A pass which serializes the SPIR-V module into binary and inserts it as an attribute, similar to GpuToCubinPass:

function.setAttr(kCubinAnnotation,
                   builder.getStringAttr({cubin->data(), cubin->size()}));

b. A pass which creates a global constant containing the SPIR-V binary, similar to GpuGenerateCubinAccessorsPass:

Value *startPtr = LLVM::createGlobalString(
        loc, builder, StringRef(nameBuffer), blob.getValue(), llvmDialect);
    builder.create<LLVM::ReturnOp>(loc, startPtr);

c. A final pass which converts the launch call into calls to the Vulkan runtime wrappers, inserts calls that register buffers with the Vulkan runtime, and so on.

2.2. The runtime must consist of two parts: the Vulkan runtime wrappers, and the actual Vulkan runtime underneath them, which manages memory, descriptors, layouts, pipeline and so on.
In this case the MLIR module could look like this:

module {
  func @main() {
    // ...
    call @vkPopulateBuffer(%buff : memref<?xf32>, %descriptorSet : i32, %descriptorBinding : i32, %storageClass : i32) ...
    // ...
    call @vkLaunchShader(...) ...
    // ...
  }
  // extern @vkPopulateBuffer
  // extern @vkLaunchShader
}

Vulkan runtime wrappers could look like this:

extern "C" void vkPopulateBuffer(const memref_t buff, int32_t descriptorSet,
                                 int32_t descriptorBinding, int32_t storageClass) {
  VulkanRuntimeManager::instance()->registerBuffer(
      vulkanHostMemoryBuffer(buff.values, buff.length * sizeof(float)),
      descriptorSet, descriptorBinding, storageClass);
}

VulkanRuntimeManager could look like this:

class VulkanRuntimeManager {
public:
  static VulkanRuntimeManager *instance() {
    static VulkanRuntimeManager *mng = new VulkanRuntimeManager;
    return mng;
  }
  void registerBuffer(VulkanHostMemoryBuffer buffer, int32_t set,
                      int32_t binding, int32_t storageClass) {
    // Serialize access to the single runtime instance.
    std::lock_guard<std::mutex> lock(m);
    runtime.registerBuffer(...);
  }
  void launchShader(...);

private:
  std::mutex m;
  VulkanRuntime runtime;
};

In this case we can first register all the information the Vulkan runtime needs, such as resources (set, binding), entry point, and number of work groups; then create device memory buffers, descriptor set layouts, and the pipeline layout and bind them to the compute pipeline; then create a command buffer and finally submit it to the queue. What do you think?
Thanks.
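
A self-contained sketch of the registration step described above, in case it helps the discussion. All names here (RuntimeManager, HostBuffer, BoundBuffer, bufferCount) are hypothetical and simplified; the storage class parameter and the actual Vulkan calls are left out. It only shows the shape of the idea: a process-wide manager that records buffers grouped by descriptor set under a mutex, so that layouts and the pipeline can be created later from the recorded state.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <vector>

// Hypothetical non-owning description of a host buffer.
struct HostBuffer {
  void *ptr;
  uint32_t sizeInBytes;
};

// A buffer together with the binding index it is registered under.
struct BoundBuffer {
  uint32_t binding;
  HostBuffer buffer;
};

class RuntimeManager {
public:
  // Function-local static: initialized once, thread-safe since C++11.
  static RuntimeManager &instance() {
    static RuntimeManager manager;
    return manager;
  }

  // Records a buffer under (set, binding); nothing touches the device yet.
  void registerBuffer(HostBuffer buffer, uint32_t set, uint32_t binding) {
    std::lock_guard<std::mutex> lock(m);
    buffersBySet[set].push_back({binding, buffer});
  }

  // Number of buffers registered for a descriptor set so far.
  size_t bufferCount(uint32_t set) {
    std::lock_guard<std::mutex> lock(m);
    auto it = buffersBySet.find(set);
    return it == buffersBySet.end() ? 0 : it->second.size();
  }

private:
  RuntimeManager() = default;
  std::mutex m;
  std::map<uint32_t, std::vector<BoundBuffer>> buffersBySet;
};
```

Grouping by set at registration time means the later step that builds one descriptor set layout per set can simply iterate the map.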

namespace {

/// Vulkan runtime.
/// The purpose of this class is to run spir-v computaiton shader on Vulkan

nit: computaiton -> computation

/// spir-v shader, number of work groups and entry point. After the creation of
/// VulkanRuntime, special methods must be called in the following
/// sequence: initRuntime(), run(), updateHostMemoryBuffers(), destroy();
/// each method in the sequence returns sussses or failure depends on the Vulkan

nit: sussses -> success

VkDevice device;
VkQueue queue;

/// Specifies VulkanDeviceMemoryBuffers devided into sets.

nit: devided -> divided

return failure();
}

// Descriptor bindings devided into sets. Each descriptor binding

nit: devided -> divided


TEST_F(RuntimeTest, SimpleTest) {
// SPIRV module embedded into the string.
// This module contains 4 resource variables devided into 2 sets.

nit: devided -> divided

# https://cmake.org/cmake/help/v3.7/module/FindVulkan.html
if (NOT CMAKE_VERSION VERSION_LESS 3.7.0)
find_package(Vulkan)
endif()

I'd add a else and a proper error message here.

I think the error message is at L29. :) The probe is not done yet here.

I see that there is a fallback, so throwing an error here is indeed not possible, but I still feel that "you're not using the right CMake version" should be surfaced at some point as a root cause of failure.
Something like adding here: an else if ("$ENV{VULKAN_SDK}" STREQUAL "") message(FATAL_ERROR "Please use at least CMake 3.7.0 or provide the VULKAN_SDK path as an environment variable")

uint32_t z{1};
};

// This is a temporary function and will be removed in the future.

Can you clarify the temporary aspect: there is a layering violation right now where you're poking at the implementation of a tool.

destroyResourceVarFloat(vars[1][1]);
destroyResourceVarFloat(fmulResult);
destroyResourceVarFloat(expected);
}

Can we avoid C++ unit tests and use regular lit+FileCheck testing, please? I think that is the whole point of using a runner.

See the cuda testing for reference.

I think this was just a temporary step to have functionality implemented incrementally. We don't have all the shims and conversions like the CUDA side yet. We already have >1k LOC here; with all of the above, we would expect another >1k LOC. So I'm fine with landing this as-is. But if you are very uncomfortable about this, I see two ways forward: 1) remove the C++ tests and land the functionality; 2) leave this PR open and add follow-up commits that implement the rest of the functionality, so that we can finally squash the intermediate state away.

I'm still of the same opinion as in September: build the right tool step by step with the right testing at every step along the way.

@antiagainst antiagainst left a comment

Sorry for the late reply! On my side I just have a few more nits.

struct VulkanDeviceMemoryBuffer {
BindingIndex bindingIndex{0};
VkDescriptorType descriptorType{VK_DESCRIPTOR_TYPE_MAX_ENUM};
VkDescriptorBufferInfo bufferInfo;

Zero initialize the other fields too? (Using VK_NULL_HANDLE and other reasonable defaults)

uint32_t z{1};
};

/// Struct containing information regarding to a descriptor set.

s/to//

namespace {

/// Vulkan runtime.
/// The purpose of this class is to run spir-v computaiton shader on Vulkan

s/spir-v/SPIR-V/

globally


void createResourceVarFloat(uint32_t id, uint32_t elementCount) {
std::srand(unsigned(std::time(0)));
float *ptr = new float[elementCount];

No. Actually it's fine to have VulkanHostMemoryBuffer hold a raw pointer. I was just saying that we should wrap these raw pointers in unique_ptr to avoid potential leaks. For example, you can let this function return the unique_ptr and assign it to a local variable in the test so we don't need to explicitly call delete later.
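
Something along these lines is what I have in mind; treat it as a sketch, not the required shape. The signature differs from the one in the patch (it takes a seed instead of calling std::srand, and drops the id parameter), and HostBufferView and viewOf are hypothetical stand-ins for VulkanHostMemoryBuffer. The unique_ptr owns the storage, while the view handed to the runtime stays a plain non-owning pointer and size.

```cpp
#include <cstdint>
#include <memory>
#include <random>

// Returns an owning buffer of `elementCount` pseudo-random floats.
// The array is freed automatically when the unique_ptr goes out of scope.
inline std::unique_ptr<float[]> createResourceVarFloat(uint32_t elementCount,
                                                       unsigned seed) {
  std::unique_ptr<float[]> data(new float[elementCount]);
  std::mt19937 generator(seed);
  std::uniform_real_distribution<float> distribution(0.0f, 1.0f);
  for (uint32_t i = 0; i < elementCount; ++i)
    data[i] = distribution(generator);
  return data;
}

// Hypothetical stand-in for VulkanHostMemoryBuffer: a non-owning view.
struct HostBufferView {
  void *ptr;
  uint32_t size; // Size in bytes.
};

// Builds the raw-pointer view that would be passed to the runtime.
inline HostBufferView viewOf(const std::unique_ptr<float[]> &data,
                             uint32_t elementCount) {
  return {data.get(), static_cast<uint32_t>(elementCount * sizeof(float))};
}
```

With this, the test keeps each unique_ptr in a local variable and the destroyResourceVarFloat calls at the end are no longer needed.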

@antiagainst

Re: #118 (comment)

  1. Okay, using llvm::errs() sgtm. :)
  2. What you said is reasonable for me!

Again, sorry for the delay in reviewing! Quite a few interesting things related to this have happened in the meantime. Our prototype on the Vulkan runtime side, IREE, was open sourced. (I see you've already noticed that because you posted questions there. ;-P) It reflects how we actually want to approach this in a more Vulkan-native way. Compared to CUDA, which provides more developer-friendly, mid-level host API abstractions, Vulkan's host API is lower-level and much more verbose. We think that by modelling it in IR form (not literally, but selectively on core features), we can gain the benefits of running compiler optimizations on it. It is not fully proven yet but looks quite promising thus far.

We are also thinking about how to structure MLIR core and IREE with regard to these components. The SPIR-V dialect is in MLIR core right now, but the lowering from the high-level dialect used by IREE is not; it's inside IREE. That is partially because we don't have a proper model of the Vulkan host side in MLIR core. Here, again unlike CUDA, a simple gpu.launch op that connects the host and device code does not really work for Vulkan (I mean, yes, we can still lower it to Vulkan, but it does not gain all the expected performance benefits of Vulkan). So it would be nice to have proper modelling of Vulkan in core so we can have the lowering in core and potentially share functionality with other code paths.

Just wanted to point out recent developments to keep you informed about where we are going; what you have here is certainly very valuable for testing and running in core and as building blocks for future Vulkan modelling. Thanks for the contribution again! :)

@joker-eph joker-eph left a comment

Actually I just noticed that the PR wasn't updated by Denis recently; I got misled because it popped up in my inbox after recent comments from kiszk@

@denis0x0D

@kiszk @joker-eph @antiagainst @MaheshRavishankar thanks for the review. I was actually thinking that IREE covers this PR as well and that this PR is not needed anymore, but if it is still relevant for testing the lowering, that sounds great to me!

@joker-eph

IREE is an independent project, I am not aware of a plan to integrate IREE within MLIR, so having end-to-end codegen story and testing capability seems still important to me.

@denis0x0D

@joker-eph @antiagainst

having end-to-end codegen story and testing capability seems still important to me.

sounds great to me!

If it's ok, on the next iteration I'll rebase current patch on trunk, update according to the comments, delete c++ unit tests and leave this PR under WIP.
After that I can start to work on a separate PR, the PR will implement a pass which:

  1. Translates module into spv.module.
  2. Serializes spv.module into binary form.
  3. Sets serialized module as an attribute.
    This is similar to what GpuKernelToCubinPass does: https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertKernelFuncToCubin.cpp

It seems to me that a fully working mlir-vulkan-runner also requires one more PR to cover functionality similar to https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertLaunchFuncToCudaCalls.cpp and to update the runtime part to be consistent with the new passes and mlir-vulkan-runner. I can cover that in future PRs.
What do you think? If you see a better plan, please correct me.
Thanks!

@antiagainst

+1 to what @joker-eph said in #118 (comment).

@denis0x0D: what you said SGTM! Just wanted to let you know, we are working on enabling the lowering flow from Linalg dialect to SPIR-V right now. I think that has a higher priority. We'll create issues for tasks soon and would really appreciate it if you can help there first. :) Sorry for keeping this thread open even longer, but since this is entirely new code, I guess it won't be much of a problem to rebase against the master branch later. Thanks!

@denis0x0D

@antiagainst

we are working on enabling the lowering flow from Linalg dialect to SPIR-V right now. I think that has a higher priority. We'll create issues for tasks soon and would really appreciate it if you can help there first.

Sounds great, I would like to help on it if possible!
Thanks!

@MaheshRavishankar
MaheshRavishankar commented Nov 24, 2019

Just catching up on all the discussion here now. Apologies for not paying more attention here.

As everyone agrees here, having an mlir-vulkan-runner in MLIR core is extremely useful for testing. So thanks again @denis0x0D for taking this up. Just to provide more context, as @antiagainst mentioned, we have been trying to get all the pieces in place to convert from Linalg ops to GPU dialect to SPIR-V dialect. Some of the changes related to this will land soon (in a couple of days). I have also been making some fairly significant changes to the SPIR-V lowering infrastructure to make it easy to build upon (these should also land in a couple of days). While these changes don't directly affect the development of a vulkan-runner (the runner should only be concerned with the SPIR-V binary generated by the dialect), they do motivate some requirements on the vulkan-runner, summarized below.

  1. As structured right now, going to GPU dialect will result in two sub-modules within the main module
module attributes {gpu.container_module} {
    // Host side code which will contain gpu.launch_func op
   module @<some_name> attributes {gpu.kernel_module} {
     // Kernel functions ,i.e. FuncOps with the gpu.kernel attribute
  }
}

The functions in the gpu.kernel_module with the gpu.kernel attribute are targeted for SPIR-V lowering. The lowering will create a new spv.module within the gpu.kernel_module:

module attributes {gpu.container_module} {
  // Host side code which will contain gpu.launch_func op
  module @<some_name> attributes {gpu.kernel_module} {
    spv.module {
      ...
    }
    // Kernel functions, i.e. FuncOps with the gpu.kernel attribute
  }
}

So it would be useful if the vulkan runner can handle such a module. Given this, I have a follow-up below on a previous comment.

@joker-eph @antiagainst

having end-to-end codegen story and testing capability seems still important to me.

sounds great to me!

If it's ok, on the next iteration I'll rebase current patch on trunk, update according to the comments, delete c++ unit tests and leave this PR under WIP.
After that I can start to work on a separate PR, the PR will implement a pass which:

  1. Translates module into spv.module.

I am not sure we can translate a module into spv.module directly since the module contains both host and device side components. Do you mean extracting spv.module from within a module and running them?

  2. Serializes spv.module into binary form.
  3. Sets serialized module as an attribute.
    This is similar to what GpuKernelToCubinPass does: https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertKernelFuncToCubin.cpp

It seems to me that a fully working mlir-vulkan-runner also requires one more PR to cover functionality similar to https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertLaunchFuncToCudaCalls.cpp and to update the runtime part to be consistent with the new passes and mlir-vulkan-runner. I can cover that in future PRs.
What do you think? If you see a better plan, please correct me.
Thanks!

@joker-eph
joker-eph commented Nov 24, 2019 via email

@denis0x0D
denis0x0D commented Nov 24, 2019

@MaheshRavishankar thanks for the feedback,

I am not sure we can translate a module into spv.module directly since the module contains both host and device side components. Do you mean extracting spv.module from within a module and running them?

I was wrong about translating to spv.module inside this pass, sorry about that; I lost the context of this task :)
The GPU -> SPIR-V lowering already exists and should be added to the pass manager in mlir-vulkan-runner.

So to enable mlir-vulkan-runner we need:

  1. A pass which runs right after "-convert-gpu-to-spirv". It consumes a module with the {gpu.container_module} attribute, serializes the spv.module into binary form using the existing spirv::Serializer, and attaches the serialized spv.module as an attribute, similar to this:
    https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertKernelFuncToCubin.cpp#L85

  2. A pass which instruments the host part (the part which contains gpu.launch_func) with calls to runtime wrappers, similar to this (the API should be discussed):

module {
  func @main() {
    // ...
    call @vkPopulateBuffer(%buff : memref<?xf32>, %descriptorSet : i32, %descriptorBinding : i32, %storageClass : i32) ...
    // ...
    call @vkLaunchShader(...) ...
    // ...
  }
  // extern @vkPopulateBuffer
  // extern @vkLaunchShader
}

The host part is lowered to LLVM IR (the LLVM dialect), and the serialized module becomes an llvm.global https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertLaunchFuncToCudaCalls.cpp#L370 containing just the binary data, so we can pass a pointer to that data to the runtime https://github.com/tensorflow/mlir/pull/118/files#diff-13d6d98d70fab37eda97625d19b84998R685. To create a shaderModule we just need to populate VkShaderModuleCreateInfo with the size and a pointer to the binary shader.
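
Since the runtime would receive only a raw pointer and a size, a cheap sanity check before filling in VkShaderModuleCreateInfo may be worth having. The sketch below is an assumption on my side, not part of the patch: looksLikeSpirvBinary is a hypothetical helper that checks the blob is a whole number of 32-bit words, is at least as long as the five-word SPIR-V header, and starts with the SPIR-V magic number 0x07230203 in host byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// First word of every SPIR-V module.
constexpr uint32_t kSpirvMagic = 0x07230203;
// The SPIR-V header is five words: magic, version, generator, bound, schema.
constexpr size_t kSpirvHeaderWords = 5;

// Hypothetical helper: returns true if `data` plausibly holds a SPIR-V
// binary that could be passed on as (pCode, codeSize).
inline bool looksLikeSpirvBinary(const void *data, size_t sizeInBytes) {
  if (data == nullptr)
    return false;
  // The size must be a multiple of the word size and cover the header.
  if (sizeInBytes % sizeof(uint32_t) != 0 ||
      sizeInBytes < kSpirvHeaderWords * sizeof(uint32_t))
    return false;
  // memcpy avoids alignment assumptions about the incoming pointer.
  uint32_t firstWord = 0;
  std::memcpy(&firstWord, data, sizeof(firstWord));
  return firstWord == kSpirvMagic;
}
```

A failed check here could be reported through llvm::errs() before any Vulkan call is made, which gives a clearer message than a failing vkCreateShaderModule.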

So as a result, after the second pass we get a module which can be run under JitRunner; the pipeline should look similar to this code: https://github.com/tensorflow/mlir/blob/master/tools/mlir-cuda-runner/mlir-cuda-runner.cpp#L112.
Thanks!

@MaheshRavishankar

Thanks @denis0x0D. Overall what you are suggesting makes sense to me, but I am not fluent enough in Vulkan to provide more feedback on specifics. But the interaction with the kernel to SPIR-V binary compilation makes sense.

@joker-eph: Thanks for catching my mistake. Edited my post to reflect the actual layout. The main point I was raising, though, was that you cannot serialize the entire module to a spv.module.

@MaheshRavishankar

On Sat, Nov 23, 2019 at 11:45 PM MaheshRavishankar @.***> wrote:

Just catching up on all the discussion here now. Apologies for not paying more attention here.

As everyone agrees here, having a mlir-vulkan-runner in MLIR core is extremely useful for testing. So thanks again @denis0x0D https://github.com/denis0x0D for taking this up. Just to provide more context, as @antiagainst https://github.com/antiagainst mentioned we have been trying to get all the pieces in place to convert from Linalg ops to GPU dialect to SPIR-V dialect. Some of the changes w.r.t to these will land soon (in a couple of days). I have also been making some fairly significant change to the SPIR-V lowering infrastructure to make it easy to build upon. (these should also land in a couple of days). While these changes don't directly affect the development of a vulkan-runner (the runner should be only be concerned about the spir-v binary generated by the dialect), they do motivate some requirements on the vulkan-runner summarized below.

  1. As structured right now, going to GPU dialect will result in two sub-modules within the main module

module {
  module attributes {gpu.container_module} {
    // Host side code which will contain gpu.launch_func op
  }
  module @fmul_kernel attributes {gpu.kernel_module} {
    // Kernel functions ,i.e. FuncOps with the gpu.kernel attribute
  }
}

As far as I can tell, at the moment the host code is in the module enclosing the kernel module, they aren't siblings: https://github.com/tensorflow/mlir/blob/master/g3doc/Dialects/GPU.md#gpulaunch_func Did I miss anything?

The functions in the gpu.kernel_module with the gpu.kernel attribute are targeted for SPIR-V lowering. THe lowering will create a new spv.module within the gpu.kernel_module

module {
  module attributes {gpu.container_module} {
    // Host side code which will contain gpu.launch_func op
  }
  module @fmul_kernel attributes {gpu.kernel_module} {
    spv.module {
      ...
    }
    // Kernel functions ,i.e. FuncOps with the gpu.kernel attribute
  }
}

Why is the spv.module nested inside the kernel_module instead lowering the kernel_module into a spv.module? What do the "FuncOps with the gpu.kernel" attribute becomes in your representation above? (is there an example I could follow in the repo maybe?) Thanks,

-- Mehdi

The conversion of a FuncOp with gpu.kernel attribute to spv.module predates the existence of kernel_module. It is pretty straight-forward (and would actually clean up the conversion a little bit) to lower a kernel_module to a spv.module. It is on my list of things to do. Will probably get to that in a day or so.

@denis0x0D : I will try to send the change to lower kernel_module to spv.module as a pull request to the mlir github. So that might be something to look out for w.r.t to the vulkan runner. Thanks!

So it would be useful if the vulkan runner can handle such module. Given this I have some follow up below on a previous comment @joker-eph https://github.com/joker-eph @antiagainst https://github.com/antiagainst

having end-to-end codegen story and testing capability seems still important to me.

sounds great to me!

If it's ok, on the next iteration I'll rebase current patch on trunk, update according to the comments, delete c++ unit tests and leave this PR under WIP. After that I can start to work on a separate PR, the PR will implement a pass which:

  1. Translates module into spv.module.

I am not sure we can translate a module into spv.module directly since the module contains both host and device side components. Do you mean extracting spv.module from within a module and running them?

  1. Serializes spv.module into binary form.
  2. Sets serialized module as an attribute.
    This is similar to GpuKernelToCubinPass does https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertKernelFuncToCubin.cpp

Seems to me, to implement fully working mlir-vulkan-runner also requires one more PR to cover similar functionality with https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertLaunchFuncToCudaCalls.cpp and to update runtime part to be consistent with new passes and mlir-vulkan-runner. I can cover it on the future PRs.
What do you think? If you see the better plan, please fix me in this case.
Thanks!

@denis0x0D

@MaheshRavishankar

I will try to send the change to lower kernel_module to spv.module as a pull request to the mlir github. So that might be something to look out for w.r.t to the vulkan runner. Thanks!

Sounds great to me! By the way, those conversions look great: https://github.com/tensorflow/mlir/tree/master/test/Conversion/GPUToSPIRV

@antiagainst

Thanks @MaheshRavishankar for the good catch! I was actually implicitly thinking that way and didn't notice that it is not obvious to everyone. :)

I think we are already creating the spv.module inside the top-level module, so not embedded in gpu.kernel_module:

void GPUToSPIRVPass::runOnModule() {
  auto context = &getContext();
  auto module = getModule();
  SmallVector<Operation *, 4> spirvModules;
  module.walk([&module, &spirvModules](FuncOp funcOp) {
    if (!gpu::GPUDialect::isKernel(funcOp)) {
      return;
    }
    OpBuilder builder(module.getBodyRegion());

But yes going directly from gpu.kernel_module instead of gpu.kernel is cleaner and we can have multiple entry points in the same spv.module! (We need to do more verification along the way too.)

@MaheshRavishankar

@antiagainst: At top of tree it is now added just before the FuncOp

OpBuilder builder(funcOp.getOperation());

But the correct way is to convert the entire module into a spv.module.

@denis0x0D : The conversion from GPU to SPIRV is changed now with 8c152c5
The CL introduces attributes that you use to specify the ABI for entry point functions. So when you convert from GPU to SPIR-V dialect you add these attributes to the function arguments and function itself that describe various ABI related aspects (like storage class, descriptor_set, binding, etc for spv.globalVariables), as well as the workgroup size for the entry point function.
A later pass (just before serialization) will materialize this ABI.

@denis0x0D

@MaheshRavishankar thanks for mentioning this!

@sherhut
sherhut commented Nov 28, 2019

Late to the game, and I just wanted to comment that it would be really awesome if we had a lowering for the host side of the gpu dialect in the Vulkan/SPIR-V context as well. I think using a C++ library to implement a higher-level API should make the host-side code generation relatively straightforward. However, as far as I have been told, it still takes quite some code to write such a C++ library.

I would not spend too much energy on the design of the C++ library's API and just build what makes sense for easy code generation. The host side of the dialect will likely evolve a fair bit so these abstractions won't be forever and the mlir-vulkan-runner is not meant to replace a real runtime, so simplicity is more important.

@antiagainst

I would not spend too much energy on the design of the C++ library's API and just build what makes sense for easy code generation.

That's the approach that has been taken in the current pull request. You'll need thousands of lines of code to bootstrap a basic Vulkan compute shader. Right now that's all hidden in a single entry point runOnVulkan.

@joker-eph joker-eph force-pushed the master branch 3 times, most recently from 48dcae0 to 3722f03 Compare December 26, 2019 04:35
8 participants