matmul_nbits: Use GPU_WARP_SIZE_HOST for host side code #65

jagadish-amd · 2024-09-10T06:46:06Z

On ROCm devices, host-side code needs to use GPU_WARP_SIZE_HOST to query the warp size of the underlying GPU device.

Fixes MatMulNBits tests on Navi.
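As a rough illustration of why the host side wants a runtime query, here is a minimal sketch in plain C++. It is not the onnxruntime code: `DeviceProps`, `QueryWarpSizeHost`, `BlockDims`, `MakeBlockDims`, and `kAssumedCompileTimeWarpSize` are hypothetical names, and the values 64 (compile-time default), 32 (a Navi-class device), and 8 (columns per block) are assumptions chosen only for the example. In a real build the width would come from the GPU runtime's device properties rather than a hand-filled struct.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-device properties a GPU runtime reports.
struct DeviceProps {
  int warp_size;  // threads per warp/wavefront on this device
};

// Assumed compile-time default, for illustration only. A constant like this
// cannot know which device the process will actually run on.
constexpr int kAssumedCompileTimeWarpSize = 64;

// Hypothetical host-side query: returns the width the device itself reports.
inline int QueryWarpSizeHost(const DeviceProps& props) {
  assert(props.warp_size > 0);
  return props.warp_size;
}

// Simplified stand-in for a 2-D thread-block shape (x = lanes, y = columns).
struct BlockDims {
  int x;
  int y;
};

// Builds the block shape used at launch: one warp per column.
inline BlockDims MakeBlockDims(int warp_size, int cols_per_block) {
  return BlockDims{warp_size, cols_per_block};
}

inline int ThreadsPerBlock(const BlockDims& dims) {
  return dims.x * dims.y;
}
```

With a device that reports a width of 32, `MakeBlockDims(QueryWarpSizeHost(device), 8)` launches 32 lanes per column, while passing the assumed compile-time constant would launch 64, so the host-side launch shape would no longer match what the kernel sees on the device.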

TedThemistokleous · 2024-09-10T14:02:14Z

onnxruntime/contrib_ops/cuda/quantization/matmul_nbits.cu

@@ -288,6 +288,7 @@ bool TryMatMul4Bits(
  if (n % kColsPerThreadBlock != 0 || k % 8 != 0 || m > 1) {
    return false;
  }
+  const int kWarpSize = GPU_WARP_SIZE_HOST;


~~Put a #if around this via how we handle things with USE_ROCM and USE_MIGRAPHX~~

#if USE_MIGRAPHX || USE_ROCM
const int kWarpSize = GPU_WARP_SIZE_HOST;
#endif

~~That way if once we upstream this change it wouldn't break another EP if this is specific to AMD~~

There's no need to ifdef protect this. CUDA EP also defines GPU_WARP_SIZE_HOST for this reason.

Ah then my real curiosity is why redefine it here then if that's the case?

jeffdaily · 2024-09-10T14:07:12Z

onnxruntime/contrib_ops/cuda/quantization/matmul_nbits.cu

@@ -288,6 +288,7 @@ bool TryMatMul4Bits(
  if (n % kColsPerThreadBlock != 0 || k % 8 != 0 || m > 1) {
    return false;
  }
+  const int kWarpSize = GPU_WARP_SIZE_HOST;


There's no need to ifdef protect this. CUDA EP also defines GPU_WARP_SIZE_HOST for this reason.

jeffdaily · 2024-09-10T19:22:53Z

@jagadish-amd please file an upstream PR for the same change.

matmul_nbits: Use GPU_WARP_SIZE_HOST for host side code

011cecd

For ROCm device, the host side code needs to call GPU_WARP_SIZE_HOST to query warpsize of the underlying GPU device. Fixes MatMulNBits tests on Navi. Signed-off-by: Jagadish Krishnamoorthy <[email protected]>

jagadish-amd requested review from jeffdaily, xinyazhang, TedThemistokleous and causten September 10, 2024 06:46

TedThemistokleous reviewed Sep 10, 2024

TedThemistokleous assigned jagadish-amd Sep 10, 2024

jeffdaily reviewed Sep 10, 2024

Add GPU_WARP_SIZE_HOST in threads dim constructor.

7b5180f

Signed-off-by: Jagadish Krishnamoorthy <[email protected]>

jeffdaily approved these changes Sep 10, 2024

jeffdaily merged commit d7e1c61 into ROCm:rocm6.3_internal_testing Sep 10, 2024
5 of 10 checks passed

jagadish-amd deleted the fix_matmul4 branch September 10, 2024 20:26
