Fixes cudaErrorInvalidValue when running on nvbench-created cuda stream #113

elstehle · 2022-12-07T12:03:57Z

This PR fixes a minor issue that may occur when nvbench is run on multiple GPUs without a user-provided cuda stream.

The Issue

The error that I observed in this case looked like:

Fail: Unexpected error: nvbench/detail/l2flush.cuh:55: Cuda API call returned error: cudaErrorInvalidValue: invalid argument

When run with memcheck I would see:

Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaMemsetAsync.

The Problem

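It seems that nvbench creates all nvbench-owned streams on device 0. On any other device, the L2 flush then issues cudaMemsetAsync on a stream that belongs to device 0, which appears to be what triggers the invalid-argument error.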

Suggested Fix

This fix makes sure that each stream is created on the device on which it is later used.
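
As a rough illustration of the idea, here is a minimal sketch in plain C++ that mocks the CUDA runtime so it compiles without a GPU. The `mock` namespace, `device_scope`, and `make_stream` are hypothetical names for this sketch and are not the nvbench implementation. It assumes that a stream is bound to whichever device is current when it is created, and that passing no device means "use the current device".

```cpp
#include <optional>

// Mock stand-ins for the CUDA runtime (assumption: not real CUDA or nvbench APIs).
namespace mock {
inline int current_device = 0;  // models the calling thread's current device

struct stream {
  int device;  // the device that was current when the stream was created
};

inline stream create_stream() { return stream{current_device}; }
} // namespace mock

// RAII guard: makes `dev` current for the guard's lifetime, then restores
// the previously current device.
class device_scope {
public:
  explicit device_scope(int dev) : m_previous{mock::current_device} {
    mock::current_device = dev;
  }
  ~device_scope() { mock::current_device = m_previous; }

  device_scope(const device_scope &) = delete;
  device_scope &operator=(const device_scope &) = delete;

private:
  int m_previous;
};

// Creates a stream on `dev` if one is given, otherwise on the current device.
inline mock::stream make_stream(std::optional<int> dev = std::nullopt) {
  if (dev.has_value()) {
    device_scope scope{*dev};      // switch devices only while creating
    return mock::create_stream();  // the stream is bound to *dev
  }
  return mock::create_stream();
}
```

With this shape, `make_stream(2)` yields a stream tied to device 2 and leaves the caller's current device untouched.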

jrhemstad · 2022-12-07T16:10:22Z

nvbench/cuda_stream.cuh

@@ -42,10 +45,18 @@ struct cuda_stream
   * Constructs a cuda_stream that owns a new stream, created with
   * `cudaStreamCreate`.
   */
-  cuda_stream()
-      : m_stream{[]() {
+  cuda_stream(std::optional<nvbench::device_info> device)


Docs should be updated to explain the semantics of the new device parameter.

elstehle · 2022-12-08

Thanks. I've updated the docs. Could you please check whether they are understandable?

alliepiper · 2023-01-17

This LGTM, thanks for catching it! Some of the tests don't build after the changes. If you have Docker set up, you can run ci/local/build.bash from the nvbench root to build and test.

Once tests are passing this is good to go.

elstehle · 2023-01-18T15:24:18Z

Thanks for reviewing the PR. nvbench::cuda_stream used to be default constructible and is part of the public API.
In this PR, I required passing a std::optional<nvbench::device_info> to cuda_stream's ctor, which was effectively a breaking change. To avoid that, I've now added the default ctor back to cuda_stream.
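
One way to keep a type default constructible while adding a device-aware constructor is a delegating constructor. The sketch below is illustrative only: `device_info`, `stream_handle`, and `current_device_id` are hypothetical stand-ins, and treating `std::nullopt` as "the current device" is an assumption rather than something confirmed by the PR.

```cpp
#include <optional>

// Hypothetical stand-in for nvbench::device_info; only the id matters here.
struct device_info {
  int id;
};

class stream_handle {
public:
  // New overload: an explicit device, or std::nullopt for the current device.
  explicit stream_handle(std::optional<device_info> device)
      : m_device_id{device ? device->id : current_device_id()} {}

  // Restored default constructor: delegates to the overload above, so code
  // that default-constructs the type keeps compiling unchanged.
  stream_handle() : stream_handle(std::nullopt) {}

  int device_id() const { return m_device_id; }

private:
  // Mock: pretend device 0 is always current. Real code would query the runtime.
  static int current_device_id() { return 0; }

  int m_device_id;
};
```

Existing callers that write `stream_handle s;` are unaffected, while new callers can pass a device explicitly.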

testing/cuda_stream.cu

alliepiper · 2023-01-30T18:25:51Z

@elstehle I'm still seeing a test regression when running ci/local/build.bash on this branch:

 4/39 Test #32: nvbench.test.state_generator ..................***Failed    2.39 sec
/cccl/nvbench/nvbench/detail/device_scope.cuh:37: Cuda API call returned error: cudaErrorInvalidDevice: invalid device ordinal
Command: 'cudaSetDevice(dev_id)'

elstehle · 2023-01-31T17:22:55Z

Thanks! Sorry, I had missed that regression because it only occurs on systems with fewer than three devices.

The issue with the test in testing/state_generator.cu was that it generates states for devices [0, 1, 2], regardless of whether those devices exist:

const auto device_0 = nvbench::device_info{0, {}};
const auto device_1 = nvbench::device_info{1, {}};
const auto device_2 = nvbench::device_info{2, {}};

dummy_bench bench;
bench.set_devices({device_0, device_1, device_2});
...
const std::vector<nvbench::state> states = nvbench::detail::state_generator::create(bench);

When the states are created, we create the stream for each state on that state's given device. If a given device doesn't exist, we run into a cuda error.

For comparison, if we currently ran a benchmark with invalid device IDs, the runner would fail with the same error:

../nvbench/device_info.cuh:71: Cuda API call returned error: cudaErrorInvalidDevice: invalid device ordinal

I resolved this regression by adjusting the test in testing/state_generator.cu to only run on devices actually available in the system. But I would like to confirm that we're generally ok with that behaviour.
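
The adjustment described in this comment amounts to clamping the requested devices to those that actually exist. A minimal sketch, assuming a hypothetical helper `usable_device_ids` (not part of nvbench) and a device count obtained elsewhere, for example from `cudaGetDeviceCount` in real code:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the device ids a test may use: ids 0..n-1, where n is the smaller
// of the number of devices the test wants and the number that are available.
inline std::vector<int> usable_device_ids(int wanted, int available) {
  const int n = std::max(0, std::min(wanted, available));
  std::vector<int> ids(static_cast<std::size_t>(n));
  std::iota(ids.begin(), ids.end(), 0);  // fill with 0, 1, ..., n-1
  return ids;
}
```

On a single-GPU machine, a test that wants three devices would then generate states for device 0 only instead of failing on devices 1 and 2.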

jrhemstad · 2023-02-06T18:39:01Z

nvbench/detail/measure_hot.cu

+      if (!m_state.get_cuda_stream().has_value())
+      {
+        m_state.set_cuda_stream(nvbench::cuda_stream{m_state.get_device()});
+      }
+      return m_state.get_cuda_stream().value();


It feels weird to have the initialization of the optional happen outside of state.

How about putting this logic inside state::get_cuda_stream instead, and not exposing the optional externally?

elstehle · 2023-02-07

@allisonvacanti and I discussed that option too, but we agreed to prefer setting the stream explicitly over initializing it implicitly as a byproduct of the getter when it doesn't exist yet. From the perspective of a user interfacing with the API, I feel that on multi-GPU systems it's safer to be explicit about when resources are created and which device they are associated with, especially since the current device may influence which device a resource ends up associated with.

That said, I'm fine with whichever way we decide makes more sense. @allisonvacanti what do you think?
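
To make the trade-off concrete, here is a small sketch of both designs using a mock `stream` type. `state_explicit` and `state_lazy` are hypothetical names for illustration; neither is the actual nvbench::state.

```cpp
#include <optional>

// Mock stream that only remembers which device it was created for.
struct stream {
  int device;
};

// Design A (as in the diff): the optional is visible, and the caller decides
// when the stream is created and for which device.
class state_explicit {
public:
  explicit state_explicit(int device) : m_device{device} {}

  int get_device() const { return m_device; }
  const std::optional<stream> &get_cuda_stream() const { return m_stream; }
  void set_cuda_stream(stream s) { m_stream = s; }

private:
  int m_device;
  std::optional<stream> m_stream;
};

// Design B (the reviewer's suggestion): the getter hides the optional and
// creates the stream on first use, bound to the state's own device.
class state_lazy {
public:
  explicit state_lazy(int device) : m_device{device} {}

  stream &get_cuda_stream() {
    if (!m_stream.has_value()) {
      m_stream = stream{m_device};  // created as a side effect of the getter
    }
    return *m_stream;
  }

private:
  int m_device;
  std::optional<stream> m_stream;
};
```

Design A keeps resource creation visible at the call site, at the cost of a check-then-set step for every caller. Design B gives a simpler interface, but the getter is no longer const and allocates on first call.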

create cuda stream on each device

eac79ef

elstehle force-pushed the fix/per-device-stream branch from 09cb757 to eac79ef Compare December 7, 2022 12:05

fixes include order

8e85886

jrhemstad reviewed Dec 7, 2022

adds device documentation on stream ctor

1301b52

elstehle requested a review from jrhemstad December 8, 2022 09:53

alliepiper requested changes Jan 17, 2023

adds back default ctor for cuda_stream

8b191fe

elstehle marked this pull request as ready for review January 18, 2023 15:26

elstehle added 2 commits January 19, 2023 02:23

adds tests for cuda_stream

ff4e811

adds check for status returned from cuda driver api

7c82037

elstehle commented Jan 19, 2023

elstehle requested review from alliepiper and removed request for jrhemstad January 20, 2023 06:05

guard cuda driver API calls by cupti macro

14079ae

limit states test to available devices

b6a29ec

elstehle added 3 commits February 2, 2023 04:32

revert state_generator test changes

7281bbd

lazily initializes cuda stream during measurements

85645cb

fixes format

78fa3c6

jrhemstad reviewed Feb 6, 2023

happierpig mentioned this pull request Oct 21, 2024

Multi-GPU Support mit-han-lab/Quest#11

Closed

GregoryKimball mentioned this pull request Oct 25, 2024

Issue with devices flag on multi-GPU system #189

Open
