Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes cudaErrorInvalidValue when running on nvbench-created cuda stream #113

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

elstehle
Copy link

@elstehle elstehle commented Dec 7, 2022

This PR fixes a minor issue that may occur when nvbench is run on multiple GPUs without a user-provided cuda stream.

The issue

The error that I observed in this case looked like:

Fail: Unexpected error: nvbench/detail/l2flush.cuh:55: Cuda API call returned error: cudaErrorInvalidValue: invalid argument

When run with memcheck I would see:

Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaMemsetAsync.

The Problem

It seems that nvbench is creating all the nvbench-owned streams on device 0.

Suggested Fix

This fix makes sure that the streams are created on the device on which they are later on used.

@elstehle elstehle force-pushed the fix/per-device-stream branch from 09cb757 to eac79ef Compare December 7, 2022 12:05
nvbench/cuda_stream.cuh Outdated Show resolved Hide resolved
@@ -42,10 +45,18 @@ struct cuda_stream
* Constructs a cuda_stream that owns a new stream, created with
* `cudaStreamCreate`.
*/
cuda_stream()
: m_stream{[]() {
cuda_stream(std::optional<nvbench::device_info> device)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs should be updated to explain the semantics of the new device parameter.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Updated docs. Could you please check if it's understandable?

@elstehle elstehle requested a review from jrhemstad December 8, 2022 09:53
Copy link
Collaborator

@alliepiper alliepiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, thanks for catching it! Some of the tests don't build after the changes, you can run ci/local/build.bash from the nvbench root to build and test if you have docker setup.

Once tests are passing this is good to go.

@elstehle
Copy link
Author

This LGTM, thanks for catching it! Some of the tests don't build after the changes, you can run ci/local/build.bash from the nvbench root to build and test if you have docker setup.

Once tests are passing this is good to go.

Thanks for reviewing the PR. nvbench::cuda_stream used to be default constructible and also be part of the public API.
In this PR, I required passing a std::optional<nvbench::device_info> to cuda_stream's ctor, which sort of was a breaking change. To avoid the breaking change, I've now added back the default ctor to cuda_stream.

@elstehle elstehle marked this pull request as ready for review January 18, 2023 15:26
@elstehle elstehle requested review from alliepiper and removed request for jrhemstad January 20, 2023 06:05
@alliepiper
Copy link
Collaborator

@elstehle I'm still seeing a test regression when running ci/local/build.bash on this branch:

 4/39 Test #32: nvbench.test.state_generator ..................***Failed    2.39 sec
/cccl/nvbench/nvbench/detail/device_scope.cuh:37: Cuda API call returned error: cudaErrorInvalidDevice: invalid device ordinal
Command: 'cudaSetDevice(dev_id)'

@elstehle
Copy link
Author

@elstehle I'm still seeing a test regression when running ci/local/build.bash on this branch:

 4/39 Test #32: nvbench.test.state_generator ..................***Failed    2.39 sec
/cccl/nvbench/nvbench/detail/device_scope.cuh:37: Cuda API call returned error: cudaErrorInvalidDevice: invalid device ordinal
Command: 'cudaSetDevice(dev_id)'

Thanks! Sorry, I've had missed that regression as it only occurred on systems with three devices or less.

Issue with the test in testing/state_generator.cu was that we generate states for devices [0, 1, 2], independent of whether those devices existed or not:

const auto device_0 = nvbench::device_info{0, {}};
const auto device_1 = nvbench::device_info{1, {}};
const auto device_2 = nvbench::device_info{2, {}};

dummy_bench bench;
bench.set_devices({device_0, device_1, device_2});
...
const std::vector<nvbench::state> states = nvbench::detail::state_generator::create(bench);

When the states are created, we create the stream for each state on that state's given device. If a given device doesn't exist, we run into a cuda error.

For comparison, if we'd currently run a benchmark with invalid device ids, the runner would fail with the same error.

../nvbench/device_info.cuh:71: Cuda API call returned error: cudaErrorInvalidDevice: invalid device ordinal

I resolved this regression by adjusting the test in testing/state_generator.cu to only run on devices actually available in the system. But I would like to confirm that we're generally ok with that behaviour.

Comment on lines +41 to +45
if (!m_state.get_cuda_stream().has_value())
{
m_state.set_cuda_stream(nvbench::cuda_stream{m_state.get_device()});
}
return m_state.get_cuda_stream().value();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels weird to have the initialization of the optional external to state.

How about putting this logic inside state::get_cuda_stream instead and don't expose the optional externally.

Copy link
Author

@elstehle elstehle Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about putting this logic inside state::get_cuda_stream instead and don't expose the optional externally.

@allisonvacanti and I have discussed that option too but agreed to prefer explicitly setting the stream over implicitly initializing it as a byproduct, if it didn't exist. Considering the user interfacing with the API, I feel that, for multi-GPU systems, it's safer to make it explicit when resources are created and what device they are associated with. Especially, when the current device may influence what device a resource is associated with.

That said, I'm fine to have it any way we decide makes more sense. @allisonvacanti what do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants