UCT/CUDA: Runtime CUDA >= 12.3 to enable VMM #10396
base: master
1ce967f to 68a5f51
status = uct_cuda_copy_md_check_is_ctx_set_flags_supported();
if ((status != UCS_OK) && (md->config.enable_fabric != UCS_NO)) {
    ucs_warn("disabled fabric memory allocations as cuda driver "
             "library does not support cuCtxSetFlags()");
    md->config.enable_fabric = UCS_NO;
}
Suggested change:
if (md->config.enable_fabric != UCS_NO) {
    status = uct_cuda_copy_md_check_is_ctx_set_flags_supported();
    if (status != UCS_OK) {
        if (md->config.enable_fabric == UCS_YES) {
            ucs_error("fabric memory allocation requested but cuda driver "
                      "library does not support cuCtxSetFlags()");
            goto err_free_md;
        } else {
            ucs_diag("disabled fabric memory allocations as cuda driver "
                     "library does not support cuCtxSetFlags()");
            md->config.enable_fabric = UCS_NO;
        }
    }
}
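The suggested change above treats `enable_fabric` as a tri-state: an explicit `yes` fails hard when `cuCtxSetFlags()` is unavailable, while `try` falls back quietly. A minimal standalone sketch of that decision follows. It has no UCX or CUDA dependency, and `tri_state_t`, `resolve_status_t` and `resolve_fabric` are hypothetical names used only for illustration, not UCX identifiers.

```c
#include <stdio.h>

/* Hypothetical stand-in for a yes/no/try configuration value. */
typedef enum { TRI_NO, TRI_YES, TRI_TRY } tri_state_t;

/* Hypothetical result codes for this sketch. */
typedef enum { RESOLVE_OK, RESOLVE_ERR_UNSUPPORTED } resolve_status_t;

/*
 * Decide the effective fabric setting.
 *  - TRI_NO:  nothing to check, stays disabled.
 *  - TRI_YES: the user required it, so fail if the driver call is missing.
 *  - TRI_TRY: quietly disable it if the driver call is missing.
 */
static resolve_status_t resolve_fabric(tri_state_t *enable_fabric,
                                       int ctx_set_flags_supported)
{
    if ((*enable_fabric == TRI_NO) || ctx_set_flags_supported) {
        return RESOLVE_OK;
    }

    if (*enable_fabric == TRI_YES) {
        fprintf(stderr, "fabric requested but cuCtxSetFlags() is missing\n");
        return RESOLVE_ERR_UNSUPPORTED;
    }

    /* TRI_TRY: downgrade instead of failing */
    *enable_fabric = TRI_NO;
    return RESOLVE_OK;
}
```

The sketch only covers the configuration decision. Any other call site that uses the function pointer still needs its own guard.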
This would not work, because we would still try to set the context flags even if fabric is not enabled.
Maybe we can check md->config.enable_fabric instead of ctx_set_flags_func in uct_cuda_copy_sync_memops.
CUresult cu_err;

if (status == UCS_ERR_INVALID_ADDR) {
    pthread_mutex_lock(&lock);
Why do you need a mutex here?
Theoretical: multiple workers could write the function pointer at the same time.
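To illustrate the concern, here is a minimal sketch of a lazily resolved function pointer guarded by a mutex, so that concurrent callers cannot race on the write. It is not the PR's code: `set_flags_fn_t`, `lookup_symbol`, `fake_set_flags` and `get_set_flags_fn` are hypothetical, and the resolver is stubbed out rather than querying a real driver.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical signature standing in for a driver entry point. */
typedef int (*set_flags_fn_t)(unsigned flags);

static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static set_flags_fn_t  cached_fn = NULL;
static int             resolved  = 0;

/* Hypothetical stub; a real resolver would query the driver library. */
static int fake_set_flags(unsigned flags)
{
    return (int)flags;
}

static set_flags_fn_t lookup_symbol(void)
{
    return fake_set_flags;
}

/* Return the cached pointer, resolving it at most once. */
static set_flags_fn_t get_set_flags_fn(void)
{
    set_flags_fn_t fn;

    pthread_mutex_lock(&lock);
    if (!resolved) {
        cached_fn = lookup_symbol();
        resolved  = 1; /* remember the result even if it is NULL */
    }
    fn = cached_fn;
    pthread_mutex_unlock(&lock);

    return fn;
}
```

Taking the lock on every call keeps the sketch simple. Whether the lock is needed at all depends on whether the pointer can really be written from several threads, which is the question raised above.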
We have tests for different CUDA versions, which include CUDA memory hooks (for example, Test Cuda Docker ubuntu18_cuda_12_0). Can we add a test that would have caught the new API usage?
What?
Do not use cuCtxSetFlags() if CUDA driver does not support it.

Why?
Unresolved symbol for cuCtxSetFlags on CUDA driver < 12.1 causes crash.

How?
Assumptions:
- cuCtxSetFlags is only needed for VMM, which has UCX support starting from CUDA driver >= 12.3
- cuCtxSetFlags is not strictly needed for malloc async

Testing
Locally tested, needs final testing on platform with actual older drivers.
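One way to avoid an unresolved symbol at load time is to look the function up at run time and treat a missing symbol as "unsupported". The sketch below uses plain `dlopen`/`dlsym` and is not necessarily the mechanism this PR uses. The library name `libcuda.so.1` and the helper names `find_ctx_set_flags` and `ctx_set_flags_supported` are assumptions for illustration, and the return type is written as `int` to avoid depending on `cuda.h`.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Assumed shape of cuCtxSetFlags(unsigned int); CUresult is shown as int
 * so that the sketch does not need cuda.h. */
typedef int (*ctx_set_flags_fn_t)(unsigned int flags);

/* Look up cuCtxSetFlags in the driver library; NULL means the library is
 * not installed or is too old to export the symbol. */
static ctx_set_flags_fn_t find_ctx_set_flags(void)
{
    /* Assumed driver library name on Linux */
    void *handle = dlopen("libcuda.so.1", RTLD_LAZY | RTLD_LOCAL);

    if (handle == NULL) {
        return NULL;
    }

    /* The handle is intentionally left open so the pointer stays valid */
    return (ctx_set_flags_fn_t)dlsym(handle, "cuCtxSetFlags");
}

/* Returns 1 if the symbol can be resolved, 0 otherwise. */
static int ctx_set_flags_supported(void)
{
    return find_ctx_set_flags() != NULL;
}
```

On a machine without a CUDA driver the probe simply reports 0 instead of crashing. Older glibc versions need `-ldl` when linking.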