Adding DtoD support to CUDA shared memory feature #58
base: main
Conversation
@rarzumanyan Can you describe how your changes work? For device=gpu, what is the input_values for set_shared_memory_region? I would suggest adding an example demonstrating the DtoD copies, similar to simple_http_cudasm_client. You could have some transitive flow of data on the GPU; it would be useful for testing and for understanding the API as well.
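A sketch of what such an example might look like, assuming the proposed `device='gpu'` path where `input_values` is a single CUDA device pointer (the client setup here is illustrative, and `gpu_ptr` stands in for a pointer produced elsewhere; neither is part of the patch):

```python
import tritonclient.http as httpclient
import tritonclient.utils.cuda_shared_memory as cudashm

client = httpclient.InferenceServerClient("localhost:8000")

byte_size = 64 * 4  # illustrative: 64 float32 elements
shm_handle = cudashm.create_shared_memory_region("input_region", byte_size, 0)
client.register_cuda_shared_memory(
    "input_region", cudashm.get_raw_handle(shm_handle), 0, byte_size
)

# 'gpu_ptr' stands in for a device pointer produced by an earlier stage,
# e.g. the output of one model feeding the input of the next
# (DtoD, no round trip through host memory).
cudashm.set_shared_memory_region(shm_handle, gpu_ptr, device='gpu')
```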
Hi @tanmayv25
Sure. The only difference between them is that
When
Makes total sense. I just want to make sure you're happy with the proposed API changes first.
src/python/library/tritonclient/utils/cuda_shared_memory/cuda_shared_memory.cc
Hi @CoderHam
Adding an extra flag is perfectly fine by me. However, I didn't add it, for the following reasons:
That's why I've decided to introduce another function for that instead.
@rarzumanyan On second thought, your arguments for not contaminating the namespace / enum are valid. You can just make the new changes I suggested for handling the input_values field specially for the case where it is a (cuda) device pointer.
```python
_raise_error(
    "unsupported device type: cpu and gpu are supported only"
)

offset_current = 0
for input_value in input_values:
```
Is input_values a list of cuda ptrs or a single cuda ptr? I think a single cuda device ptr is sufficient.
It's a single cuda device ptr.
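For illustration, a minimal call under that convention might look like the following (assuming the proposed `device='gpu'` parameter and `shm_handle` created as above; cupy is used here only as one way to obtain a device pointer):

```python
import cupy as cp

gpu_tensor = cp.arange(16, dtype=cp.float32)
# A single CUDA device pointer (an int), not a list/tuple of numpy arrays:
cudashm.set_shared_memory_region(shm_handle, gpu_tensor.data.ptr, device='gpu')
```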
```diff
@@ -173,20 +181,27 @@ def set_shared_memory_region(cuda_shm_handle, input_values):
     "input_values must be specified as a list/tuple of numpy arrays"
```
Need to expand the checks here to allow for a single cuda device ptr instead of a list or tuple of numpy arrays.
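One possible shape for that expanded check, sketched against the module's existing `_raise_error` helper (the exact messages and the use of `int` for the device pointer are assumptions, not the patch's actual code):

```python
if device == 'cpu':
    if not isinstance(input_values, (list, tuple)):
        _raise_error(
            "input_values must be specified as a list/tuple of numpy arrays"
        )
elif device == 'gpu':
    # The GPU path takes a single CUDA device pointer, passed as an integer.
    if not isinstance(input_values, int):
        _raise_error("input_values must be a single cuda device ptr")
else:
    _raise_error("unsupported device type: cpu and gpu are supported only")
```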
```diff
-def set_shared_memory_region(cuda_shm_handle, input_values):
-    """Copy the contents of the numpy array into the cuda shared memory region.
+def set_shared_memory_region(cuda_shm_handle, input_values, device='cpu'):
+    """Copy the contents of the numpy array or cuda dptr into the cuda shared memory region.
```
Correction: "Copy the contents of the list/tuple of numpy arrays or the content pointed to by a cuda device pointer into the cuda shared memory region."
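Folded into the diff above, the new signature and docstring would then read roughly:

```python
def set_shared_memory_region(cuda_shm_handle, input_values, device='cpu'):
    """Copy the contents of the list/tuple of numpy arrays or the content
    pointed to by a cuda device pointer into the cuda shared memory region.
    """
```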
@tanmayv25 Who would be a good person to task with reviewing the remainder of this code? I'm not familiar with this kind of code.
@jbkyang-nvi, @tanmayv25 has suggested you might be a good person to request a review from for this code. Can you help with that?
@rarzumanyan are you able to add testing for this feature?
Hello, I ran into the same problem. In one of my face recognition projects the data volume is large (batch > 64), and the recognition model depends on the results of the detection model, so I post-process the detection model's outputs (NMS, etc.) on the GPU with cupy/torch. But because the official cudashm.set_shared_memory_region(shm_input_handle, [img]) only accepts numpy arrays, I have to move the post-processed data back to the CPU, and I had been doing that until I came across your work. Since I'm a novice in C++, I don't know how to modify the code so that the cudashm module of Python's tritonclient supports uploading tensor data that lives on the GPU. Could you tell me how to build a Python module from your branch so that my Python program can send GPU tensors to Triton? Thanks a lot. @rarzumanyan
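A hedged sketch of the intended usage with a torch tensor, assuming this branch's `device='gpu'` path (the tensor shape and region name are placeholders, and server-side registration is omitted for brevity):

```python
import torch
import tritonclient.utils.cuda_shared_memory as cudashm

detections = torch.rand(64, 512, device="cuda")  # post-NMS GPU tensor (placeholder)
byte_size = detections.element_size() * detections.nelement()

shm_handle = cudashm.create_shared_memory_region("rec_input", byte_size, 0)
# Device-to-device copy: the data never leaves the GPU.
cudashm.set_shared_memory_region(shm_handle, detections.data_ptr(), device='gpu')
```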
Hello world,
This patch extends the set_shared_memory_region function to accept CUDA memory besides numpy arrays. I'm not certain about the changes to the Python module API, so please feel free to give me guidance on how to implement it better.