This issue was originally going to be about pitched allocation and 2D copies, but then I realized maybe the whole `cust::memory` module needs a little refurbishing.
CUDA gives better performance when rows are aligned in a specific manner. For 2D arrays (e.g. images) it is common to use `cudaMallocPitch` for allocation and `cudaMemcpy2D` for copying. In addition, there are corresponding functions for 3D, which make things a bit more complex since the CUDA array type must be used.
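For context on what pitching actually does: the allocator rounds each row up to an aligned pitch, and every access then computes `base + y * pitch` instead of `y * width`. Here is a minimal host-side model of that arithmetic (illustrative only; on the device the pitch is chosen by the driver, and the alignment value below is made up):

```rust
/// Host-side model of a pitched 2D buffer, purely to illustrate the
/// arithmetic; real pitches come from the allocator, not from us.
struct Pitched2D {
    data: Vec<u8>,
    pitch: usize,       // bytes per row, including padding
    width_bytes: usize, // bytes of actual payload per row
}

impl Pitched2D {
    /// Round each row up to `align` bytes, mimicking what a pitched
    /// allocator does (the alignment value is an arbitrary example).
    fn new(width_bytes: usize, height: usize, align: usize) -> Self {
        let pitch = (width_bytes + align - 1) / align * align;
        Pitched2D { data: vec![0u8; pitch * height], pitch, width_bytes }
    }

    /// The addressing rule all pitched APIs share: row `y` starts at
    /// byte offset `y * pitch`, not `y * width_bytes`.
    fn row(&self, y: usize) -> &[u8] {
        let start = y * self.pitch;
        &self.data[start..start + self.width_bytes]
    }
}

fn main() {
    // 100 pixels of 3 bytes each per row, rows padded to a 128-byte pitch.
    let img = Pitched2D::new(300, 4, 128);
    assert_eq!(img.pitch, 384);
    assert_eq!(img.row(2).len(), 300);
}
```

`cudaMemcpy2D` exists precisely because of this padding: it copies the payload bytes of each row while stepping by the (possibly different) source and destination pitches.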
The first question is: should these be exposed, in some way, in cust? I think the answer is yes; otherwise it would be unnecessarily difficult to write code that is as performant as the equivalent host-side C/C++.
The second question is: how should these functions be exposed in cust? CUDA is all over the place when it comes to naming, argument lists, return values, and even how things really work. Should we follow the CUDA runtime API to make it as similar as possible for the C++ crowd? Or should we try to make `cust::memory` as coherent as possible?
I understand both sides, but it seems to me the goal of this project is to make CUDA in Rust as good as it can be, without being afraid of diverging from how things work in C++. The logical conclusion might then be to try to improve on the naming and function signatures.
When I started looking at this I realized that the current `cust::memory` module is not very unified either. The alloc and memcpy functions operate on different types (`cust::DevicePointer` vs `CUdeviceptr`), size specifications (bytes vs elements), and constraints (when allocating, `T` must be `DeviceCopy`). What should a feature-complete `cust::memory` look like?
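To make that concrete, here is one purely hypothetical shape a unified pitched API could take: element counts and a `DeviceCopy` bound like the existing allocs, pitches in bytes because that is what device-side addressing consumes, and typed pointers throughout. None of these functions exist in cust today; the names, signatures, and stub types below are invented for discussion:

```rust
use std::marker::PhantomData;

// Stub stand-ins so the sketch compiles on its own; in cust these would
// be cust's CudaResult, DeviceCopy trait, and DevicePointer<T>.
type CudaResult<T> = Result<T, ()>;
trait DeviceCopy {}
struct DevicePointer<T>(u64, PhantomData<*mut T>);

/// Hypothetical: allocate a pitched 2D region, taking dimensions in
/// ELEMENTS (matching the rest of a unified cust::memory) and returning
/// the pitch in BYTES (what row addressing actually consumes).
unsafe fn malloc_pitched<T: DeviceCopy>(
    width: usize,  // elements per row
    height: usize, // number of rows
) -> CudaResult<(DevicePointer<T>, usize)> {
    // A real implementation would call the driver's cuMemAllocPitch here,
    // converting width to width * size_of::<T>() bytes.
    let _ = (width, height);
    unimplemented!()
}

/// Hypothetical: host-to-device 2D copy with the same conventions, using
/// a typed pointer instead of a raw CUdeviceptr.
unsafe fn memcpy_2d_htod<T: DeviceCopy>(
    dst: DevicePointer<T>,
    dst_pitch_bytes: usize,
    src: &[T],
    width: usize,
    height: usize,
) -> CudaResult<()> {
    // A real implementation would fill a CUDA_MEMCPY2D struct and call
    // the driver's cuMemcpy2D.
    let _ = (dst, dst_pitch_bytes, src, width, height);
    unimplemented!()
}
```

The point is less these exact signatures than picking one convention (elements in, byte pitches out, typed pointers everywhere) and applying it across the whole module.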
The memory module is kind of a mess; it was a bit of a mess from rustacuda, and then we incrementally started changing and adding things, but a lot of the odd stuff is still left in there, in particular the decisions around using `DevicePointer` vs `CUdeviceptr`. It's not even over yet: `DeviceSlice` will probably get another rework per #44.
Yeah, I think adding a `malloc_pitched` function would be fine. I'm not sure about adding pitched methods on `DeviceBuffer` and such; I haven't researched pitched memory much.