-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(python): Add ArrowDeviceArray extension to Python bindings #313
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #313 +/- ##
==========================================
+ Coverage 86.91% 89.10% +2.19%
==========================================
Files 70 4 -66
Lines 10591 101 -10490
==========================================
- Hits 9205 90 -9115
+ Misses 1386 11 -1375 ☔ View full report in Codecov by Sentry. |
If there are no objections, I'll merge this tomorrow! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a nice improvement for the repr!
@@ -474,20 +481,8 @@ cdef class Array: | |||
else: | |||
return None | |||
|
|||
def view(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you are making this into a function instead of method? (for Schema it's still a method, so this gets a bit inconsistent)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes...the Array
as defined here doesn't know if it's a CPU or GPU array, and the "view" does. I don't know that the view()
method is a particularly good idea for the Schema
, either...a cached property might be a better fit.
Early on you made the suggestion that the Array
and ArrayView
were a little confusing, name wise. I wonder if renaming the Array
to CArray
(+ Schema and ArrayStream) would fix some of that. Then ArrayView
could be renamed to Array
, which would probably make more sense to everybody.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...the CPU or GPUness matters because the ArrayView
will let you access buffer content, which will crash for the GPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Early on you made the suggestion that the
Array
andArrayView
were a little confusing, name wise. I wonder if renaming theArray
toCArray
(+ Schema and ArrayStream) would fix some of that. ThenArrayView
could be renamed toArray
, which would probably make more sense to everybody.
I was just writing up some thoughts on that! ;) -> #319
python/src/nanoarrow/_lib.pyx
Outdated
return <uintptr_t>&self.c_array | ||
|
||
cdef class DeviceHolder: | ||
"""Memory holder for an ArrowDevice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this ArrowDevice concept something specific from nanoarrow? (I don't find it in https://arrow.apache.org/docs/dev/format/CDeviceDataInterface.html)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's a nanoarrow invention. It contains methods for things like "copy this buffer to another device" but it's not ABI stable, so we'd have to workshop how to register a device from another package such that Device.resolve()
does the right thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some docstring then clarifying that? (to the Device class below)
8d61def
to
f16bb9f
Compare
cdef ArrowDevice* _ptr | ||
|
||
def __cinit__(self, object base, uintptr_t addr): | ||
self._base = base, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this base
used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a strong reference to the DeviceHolder
(like in Array
, Schema
, and ArrayStream
). It's probably less confusing to just use the Device
for this (unlike Array
and Schema
, there's never any other base than a DeviceHolder
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right...in making that change I remembered why this is like this: the CPU device is statically allocated in C, so there's no base
/DeviceHolder
. But other types of device implementations might not (it's not totally solved in the C library yet).
Co-authored-by: Joris Van den Bossche <[email protected]>
This PR adds basic support for wrapping the
ArrowDeviceArray
in nanoarrow Python. You have to try pretty hard to get anything that isn't a CPU array here, but 99% of this is just to get arepr()
and make the device array easier to debug/understand. In some future the CUDA and Metal implementations could live in separate Python packages as well.Most of the code changes here are to add the appropriate reprs for the
Array
andSchema
objects. These should work for both CPU and Device flavours of the C Data interface.@jorisvandenbossche the reprs should make it easier for your
__arrow_c_array__
/__arrow_c_schema__
testing!