-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PyLong_CopyBits
and _FromBits
functions
#12
Comments
Could you expound on why simply exposing |
Right now, Its signature doesn't lend itself to ease of readability or understanding - it takes multiple It also doesn't handle undersized buffers that well. It does handle filling in the full buffer before failing, which is great, but it doesn't calculate the required size. By adding a new function on top that does that, we could later combine the two parts (copying, and calculating the required size) for greater efficiency, though I suspect simply being able to handle compact ints efficiently will be more than enough perf win to be worthwhile. Also, leaving it as it is means that anyone currently relying on it won't be surprisingly impacted (I did add another option to skip setting exceptions, so they'll be slightly impacted, but it'll show up at compile time and can be fixed by adding a The reason for not just exposing the |
I guess you don't want to use just |
Yeah, we already have a Python "bytes", and "buffer", and "bytearray". "Bits" was the first that came to me that means (to me, at least) a plain old unadorned integer value, and doesn't already have a formal Python use. I considered "CopyNative"/"FromNative" or "CopyNativeInt"/"FromNativeInt". The latter I don't mind, though just "Native" feels a bit... abstract? I also considered "AsTwosComplement"/"FromTwosComplement" which is more accurate in some ways and less meaningful in others. "FromNativeBytes"/"AsNativeBytes" might be the best option? I only just came up with that. It doesn't feel as "cool" as CopyBits, but maybe people are more likely to guess what it's doing at first glance? |
I'd say plain "Native" is fine, but I see nothing wrong with "NativeBytes" either. |
Separately, do we really need the endianness/byte-order argument? Given that we already have existing APIs for known fixed sizes 32/64/128 (I think?), it seems unlikely that people want the order to be anything other than little-endian? |
Yes: there is no good default for this argument. For converting to int, you want native; for serialization to disk/network, you want a fixed one (and you should know which one you need).
No, we don't. That's the immediate problem this solves.
There's Brainstorming ideas:
|
Would it be possible to have first an API proposed/blessed by consumers of the current API? Or an API based on existing API in other libraries such as GMP? I have too little knowledge to propose such API.
For example, this API doesn't allow to pass a word of 32-bit and say that only the first 30 bits should be used. It looks incomplete. |
I would prefer to wait until the community agrees on an API before the C API Working Group has to vote to approve or a reject a specific API. |
GMP's use case (non-byte digits) is explicitly out of scope here, see above. Yes, this API is incomplete in that respect. |
It's been open for discussion on a public issue for weeks now, and the only request from potential consumers is feature creep. I think it's fine to discuss designs with you all as well without having to drag you into the main thread (I'm happy to act as the focal point for all inputs), but apart from the name, I'm confident in the design.
No other API does either, so this is a feature request, not just an API request. But it's also why I was concerned about using "bits" in the name - if someone as experienced as Victor misunderstands the API, then that name is definitely not helpful! I quite like |
You can use "RawBytes" instead of "Bits" if you want to avoid the confusion with bytes objects. "Native" can be confused for "native endianness".
Surely you mean it's
Agreed. |
I trust that Steve has explored the design space and that this is the API we need. I withdraw any bikeshedding apart from the name.
Let's go with |
"Buffer" can be confused with the buffer protocol as much as "Bytes" can be confused with bytes objects ;-) |
Whatever. :-) |
Yes, my bad. The wrapper converts a -1 endianness to
I actually just looked at my own quote and misread "Into" as "Int o'Buffer" 😆 Why has naming things got to be so hard! @iritkatriel Do you have any thoughts? |
There are different values for endianness in https://docs.python.org/3/library/struct.html#byte-order-size-and-alignment Is there a convention? |
In our generated header file, But also, |
After sleeping on the name and handling a few more comments on the issue from likely users, I'm choosing The signature and behaviour basically matches Thanks all for the discussion so far. If you have any more comments/questions, please raise them still. I'm not planning to put this into the stable ABI yet, so I don't want to hold for that discussion - I'd rather get the API into alpha and find out what works/doesn't first. But if people say to not even do that much, then I'll hold. |
Apparently, these functions were added to Python 3.13 alpha4, and so the issue can be closed, no? |
Over on python/cpython#111140 and PR python/cpython#114886 I've proposed an implementation for converting between
PyLongObject
and arbitrary-sized native integers. This is exposed (inefficiently) as the private_PyLong_AsByteArray
with a more complex signature, but I believe this design is more appropriate for long-term stability, retains the ability for us to optimise, leads users into correct usage, supports existing needs (in particular, PyO3-style language interop), and is readable in source code.Full docs on the PR, but here's the prototype and summary:
Endianness is -1 for native, or 0/1 for big/little. The expectation is that most callers will just pass -1, particularly when calling from another language, but if someone passes
PY_LITTLE_ENDIAN
from config.h then it will do the correct thing.PyLong_CopyBits
returns -1 with an exception set, or the required size of the buffer - if the result is less/equal ton_bytes
, no overflow occurred. No exception is raised for overflow, andbuffer
is always filled up ton_bytes
even if higher bytes get dropped (like a C-style downcast).The
FromBits
functions are basically just consistent wrappers around the existing private function. They'll do an__index__
call if necessary, but otherwise there was no gain in changing the implementation there.About the only thing I don't love is using "bits" when the argument is the number of bytes. Taking bytes is nice for what I consider the "normal" case of
PyLong_CopyBits(v, &int32_value, sizeof(int32_value), -1)
, andCopyBytes
didn't seem to imply as low-level an operation as copying bits. There's some discussion ofAs/FromByteArray
on the issue.The return value of
CopyBits
is also unusual, but I believe justified. The callers are very unlikely to simply return anOverflowError
back into Python, and will more likely clear it and do something different (perhaps allocating dynamic memory to copy the bits to), or just don't care and are happy to truncate (because they've already range checked).I believe these are good candidates for the limited API, but I'm happy to ship them first and gather feedback before pushing for that. The current limited API alternative is to use
PyObject_CallMethod(v, "to_bytes" ...)
and copy out of thebytes
object.Any thoughts/suggestions/concerns about these APIs?
The text was updated successfully, but these errors were encountered: