PEP 757: Add import-export API for Python int objects #35
To play with:
Later I'll show some simple benchmarks, but first please note that the current code in gmpy2 special-cases "small" integers to make performance acceptable. I think it's a common pattern and should be used in the second case as well (the current code in the CPython main branch has a special case for 1-digit ints). That's why I would prefer a simpler, less generic API, like the one proposed in #31. E.g. for reading:

```c
if (PyLong_IsCompact(obj)) {  /* should be a quick check */
    mpz_set_si(z, PyLong_CompactValue(obj));  /* compact value fits in some integer type */
}
else {
    /* fall back to the "array of digits" view for abs(obj) */
    const Py_digit *digits = PyLong_GetDigits(obj);  /* can't fail for non-compact ints */
    mpz_import(z, ..., digits);
    if (PyLong_IsNegative(obj)) {  /* looks better than PyLong_GetSign() ;) */
        mpz_neg(z, z);
    }
}
```

(In this example, for simplicity, I assume that the compact value fits in a long.)
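The fallback branch above treats abs(obj) as an array of base-2**bits_per_digit digits, least significant digit first. A pure-Python sketch of that view (the helper names here are made up; sys.int_info reports the real digit width):

```python
import sys

BITS = sys.int_info.bits_per_digit  # 30 on typical builds, 15 on some

def digits_of(n, bits=BITS):
    """Split abs(n) into base-2**bits digits, least significant first
    (the same view an "array of digits" export would give)."""
    n = abs(n)
    digits = []
    while True:
        digits.append(n & ((1 << bits) - 1))
        n >>= bits
        if not n:
            break
    return digits

def from_digits(digits, negative=False, bits=BITS):
    """Reassemble the value, as mpz_import plus mpz_neg would."""
    n = sum(d << (i * bits) for i, d in enumerate(digits))
    return -n if negative else n

v = (1 << 100) + 12345
assert from_digits(digits_of(-v), negative=True) == -v
```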
Benchmarks (one, two, ~ten, ~hundred digits) for gmpy2.

For export, on the master:
With new API:
Without special case for 1-digit:
Using PyUnstable_Long_IsCompact (like in the code proposed above):

New API overhead seems to be noticeable for the 1-2 digit case (up to 10-15%).

For import, on the master:
With new API:
Otherwise, if we were to document this function as pairing with
Please, no more exported data. Make it a function that either returns a pointer to the layout, a function that copies the layout values into a struct, or integrate it with
Does this mean "most significant digit first"? Or "most significant byte of each digit first"? Or both? As we'd now use
No, it's about endianness within the digit. Probably the naming could be better.
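To illustrate the distinction with a hypothetical 32-bit digit (a Python sketch; the variable names are made up):

```python
import struct

digit = 0x0ABCDEF0  # one hypothetical 32-bit digit

# "Digit endianness": the byte order *within* each digit.
assert struct.pack("<I", digit) == bytes([0xF0, 0xDE, 0xBC, 0x0A])
assert struct.pack(">I", digit) == bytes([0x0A, 0xBC, 0xDE, 0xF0])

# "Digits order": the order of whole digits in the array,
# i.e. least significant digit first vs. most significant first.
digits_lsf = [1, 2, 3]                   # value = 1 + 2*B + 3*B**2
digits_msf = list(reversed(digits_lsf))  # same value, opposite order
```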
That seems to be a good idea.
I agree with @zooba, and I still stand by my earlier comment. I don't want to lock CPython -- and any future C API implementors -- into using a single layout for all ints.
Do you mean adding a layout member to PyLong_DigitArray (export) and a layout parameter to PyLongWriter_Create (import)? Like:

```c
typedef struct PyLong_DigitArray {
    ...
    const PyLongLayout *layout;  // <== NEW
} PyLong_DigitArray;

PyAPI_FUNC(int) PyLong_AsDigitArray(
    PyObject *obj,
    PyLong_DigitArray *array);

PyAPI_FUNC(PyLongWriter*) PyLongWriter_Create(
    ...
    const PyLongLayout *layout);  // <== NEW
```
JFR, I don't think this locks CPython in any way: this single layout is an asymptotic one (for "big" ints). The API has enough freedom to emulate this view for small integers. @zooba's suggestion might work with an API like this (reading case):

```c
typedef struct PyLongLayout {
    uint8_t bits_per_digit;
    uint8_t digit_sizeof;
    int8_t digits_order;
    int8_t digit_endianness;
    Py_ssize_t ndigits;
} PyLongLayout;

/* Returns a pointer to the digit array (or NULL, if obj is not a PyLongObject)
   and sets *layout, which might differ wrt bits_per_digit and digit_sizeof
   for "small" and "big" ints. */
PyAPI_FUNC(const void *) PyLong_AsDigits(PyObject *obj, PyLongLayout *layout);
```
Currently, exporting a small integer as an array of digits is an O(1) operation. There is no need to allocate memory or anything; all integers are implemented internally as an array of digits.
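For instance, the digit count such an export reports can be derived from the bit length alone, with no allocation (a sketch; ndigits is a made-up helper, and zero is reported as one digit here):

```python
import sys

BITS = sys.int_info.bits_per_digit

def ndigits(n, bits=BITS):
    """Number of base-2**bits digits needed for abs(n) (>= 1)."""
    return max(1, (abs(n).bit_length() + bits - 1) // bits)

assert ndigits(0) == 1
assert ndigits(1) == 1
assert ndigits(1 << BITS) == 2
```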
OK. Let's design the API for what we have. If we need additional layouts, that kind of API can be added in the future, and if/when we add it, the API added here will start copying data. And if a non-CPython implementation uses a more elaborate scheme, it'll need to copy data too.
Same with the writer. We don't need endianness in the struct: I think something like a new medium-int representation (which would need a new API) is much more likely than switching to non-native endianness.
Would it work if you replace PyLong_IsCompact and PyLong_CompactValue with PyLong_AsNativeBytes and a digit_size buffer? Looking at the above example: most callers will need the sign bit.
You meant word_endian too?
Patch wrt the gmpy2 test branch:

```diff
diff --git a/src/gmpy2_convert_gmp.c b/src/gmpy2_convert_gmp.c
index 34f2546..e732080 100644
--- a/src/gmpy2_convert_gmp.c
+++ b/src/gmpy2_convert_gmp.c
@@ -44,24 +44,25 @@
 static void
 mpz_set_PyLong(mpz_t z, PyObject *obj)
 {
-    static PyLong_DigitArray long_export;
-
-    PyLong_AsDigitArray(obj, &long_export);
-    if (long_export.ndigits == 1) {
-        mpz_set_si(z, long_export.digits[0]);
+    long value;
+    Py_ssize_t bytes = PyLong_AsNativeBytes(obj, &value, sizeof(long), -1);
+    if (0 <= bytes && bytes <= (Py_ssize_t)sizeof(long)) {
+        mpz_set_si(z, value);
     }
     else {
+        static PyLong_DigitArray long_export;
+        PyLong_AsDigitArray(obj, &long_export);
         mpz_import(z, long_export.ndigits,
                    PyLong_LAYOUT.array_endian,
                    PyLong_LAYOUT.digit_size,
                    PyLong_LAYOUT.word_endian,
                    PyLong_LAYOUT.digit_size*8 - PyLong_LAYOUT.bits_per_digit,
                    long_export.digits);
+        if (long_export.negative) {
+            mpz_neg(z, z);
+        }
+        PyLong_FreeDigitArray(&long_export);
     }
-    if (long_export.negative) {
-        mpz_neg(z, z);
-    }
-    PyLong_FreeDigitArray(&long_export);
 }
 
 static MPZ_Object *
```
Not necessary. This view rather represents the absolute value.
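In the patch above, the last mpz_import argument (digit_size*8 - bits_per_digit) is GMP's "nails" parameter: the number of unused high bits in each word. A rough Python model of what mpz_import computes from such a buffer (illustrative only; the function name is made up and native byte order is simplified to little-endian):

```python
import struct

def import_digits(buf, order=-1, size=4, endian=-1, nails=2):
    """Rough model of mpz_import for 4-byte words:
    order=-1: least significant word first (1: most significant first);
    endian: byte order within each word; nails: high bits to ignore."""
    fmt = "<I" if endian <= 0 else ">I"
    words = [struct.unpack(fmt, buf[i:i + size])[0]
             for i in range(0, len(buf), size)]
    if order == 1:
        words.reverse()
    bits = size * 8 - nails
    mask = (1 << bits) - 1
    return sum((w & mask) << (i * bits) for i, w in enumerate(words))

# two 30-bit digits packed into 4-byte little-endian words
buf = struct.pack("<II", 5, 7)
assert import_digits(buf) == 5 + (7 << 30)
```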
Yes. I see no reason to single that out as a detail that might change.
Hm, I guess that's not good enough?
I meant that instead of
you'd do
Yes, especially the second case. You can see other benchmarks above.
Yes, if PyLong_AsNativeBytes can do a quick exit - that should be an alternative to PyLong_IsCompact/CompactValue.
This looks like a minor convenience to me. I think that to query the sign (optionally), people should use dedicated functions (like PyLong_IsNegative()).
Oh, I forgot we already have that. If you're chasing nanoseconds, I don't think you want to compile for the stable ABI, and you will want to change your code if the internals change. (That said, consider having the optimizations in
I renamed
Ok, I modified the API to add
The function
Yes, but see #29
If I'm reading the earlier benchmarks correctly, then we'd have to beat 111ns and 155ns for these, which is clearly only going to be possible by leaking access to the long internals. So it's fine when using unstable APIs, and the slower path is probably fine for stable callers (it's still faster than failing ;) ). The difference is largely unavoidable function call overhead, a type check, and the (potentially unaligned) memcpy. I daresay we could add a faster path, but that's okay: blazing speed is what the unstable API is for, and those who need it can at least opt into using it at their own cost. I do think that anything returning somewhat raw digits from a LongObject should also return the sign. One call to get all the info to accurately recreate the value, and you only need additional checks if you're creating your own fast path.
Ok, done.
I would prefer to avoid adding such a function for now; we can add it later if needed. What do you think?
@skirpichev @zooba @encukou: I updated the API. What do you think? Are we good yet?
Is there any impact going to come out of the discussions of setting a hard (if theoretical) limit on the length? |
@serhiy-storchaka may know about that :-) This API is limited by Py_ssize_t. The PR python/cpython#123498 defines
So, to destroy the writer (e.g. in an error handling path), the user should call PyLongWriter_Finish. That works, but I'd much prefer writers to behave consistently and have both a "finish" and a "discard" function.
Sorry for the late answer @zooba, but no. Those timings are for the writing API (mpz->int). The benchmarks above are for reading (int->mpz). I'll repeat them in one place.

For export, on the master:
With new API (see aleaxit/gmpy#495):
Without special case for 1-digit:
Using PyUnstable_Long_IsCompact (like in the code proposed above):
"if you replace PyLong_IsCompact and PyLong_CompactValue with PyLong_AsNativeBytes and a digit_size buffer":

So, using a quick check (currently with PyUnstable_Long_IsCompact) to dispatch to special code for small integers, and otherwise using PyLong_GetDigits() (only one new function for reading digits), has better performance than the current API. Maybe you are right and we should introduce a bunch of simple PyUnstable_* functions instead of this more complicated API, which has performance regressions.
I doubt that using a single layout for "big enough" integers leaks anything important. All MP libraries share that detail; CPython will hardly break this in the future.
I like the changes.
I'm wrong. MAX_LONG_DIGITS fits into a signed Py_ssize_t.
You can prepare your data in advance before creating a writer, so later, your code cannot fail.
Ok, I added a PyLongWriter_Discard() function.
I opened a vote on this API:
I would appreciate @casevh's opinion, but so far this API has a severe performance degradation for reading small integers (int->mpz conversion), up to 25%. That's hardly acceptable. So, projects like gmpy2, sage or flint-python will probably use some other code path for such integers.
Yet maybe we should use size_t here, like GMP? A signed integer type looks odd in this context.
Ah, PyLong_GetDigits() declaration has
+1. I'm also not sure if we should keep the layout field in the PyLong_DigitArray struct. What else could it be, if not the native layout?! BTW, all required information about the native layout is already available via PyLong_GetInfo(). PyLongLayout additionally has
@casevh, here are some benchmarks. Does this look acceptable for you?
Have you tried PyLong_AsLongAndOverflow instead of PyUnstable_Long_IsCompact? It might not be as fast but it is stable (I think). |
Yes, I did, I just didn't include this case in the above benchmarks. It's good for 1-2 digits and for many digits, but for ~10 digits performance is negatively impacted.

Benchmarks code:

```python
# r.py; reading
import pyperf
from gmpy2 import mpz

runner = pyperf.Runner()
runner.bench_func('1<<7', mpz, 1 << 7)
runner.bench_func('1<<38', mpz, 1 << 38)
runner.bench_func('1<<300', mpz, 1 << 300)
runner.bench_func('1<<3000', mpz, 1 << 3000)
```

```python
# w.py; writing
import pyperf
from gmpy2 import mpz

runner = pyperf.Runner()
runner.bench_func('1<<7', int, mpz(1 << 7))
runner.bench_func('1<<38', int, mpz(1 << 38))
runner.bench_func('1<<300', int, mpz(1 << 300))
runner.bench_func('1<<3000', int, mpz(1 << 3000))
```

Benchmark results for reading (now my PR uses PyLong_AsLongAndOverflow):

Hmm, maybe it's a good tradeoff. JFR, benchmark results for writing:

Benchmark hidden because not significant (1): 1<<7
Yes, later we can add |
Ok, I updated the API:
@skirpichev: IMO it's fine to rely on |
AFAIK, Python uses |
@skirpichev: are you ok with using a signed Py_ssize_t?
This still looks odd to me, but I agree with Petr: Py_ssize_t seems more consistent with the CPython API. size_t is used in *alloc() functions.
Ok, I updated the API again to move back to signed Py_ssize_t.
@skirpichev: Are you ok with the latest API?
@vstinner, yes. Some minor nitpicks were mentioned above, feel free to ignore:
(PyLong_GetInfo() is slow, but it's not a problem, as it might be called just once at module initialization.)
Ah, good :-)
The purpose of this API is to stop relying on Python internals, and instead offer an API which exposes all the information needed by these features (import/export). Maybe CPython internals will not change, but maybe PyPy uses a different layout and can decide to change its layout between two PyPy versions. MicroPython, GraalPython, RustPython and others can also use a different layout. IMO it's worth exposing "digits_order" and "endian". Also, a structure is more convenient than a namedtuple in C. Well, an alternative would be to complete PyLong_GetInfo(). For reference, the sys.int_info doc: https://docs.python.org/dev/library/sys.html#sys.int_info
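For comparison, here is what sys.int_info already exposes, next to the extra fields a PyLongLayout would add (a sketch; the dict merely mirrors the proposed struct, and the digits_order/endianness values are assumptions about CPython's current layout):

```python
import sys

info = sys.int_info
assert info.bits_per_digit in (15, 30)
assert info.sizeof_digit in (2, 4)

# A PyLongLayout-style description would add ordering information:
layout = {
    "bits_per_digit": info.bits_per_digit,
    "digit_size": info.sizeof_digit,
    "digits_order": -1,     # least significant digit first (assumption)
    "digit_endianness": 0,  # native byte order (assumption)
}
```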
Certainly, no. And I would be surprised if some bigint library (including Python implementations) used something different here. Can anyone point to examples?
Or extend only PyLong_GetInfo() output. |
You'd be surprised if there's a library out there that doesn't use 15-bit (or 30-bit) digits? I'd honestly be surprised if there was a bigint library that used those sizes, at least by default. Do they really all do it that way? |
No, it's fine: GMP is an example (that's why digit_size & bits_per_digit are really useful information). But GMP has the same behaviour wrt the digits_order and endian parameters. Is there a bigint library where the least significant "digit" ("limb", whatever) is stored last?
Ah, you just meant the digit order. On one hand, they may not exist and we may not imagine a reason why they would, but on the other hand, that's why we didn't have this function in the first place. Better to include all the options in there, just in case, since we've already failed to predict the future once 😉 |
It became difficult for me to follow this discussion and navigate between messages since [the PR] has 119 comments and this issue has 47 comments, and there are a few more related issues. So I decided to write down PEP 757 to summarize the discussion: python/peps#3961. It might be easier to make a decision on a PEP. |
I wrote PEP 757 – C API to import-export Python integers, it's now online! I announced the PEP on discuss.python.org as well. There are open questions (by @skirpichev):
I like
@vstinner, the description is now outdated wrt the current proposal. Maybe you can sync it?
PEP 757 now contains all details. So I just replaced the first message with a reference to PEP 757. |
This issue has a long history, the API changed multiple times, and it became confusing to follow. So I created a new clean issue for PEP 757: #45. I'm closing this issue.
See PEP 757 – C API to import-export Python integers.