Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vote on PEP 757 – C API to import-export Python integers #45

Closed
5 tasks done
vstinner opened this issue Oct 7, 2024 · 40 comments
Closed
5 tasks done

Vote on PEP 757 – C API to import-export Python integers #45

vstinner opened this issue Oct 7, 2024 · 40 comments
Labels

Comments

@vstinner
Copy link

vstinner commented Oct 7, 2024

Vote on PEP 757 – C API to import-export Python integers.

Since I co-authored PEP 757, I abstain from voting on my own PEP :-)

I closed the previous issue since it had a long history and the API changed multiple times.

@encukou
Copy link
Collaborator

encukou commented Oct 7, 2024

I'd prefer being able to deprecate PyLong_Export -- there's no version directly in struct PyLongExport, so if things change too much we'll need to add new API. And that's OK, but we will want deprecation/removal in that case.
Otherwise, +1 from me!

@skirpichev
Copy link

skirpichev commented Oct 8, 2024

@encukou,

  1. How long API should live? E.g. mpz_import/export API was in the GMP since 2002.
  2. PyLongLayout has no version field too. Are you ok with that?
  3. Which changes do you want to address? I think we should keep something in mind, even highly hypothetical, if going to add a version field.

I address d.p.o comment also here:

This function always succeeds if obj is a Python int object or a subclass. [...] This means we can never deprecate this function. We might want to do that if, for example, we introduce an internal format that doesn’t fit in the struct PyLongExport parameters. I suggest removing the line.

Probably, this concern also could be resolved with a new version field, right? We did our best to preserve "always successful" constraint and I would like to keep it. (And it requires much less code adaptation on the gmpy2 side.)

Edit: lets look closer on the structure

typedef struct PyLongExport {
    int64_t value;
    uint8_t negative;
    Py_ssize_t ndigits;
    const void *digits;
    // Member used internally, must not be used for other purpose.
    Py_uintptr_t _reserved;
} PyLongExport;

Most public fields (negative/ndigits/digits) are required, else the whole API doesn't make sense. So, we left with one field: value - that part might be versioned. How about this (which is closer to the original proposal by @pitrou):

typedef struct PyLongExport2 {
    uint8_t is_compact;
    union {
        struct {
            uint8_t version;
            int64_t value;
        } compact_value;
        struct {
            uint8_t negative;
            Py_ssize_t ndigits;
            const void *digits;
            // _reserved could be actually here.
        } digit_array;
    };
    // Member used internally, must not be used for other purpose.
    Py_uintptr_t _reserved;
} PyLongExport2;

Or we could use just reserve enough room for compact_value:

struct {
    int64_t value[2];
} compact_value;

@encukou
Copy link
Collaborator

encukou commented Oct 8, 2024

How long API should live?

As long as we can support it, but it shouldn't be forced to live forever :)

mpz_import/export API was in the GMP since 2002.

Sure, but AFAIK there's no expectation of that being a zero-copy mechanism, and it is in a library that's expected to provide conversions between arbitrary formats. It seems that adding a mpz_export2 with an extra argument would be too much trouble for GMP.

But, not being able to fail with MemoryError, PyLong_Export practically can't switch away from being a zero-copy mechanism. The API prevents future CPython (and any other implementations that use the C API) from using alternative int representations internally.

PyLongLayout has no version field too. Are you ok with that?

Oh, right -- a version field would be better on PyLongLayout; then PyLongExport wouldn't need one.

No version anywhere means that if/when PyLongLayout/PyLongExport no longer fit, we'll need to deprecate PyLong_Export itself. That's an OK plan, if we allow ourselves to deprecate it (which means raising exceptions under -Werror).

Which changes do you want to address?

Examples are 2's complement, or a 64-bit compact value with a separate sign bit.

But the ability to deprecate functions is mainly useful for unknown unknowns. I can't see decades ahead.

@skirpichev
Copy link

But, not being able to fail with MemoryError, PyLong_Export practically can't switch away from being a zero-copy mechanism. The API prevents future CPython (and any other implementations that use the C API) from using alternative int representations internally.

Not all. Only if this representation can't fit in 64-bit value. But if this field is too small - we can make it versioned or just bigger from start.

Oh, right -- a version field would be better on PyLongLayout; then PyLongExport wouldn't need one.

Then I'm lost :( The PyLongLayout has nothing to say about representation for "compact" values, that fit in some machine-sized integer.

It's used to describe "asymptotic" representation of CPython integers as "digit array". (Maybe PEP isn't clear enough here?) Which else field we might want to add to this structure?

Examples are 2's complement, or a 64-bit compact value with a separate sign bit.

I'm assuming you meant 2's complement for "compact" values, not for representation of big integers. Then both variants could be solved as described above:

  1. versioned structure compact_value in the union
  2. or just more big structure, say 128-bit

If this complication is required - I'm slightly biased to (1).

@vstinner
Copy link
Author

vstinner commented Oct 8, 2024

versioned structure compact_value in the union

I dislike anonymous union, it caused us some troubles in the past.

or just more big structure, say 128-bit

I dislike having two numbers, you have to define the order. I don't think that it's needed. I suggest to revisit the API once Python will need 128-bit. For now, it only needs 60-bit.

@vstinner
Copy link
Author

vstinner commented Oct 8, 2024

Examples are 2's complement, or a 64-bit compact value with a separate sign bit.

We can make a small change to PEP 757:

  • Change PyLongExport.value type from signed int64_t to unsigned uint64_t.
  • Use also PyLongExport.negative when digits is NULL.

It doubles the capacity of PyLongExport.value.

@pitrou
Copy link

pitrou commented Oct 8, 2024

We can make a small change to PEP 757:

What would be the point though? How likely is it that CPython switches to an uncommon "uint64 + sign bit" representation? @markshannon what do you think?

@vstinner
Copy link
Author

vstinner commented Oct 8, 2024

What would be the point though?

It may simplify the code using this API: https://peps.python.org/pep-0757/#export-pylong-export-with-gmpy2

Currently, this code is needed:

            mpz_import(z, 1, -1, sizeof(int64_t), 0, 0, &value);
            if (value < 0) {
                mpz_t tmp;
                mpz_init(tmp);
                mpz_ui_pow_ui(tmp, 2, 64);
                mpz_sub(z, z, tmp);
                mpz_clear(tmp);
            }

I expect that it can be simplified to the following code which should be more efficient:

            mpz_import(z, 1, -1, sizeof(uint64_t), 0, 0, &value);
            if (long_export.negative) {
                mpz_neg(z, z);
            }

Note: currently, only Windows uses this code path, since Windows only has 32-bit C long for mpz_set_si() fast-path.

@encukou
Copy link
Collaborator

encukou commented Oct 8, 2024

I should never have mentioned any examples.

A PyLong_Export that cannot fail (for ints), cannot be deprecated, and cannot switch to copying data, limits the internal int representation (in CPython and other implementations of the C API) to only what's exportable through PyLongExport.
IMO, that is unacceptable.
We need a way to evolve in the future, not guess now what the future might be.

@vstinner
Copy link
Author

vstinner commented Oct 8, 2024

I'd prefer being able to deprecate PyLong_Export -- there's no version directly in struct PyLongExport, so if things change too much we'll need to add new API. And that's OK, but we will want deprecation/removal in that case.

I wrote python/peps#4025 to allow PyLong_Export() to fail.

@pitrou
Copy link

pitrou commented Oct 8, 2024

It may simplify the code using this API: https://peps.python.org/pep-0757/#export-pylong-export-with-gmpy2

The goal of this API is not to simplify consumer code, it's to make the export as efficient as possible.

To make the export as efficient as possible, it must be as close as possible to CPython's internal representation.

CPython is much more likely to adopt a int64 representation internally, than a "uint64 + sign bit" representation. The latter simply doesn't make sense, because it can't fit in a machine register.

Thus the export API should favor int64 over "uint64 + sign bit".

@skirpichev
Copy link

I dislike anonymous union, it caused us some troubles in the past.

Was it before or after adding the C11 compiler requirement? With union, the structure will be also smaller (both in the current variant & with version field).

I dislike having two numbers, you have to define the order.

Native. Something like that does make sense for me with hypothetical 128-bit machine int's. Using e.g. uint64_t + sign doesn't look as a possible future for CPython integers.

I don't think that it's needed.

What about version field?

@vstinner
Copy link
Author

vstinner commented Oct 8, 2024

Was it before or after adding the C11 compiler requirement?

In short, the Python C API only requires C99, not C11: see the long discussion.

@vstinner
Copy link
Author

vstinner commented Oct 8, 2024

@encukou:

I'd prefer being able to deprecate PyLong_Export -- there's no version directly in struct PyLongExport, so if things change too much we'll need to add new API. And that's OK

I don't think that we should add a version member. I don't see how users would be supposed to use it. If things change, I would prefer a new API, as @encukou wrote.

@encukou
Copy link
Collaborator

encukou commented Oct 8, 2024

The union doesn't need to be anonymous, let's just name it.

How about:

typedef struct PyLongExport2 {
    // type also serves as a version field: on unknown type,
    // users must fall back to e.g. PyLong_AsNativeBytes.
    // (And it's big because the next field is
    // pointer-aligned, there's a lot of padding anyway.)
    uint32_t type;  
    union {
        int64_t compact_value;
        struct {
            Py_ssize_t ndigits;
            const void *digits;
            uint8_t negative;

            // Member used internally, must not be used for other purpose.
            Py_uintptr_t _reserved;
        } digit_array;
    } data;
} PyLongExport2;

(edit: the union should be last so the struct can grow)

Maybe this discussion should be on Discourse, not here...

@vstinner
Copy link
Author

vstinner commented Oct 8, 2024

The union doesn't need to be anonymous, let's just name it.

Which problem are you trying to address with this different structure?

@encukou
Copy link
Collaborator

encukou commented Oct 8, 2024

Which problem are you trying to address with this different structure?

Compared to @skirpichev's earlier proposal:

  • Avoiding an anonymous union
  • Adding a “free” version field (by specifying what to do with an unknown union tag)
  • Making unused padding bits potentially useful in the future

@skirpichev
Copy link

The union doesn't need to be anonymous, let's just name it.

I'm not sure if anonymous unions are banned from public API. (The long discussion seems to be unfinished.)

But the rest clearly does make sense. This even more closer to the original proposal by Antoine.

@vstinner
Copy link
Author

@erlend-aasland @mdboom @serhiy-storchaka: What's your call on the PEP 757?

@skirpichev
Copy link

Note that Import-API (which requires also PyLongLayout description for int's) from previous discussions seems much less controversial. I think it helps if C-API will have at least that part of PEP.

Maybe it worth to make separate pools? The Export API + Layout API and the Import API + Layout API.

@vstinner
Copy link
Author

vstinner commented Oct 15, 2024

@encukou:

I'd prefer being able to deprecate PyLong_Export -- there's no version directly in struct PyLongExport, so if things change too much we'll need to add new API. And that's OK, but we will want deprecation/removal in that case.
Otherwise, +1 from me!

We modified PEP 757 to remove the following sentence from PyLong_Export() function:

This function always succeeds if obj is a Python int object or a subclass.

So you can now deprecate the function by emitting DeprecationWarning, which can set an exception if the warning is treated as an error.

@vstinner
Copy link
Author

Nobody voted so far. Does it mean that you're against PEP 757, or just that you are waiting until the discussion settles down?

@encukou
Copy link
Collaborator

encukou commented Oct 22, 2024

I'm +0 for the current PEP. It does the job, but I think the union-based API from the discussion could do it better.

@skirpichev
Copy link

@encukou, here you suggest that "other workarounds" for your concerns (including union-based API, where PyLong_Export() returns a type of export) "IMO worse". Did you change your mind or I miss something?

What do you think on the Import API (layout + PyLongWriter)? To me it looks PEP going to be rejected. If so, I don't see good reasons to loose that part (which got much less criticism) too.

@encukou
Copy link
Collaborator

encukou commented Oct 25, 2024

Yes, I changed my mind. But not much, I'm still +0 here.

A disadvantage of the checkbox-based voting system is that it doesn't distinguish between “no”, “+0”, and “didn't vote yet”.

The import API looks is fine to me.

@vstinner
Copy link
Author

vstinner commented Nov 7, 2024

@mdboom @serhiy-storchaka @zooba: Would you mind to review PEP 757 (and vote)?

@zooba
Copy link

zooba commented Nov 11, 2024

One minor nit (which can be clarified in docs, as far as I'm concerned, rather than having to update the PEP) is the lifetime of the digits provided to PyLongWriter_Create - can I change/free it immediately after Create or do I have to wait until Finish/Discard?

There's no mention in the PEP of limited API - is this going in? I think we should also add PyLong_AsNativeBytes and FromNativeBytes to the limited API at the same time, and I'm okay with that being in 3.14.

@skirpichev
Copy link

rather than having to update the PEP

(I think it's fine, if updates are minor. Then we can copy-paste docs to the implementation pr.)

can I change/free it immediately after Create or do I have to wait until Finish/Discard?

But there is no other options to free digits. It's lifetime is same as for the PyLongWriter object; it's not a caller job to allocate memory for this array - that does PyLongWriter_Create. And to free it, here is the rule: "The caller can either initialize the array of digits digits and then call PyLongWriter_Finish() to get a Python int, or call PyLongWriter_Discard() to destroy the writer instance."

There's no mention in the PEP of limited API - is this going in?

I think it should be good enough for this. @vstinner?

But note that now e.g. #29 is not going to be in the limited API. That's a better candidate.

I think we should also add PyLong_AsNativeBytes and FromNativeBytes to the limited API

And this; +1.

@zooba
Copy link

zooba commented Nov 11, 2024

But there is no other options to free digits.

You're right, I misread. It's an out parameter (which makes far more sense). So yes, no need to change anything.

@vstinner
Copy link
Author

There's no mention in the PEP of limited API - is this going in?

I would prefer to not add these APIs to the limited C API in Python 3.14. There is no need for that right now, it can be done later.

@vstinner
Copy link
Author

@zooba:

One minor nit (which can be clarified in docs, as far as I'm concerned, rather than having to update the PEP) is the lifetime of the digits provided to PyLongWriter_Create - can I change/free it immediately after Create or do I have to wait until Finish/Discard?

You must call Finish or Discard to free digits. You cannot/must not free it explicitly.

Technically, digits is a pointer to an integer digits: the memory is a PyLongObject object. But it's an implementation detail ;-)

@vstinner
Copy link
Author

@mdboom @serhiy-storchaka: Gentle reminder: Would you mind to review PEP 757 (and vote)? Don't hesitate if you have doubts or questions.

@skirpichev
Copy link

Based on @serhiy-storchaka feedback in python/cpython#111140 (comment), I would guess that he is seconded to the Mark's opinion in the discussion thread. I.e. it's better to have mpz_import/export-like API on the CPython side.

We tried to address this concern here: https://peps.python.org/pep-0757/#provide-mpz-import-export-like-api-instead (this slightly updated by python/peps#4126).

@vstinner
Copy link
Author

This issue is open for almost 2 months. PEP 757 got 4 positive votes, but @serhiy-storchaka didn't vote yet. Does the vote need 100% majority?

@serhiy-storchaka
Copy link

I am very sorry for being a hindrance. I would prefer more mpz_import/mpz_export like API, but it has serious drawbacks: more complex code, not zero-copy, and most importantly, it only works with absolute values -- this makes it implementation dependent. The PEP addresses this option. Now, I don't want to delay the vote even further by bikeshedding. I am +1.

@vstinner
Copy link
Author

vstinner commented Nov 28, 2024

@serhiy-storchaka:

I am very sorry for being a hindrance. I would prefer more mpz_import/mpz_export like API, but it has serious drawbacks: more complex code, not zero-copy, and most importantly, it only works with absolute values -- this makes it implementation dependent. The PEP addresses this option. Now, I don't want to delay the vote even further by bikeshedding. I am +1.

Thanks Serhiy. I mark the PEP as Accepted and close the issue.

The PR python/cpython#121339 is now ready for your final review :-)

@skirpichev
Copy link

I am very sorry for being a hindrance.

(You are not. 3.14.b1 scheduled on 2025-05-06. It's a long way to...)

not zero-copy

That depends on the lib. Probably, most will offer GMP-like access to internals.

it only works with absolute values

I would like to see some bigints library, with a different internals (say 2s complement).

Now, I don't want to delay the vote even further by bikeshedding.

(I would appreciate your bikeshedding on python/peps#4111.)

@zooba
Copy link

zooba commented Nov 28, 2024

That depends on the lib. Probably, most will offer GMP-like access to internals.

The problem with the GMP-like interface would've been that the caller can specify the shape of the result. We didn't want to implement that, so it would likely have been a case where the caller has to provide exactly the settings that match our internal representation or they get an error. That's silly, so we use a different interface where we tell them how it's shaped. That's all that is different here.

@serhiy-storchaka
Copy link

That depends on the lib. Probably, most will offer GMP-like access to internals.

Does it work with LibTomMath (the library used in Tcl)? At least Python did not provide such access to inteomrnals before this PEP.

I would like to see some bigints library, with a different internals (say 2s complement).

It seems that cpp_int in Boost implements this. I myself implemented a minimal arbitrary fixed size integer library on templates over 20 years ago, and 2s complement was a natural choice.

@skirpichev
Copy link

Does it work with LibTomMath (the library used in Tcl)?

I think so: the mp_int seems to be public. You can do similar in the flint too.

It seems that cpp_int in Boost implements this.

(Isn't that just a wrapper to the GMP/TomMath?)

arbitrary fixed size integer library

I would say this rather fits to extensions of machine integer arithmetic, i.e. of fixed precision (albeit arbitrary). Probably without need to something beyond school algorithms for arithmetic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants