gh-128150: improve performances of uuid.uuid* constructor functions. #128151

Open
wants to merge 11 commits into
base: main

Conversation

picnixz
Contributor

@picnixz picnixz commented Dec 21, 2024

There are some points that can be addressed:

  • We can drop some micro-optimizations to reduce the diff. Most of the time is taken by function calls and loading integers.

  • HACL* MD5 is faster than OpenSSL MD5, so it's better to use the former. However, using _md5.md5 or from _md5 import md5 is a micro-optimization that can be dropped without affecting performance too much.

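  • The rationale for expanding not 0 <= x < 1 << 128 into x < 0 or x > 0xffff_ffff_ffff_ffff_ffff_ffff_ffff_ffff is that the two forms compile to non-equivalent bytecode: the chained comparison needs extra stack manipulation and evaluates 1 << 128 at run time, whereas the expanded form loads the bound as a single constant.
    A similar argument applies to expanding not 0 <= x < (1 << C) into x < 0 or x > B, where B is the hardcoded hexadecimal value of (1 << C) - 1.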

    Bytecode comparisons
       1           LOAD_SMALL_INT           0
                   LOAD_NAME                0 (x)
                   SWAP                     2
                   COPY                     2
                   COMPARE_OP              42 (<=)
                   COPY                     1
                   TO_BOOL
                   POP_JUMP_IF_FALSE        9 (to L1)
                   NOT_TAKEN
                   POP_TOP
                   LOAD_SMALL_INT           1
                   LOAD_SMALL_INT         128
                   BINARY_OP                3 (<<)
                   COMPARE_OP               2 (<)
                   JUMP_FORWARD             2 (to L2)
           L1:     SWAP                     2
                   POP_TOP
           L2:     TO_BOOL
                   UNARY_NOT
                   POP_TOP
                   LOAD_CONST               0 (None)
                   RETURN_VALUE
    

    versus

       1           LOAD_NAME                0 (x)
                   LOAD_SMALL_INT           0
                   COMPARE_OP               2 (<)
                   COPY                     1
                   TO_BOOL
                   POP_JUMP_IF_TRUE         8 (to L1)
                   POP_TOP
                   LOAD_NAME                0 (x)
                   LOAD_CONST               0 (340282366920938463463374607431768211455)
                   COMPARE_OP             132 (>)
                   POP_TOP
                   LOAD_CONST               1 (None)
                   RETURN_VALUE
           L1:     POP_TOP
                   LOAD_CONST               1 (None)
                   RETURN_VALUE
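
    The comparison above can be reproduced with the dis module. This is a minimal sketch, not the script used for the PR: the function names are hypothetical, and the exact opcodes printed depend on the interpreter version, so no output is shown here.

```python
import dis

# Chained form: the upper bound is written as the expression 1 << 128.
CHAINED = "not 0 <= x < 1 << 128"
# Expanded form: the upper bound is the literal (1 << 128) - 1.
EXPANDED = "x < 0 or x > 0xffff_ffff_ffff_ffff_ffff_ffff_ffff_ffff"


def show_bytecode():
    """Disassemble both expressions so they can be compared side by side."""
    for source in (CHAINED, EXPANDED):
        print(source)
        dis.dis(compile(source, "<expr>", "eval"))
        print()


def is_out_of_range_chained(x):
    # Hypothetical helper mirroring the original check.
    return not 0 <= x < 1 << 128


def is_out_of_range_expanded(x):
    # Hypothetical helper mirroring the expanded check.
    return x < 0 or x > 0xffff_ffff_ffff_ffff_ffff_ffff_ffff_ffff


if __name__ == "__main__":
    show_bytecode()
```

    Both helpers return the same result for every integer; only the generated bytecode differs.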
    

📚 Documentation preview 📚: https://cpython-previews--128151.org.readthedocs.build/

@eendebakpt
Contributor

The changes themselves look good at first glance. On the other hand, if performance is really important, there are dedicated packages for computing UUIDs (bindings to Rust or C) that are much faster.

One more idea to improve performance: add a dedicated constructor that skips the checks. For example add to UUID:

    @classmethod
    def _from_int(cls, int, is_safe=SafeUUID.unknown):
        v = cls.__new__(cls)
        object.__setattr__(v, 'int', int)
        object.__setattr__(v, 'is_safe', is_safe)
        return v

Results in

%timeit UUID._from_int(123)
%timeit UUID(int=123, version=None)
451 ns ± 15.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
767 ns ± 41.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

(the UUID._from_int can be used from inside uuid4 for example)
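
The idea can be tried without patching the standard library. The sketch below is an illustration only: uuid_from_int_unchecked and compare are hypothetical names, the helper is a module-level function rather than the proposed classmethod, and the timings it reports will vary by machine and Python version.

```python
import timeit
import uuid


def uuid_from_int_unchecked(value, is_safe=uuid.SafeUUID.unknown):
    """Hypothetical helper: build a UUID while skipping the checks in UUID.__init__."""
    obj = uuid.UUID.__new__(uuid.UUID)
    # UUID instances are immutable, so attributes are set through object.__setattr__.
    object.__setattr__(obj, "int", value)
    object.__setattr__(obj, "is_safe", is_safe)
    return obj


def compare(number=100_000):
    """Return (unchecked, checked) total times in seconds for `number` constructions."""
    unchecked = timeit.timeit(lambda: uuid_from_int_unchecked(123), number=number)
    checked = timeit.timeit(lambda: uuid.UUID(int=123), number=number)
    return unchecked, checked


if __name__ == "__main__":
    unchecked, checked = compare()
    print(f"unchecked: {unchecked:.4f}s  checked: {checked:.4f}s")
```

Because no validation is performed, the caller is responsible for passing an integer in the range 0 to (1 << 128) - 1.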

@picnixz
Contributor Author

picnixz commented Dec 21, 2024

On the other hand, if performance is really important, there are dedicated packages for computing UUIDs (bindings to Rust or C) that are much faster.

I also thought about expanding the C interface for the module, but it would have been too complex as a first iteration. As for third-party packages, I do know about them, but there might be slight differences in which methods they use for the UUID (and this could be a blocker for existing code, namely switching to another implementation).

One more idea to improve performance: add a dedicated constructor that skips the checks

I also had this idea but haven't tested it as a first iteration. I wanted to get some feedback (I feel that performance gains are fine but OTOH, the code is a bit uglier =/)

@picnixz picnixz force-pushed the perf/uuid/init-128150 branch from 4f2744a to 0710549 Compare December 21, 2024 14:25
@picnixz
Contributor Author

picnixz commented Dec 21, 2024

OK, the benchmarks are not always very stable, but I do see improvements with the dedicated constructor. I need to go now, but I'll try to see which version is the best and the most stable.

@picnixz
Contributor Author

picnixz commented Dec 22, 2024

So, we're now stable and consistent:

+----------------------------------------+---------+-----------------------+-----------------------+
| Benchmark                              | ref     | new                   | opt                   |
+========================================+=========+=======================+=======================+
| uuid3(NAMESPACE_DNS, os.urandom(16))   | 1.13 us | 767 ns: 1.47x faster  | 767 ns: 1.47x faster  |
+----------------------------------------+---------+-----------------------+-----------------------+
| uuid3(NAMESPACE_DNS, os.urandom(1024)) | 2.05 us | 1.82 us: 1.13x faster | 1.78 us: 1.15x faster |
+----------------------------------------+---------+-----------------------+-----------------------+
| uuid4()                                | 1.15 us | 867 ns: 1.33x faster  | 860 ns: 1.34x faster  |
+----------------------------------------+---------+-----------------------+-----------------------+
| uuid5(NAMESPACE_DNS, os.urandom(16))   | 1.10 us | 810 ns: 1.35x faster  | 778 ns: 1.41x faster  |
+----------------------------------------+---------+-----------------------+-----------------------+
| uuid5(NAMESPACE_DNS, os.urandom(1024)) | 1.52 us | 1.22 us: 1.24x faster | 1.19 us: 1.27x faster |
+----------------------------------------+---------+-----------------------+-----------------------+
| uuid8()                                | 926 ns  | 673 ns: 1.38x faster  | 671 ns: 1.38x faster  |
+----------------------------------------+---------+-----------------------+-----------------------+
| Geometric mean                         | (ref)   | 1.21x faster          | 1.22x faster          |
+----------------------------------------+---------+-----------------------+-----------------------+

Benchmark hidden because not significant (3): uuid1(), uuid1(node, None), uuid1(None, clock_seq)

Strictly speaking, the uuid1() benchmarks can be considered significant but only if you consider a 4% improvement as significant, which I did not. I only kept improvements over 10%. The last column is the same as the second one (PGO, no LTO) but using python -OO (namely assertions are removed).
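
As a cross-check of the arithmetic, the per-row speedups can be recomputed from the ref and new columns. This is a sketch using only the six visible rows, with timings copied from the table and converted to nanoseconds; the helper names are hypothetical. Because the table's timings are rounded, a recomputed ratio can differ from the printed one in the last digit.

```python
import math

# (ref, new) timings in nanoseconds, copied from the visible rows of the table.
TIMINGS_NS = {
    "uuid3(NAMESPACE_DNS, os.urandom(16))": (1130, 767),
    "uuid3(NAMESPACE_DNS, os.urandom(1024))": (2050, 1820),
    "uuid4()": (1150, 867),
    "uuid5(NAMESPACE_DNS, os.urandom(16))": (1100, 810),
    "uuid5(NAMESPACE_DNS, os.urandom(1024))": (1520, 1220),
    "uuid8()": (926, 673),
}


def speedups(timings):
    """Map each benchmark name to ref / new (values above 1 mean 'new' is faster)."""
    return {name: ref / new for name, (ref, new) in timings.items()}


def geometric_mean(values):
    """Geometric mean computed through logarithms to avoid large products."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    ratios = speedups(TIMINGS_NS)
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.2f}x")
    print(f"geometric mean of visible rows: {geometric_mean(ratios.values()):.2f}x")
```

The geometric mean over the visible rows alone comes out higher than the 1.21x in the table. One possible explanation, which is an assumption and not stated in the thread, is that the reported mean also covers the three hidden uuid1 benchmarks.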

@@ -225,6 +237,15 @@ def __init__(self, hex=None, bytes=None, bytes_le=None, fields=None,
object.__setattr__(self, 'int', int)
object.__setattr__(self, 'is_safe', is_safe)

@classmethod
def _from_int(cls, int, *, is_safe=SafeUUID.unknown):
Contributor

Suggested change
def _from_int(cls, int, *, is_safe=SafeUUID.unknown):
def _from_int(cls, int):

assert int >= 0 and int <= 0xffff_ffff_ffff_ffff_ffff_ffff_ffff_ffff
self = cls.__new__(cls)
object.__setattr__(self, 'int', int)
object.__setattr__(self, 'is_safe', is_safe)
Contributor

Suggested change
object.__setattr__(self, 'is_safe', is_safe)

At this moment the argument is unused. Removing it makes the call faster.

usedforsecurity=False
).digest()
return UUID(bytes=digest[:16], version=3)
# HACL*-based MD5 is slightly faster than its OpenSSL version,
Contributor

I would put these comments in the PR comments and leave them out of the code (but I do see the value of them).
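
For context, the uuid3 computation touched by this hunk can be sketched as follows. This is a simplified illustration based on the lines shown in the diff, not the PR's code: uuid3_sketch is a hypothetical name, and it uses hashlib.md5 rather than the private _md5 module discussed in the PR description.

```python
import hashlib
import uuid


def uuid3_sketch(namespace, name):
    """Hypothetical re-implementation of a name-based (MD5) UUID, for illustration."""
    if isinstance(name, str):
        name = name.encode("utf-8")
    # usedforsecurity=False marks the hash as a non-cryptographic use of MD5.
    digest = hashlib.md5(namespace.bytes + name, usedforsecurity=False).digest()
    # The UUID constructor overwrites the version and variant bits for version=3.
    return uuid.UUID(bytes=digest[:16], version=3)


if __name__ == "__main__":
    print(uuid3_sketch(uuid.NAMESPACE_DNS, "python.org"))
```

The result should match uuid.uuid3 for the same namespace and name, which makes the sketch easy to check against the standard library.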

Contributor

@eendebakpt eendebakpt left a comment

Nice improvement overall! Personally I am not a fan of the lazy imports here, but I'll let someone else decide on that.

* :func:`~uuid.uuid3` is 47% faster for 16-byte names and 13% faster
for 1024-byte names. Performances for longer names remain unchanged.
* :func:`~uuid.uuid5` is 35% faster for 16-byte names and 24% faster
for 1024-byte names. Performances for longer names remain unchanged.
Contributor

Suggested change
for 1024-byte names. Performances for longer names remain unchanged.
for 1024-byte names. Performance for longer names remains unchanged.

functions:

* :func:`~uuid.uuid3` is 47% faster for 16-byte names and 13% faster
for 1024-byte names. Performances for longer names remain unchanged.
Contributor

Suggested change
for 1024-byte names. Performances for longer names remain unchanged.
for 1024-byte names. Performance for longer names remains unchanged.

for 1024-byte names. Performances for longer names remain unchanged.
* :func:`~uuid.uuid4` is 33% faster and :func:`~uuid.uuid8` is 38% faster.

Overall, dedicated generation of UUID objects version 3, 4, 5, and 8 is
Contributor

Since you already have the specific improvements I would remove this paragraph (or the other way around)

@picnixz
Contributor Author

picnixz commented Dec 23, 2024

The entire module has been written so as to reduce import times, but I understand. I'll address your comments tomorrow and will also check if I can remove some unnecessary micro-optimizations. Thank you!

2 participants