gh-128150: improve performances of uuid.uuid*
constructor functions.
#128151
base: main
The changes themselves look good at first glance. On the other hand, if performance is really important, there are dedicated packages to calculate UUIDs (bindings to Rust or C) that are much faster. One more idea to improve performance: add a dedicated constructor that skips the checks. For example add to
Results in
(the |
I also thought about expanding the C interface for the module, but it would have been too complex for a first iteration. As for third-party packages, I do know about them, but there might be slight differences in the methods they use to build the UUID, and that could be a blocker for existing code (namely, switching to another implementation).
I also had this idea but haven't tested it as a first iteration. I wanted to get some feedback (I feel that performance gains are fine but OTOH, the code is a bit uglier =/)
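The dedicated-constructor idea discussed above can be sketched as follows. This is only a minimal illustration, not the reviewer's snippet (which is not preserved here): `uuid_from_int_unchecked` is a hypothetical helper name, and the sketch assumes that a `UUID` instance only needs its `int` and `is_safe` attributes set, as the hunk under review suggests.

```python
import uuid

def uuid_from_int_unchecked(value):
    """Hypothetical helper: build a UUID from a 128-bit integer while
    bypassing the argument validation performed by UUID.__init__."""
    # Allocate the instance without running __init__.
    obj = uuid.UUID.__new__(uuid.UUID)
    # UUID rejects normal attribute assignment, so go through object.
    object.__setattr__(obj, 'int', value)
    object.__setattr__(obj, 'is_safe', uuid.SafeUUID.unknown)
    return obj

reference = uuid.UUID(int=0x12345678123456781234567812345678)
fast = uuid_from_int_unchecked(reference.int)
```

With this approach the caller is responsible for passing an integer in the valid 128-bit range, since nothing checks it.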
4f2744a
to
0710549
Compare
OK, the benchmarks are not always very stable, but I do see improvements with the dedicated constructor. I need to go now, but I'll try to see which version is the best and the most stable.
So, we're now stable and consistent:
Strictly speaking, the uuid1() benchmarks can be considered significant but only if you consider a 4% improvement as significant, which I did not. I only kept improvements over 10%. The last column is the same as the second one (PGO, no LTO) but using |
@@ -225,6 +237,15 @@ def __init__(self, hex=None, bytes=None, bytes_le=None, fields=None,
        object.__setattr__(self, 'int', int)
        object.__setattr__(self, 'is_safe', is_safe)

    @classmethod
    def _from_int(cls, int, *, is_safe=SafeUUID.unknown):
Suggested change:
-    def _from_int(cls, int, *, is_safe=SafeUUID.unknown):
+    def _from_int(cls, int):

        assert int >= 0 and int <= 0xffff_ffff_ffff_ffff_ffff_ffff_ffff_ffff
        self = cls.__new__(cls)
        object.__setattr__(self, 'int', int)
        object.__setattr__(self, 'is_safe', is_safe)
object.__setattr__(self, 'is_safe', is_safe)
At this moment the argument is unused. Removing it makes the call faster.
        usedforsecurity=False
    ).digest()
    return UUID(bytes=digest[:16], version=3)
    # HACL*-based MD5 is slightly faster than its OpenSSL version,
I would put these comments in the PR comments and leave them out of the code (but I do see the value of them).
Nice improvement overall! Personally I am not a fan of the lazy imports here, but I'll let someone else decide on that.
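For readers following the `uuid3` hunk and the lazy-import remark, here is a minimal sketch of how a version 3 UUID is derived, with the hash import deferred into the function body. It is an illustration under assumptions, not the PR's code: `uuid3_sketch` is a hypothetical name, and it uses the public `hashlib.md5` rather than a private MD5 implementation.

```python
import uuid

def uuid3_sketch(namespace, name):
    """Illustrative re-derivation of a version 3 (MD5, name-based) UUID."""
    # Deferred import: hashlib is only loaded the first time this runs,
    # so importing the enclosing module stays cheap.
    import hashlib
    if isinstance(name, str):
        name = name.encode('utf-8')
    digest = hashlib.md5(
        namespace.bytes + name,
        usedforsecurity=False,  # MD5 is used for hashing, not for security
    ).digest()
    # The UUID constructor overwrites the version and variant bits.
    return uuid.UUID(bytes=digest[:16], version=3)

result = uuid3_sketch(uuid.NAMESPACE_DNS, 'python.org')
```

The trade-off raised in the review is visible here: the deferred import keeps module import time low, at the cost of an import statement executed on every call.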
* :func:`~uuid.uuid3` is 47% faster for 16-byte names and 13% faster
  for 1024-byte names. Performances for longer names remain unchanged.
* :func:`~uuid.uuid5` is 35% faster for 16-byte names and 24% faster
  for 1024-byte names. Performances for longer names remain unchanged.
Suggested change:
-   for 1024-byte names. Performances for longer names remain unchanged.
+   for 1024-byte names. Performance for longer names remains unchanged.
functions:

* :func:`~uuid.uuid3` is 47% faster for 16-byte names and 13% faster
  for 1024-byte names. Performances for longer names remain unchanged.
Suggested change:
-   for 1024-byte names. Performances for longer names remain unchanged.
+   for 1024-byte names. Performance for longer names remains unchanged.
  for 1024-byte names. Performances for longer names remain unchanged.
* :func:`~uuid.uuid4` is 33% faster and :func:`~uuid.uuid8` is 38% faster.

Overall, dedicated generation of UUID objects version 3, 4, 5, and 8 is
Since you already have the specific improvements I would remove this paragraph (or the other way around)
The entire module has been written so as to reduce import times, but I understand. I'll address your comments tomorrow and will also check whether I can remove some unnecessary micro-optimizations. Thank you!
There are some points that can be addressed:
We can drop some micro-optimizations to reduce the diff. Most of the time is taken by function calls and loading integers.
HACL* MD5 is faster than OpenSSL MD5, so it's better to use the former. However, using `_md5.md5` or `from _md5 import md5` is a micro-optimization that can be dropped without affecting performance too much.
The rationale for expanding `not 0 <= x < 1 << 128` into `x < 0 or x > 0xffff_ffff_ffff_ffff_ffff_ffff_ffff_ffff` is that the two forms compile to non-equivalent bytecode.
Similar arguments apply to expanding `not 0 <= x < (1 << C)` into `x < 0 or x > B`, where B is the hardcoded hexadecimal value of `(1 << C) - 1`.
Bytecode comparisons
versus
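The original bytecode listings are not preserved here. As a rough way to reproduce the comparison locally, the sketch below disassembles both forms of the range check with the standard `dis` module. The function names `chained` and `expanded` are illustrative, and the exact instructions printed depend on the CPython version in use.

```python
import dis

def chained(x):
    # Chained comparison; `1 << 128` is a constant expression.
    return not 0 <= x < 1 << 128

def expanded(x):
    # Two independent comparisons against a literal upper bound.
    return x < 0 or x > 0xffff_ffff_ffff_ffff_ffff_ffff_ffff_ffff

def opnames(func):
    """Return the sequence of opcode names the function compiles to."""
    return [ins.opname for ins in dis.get_instructions(func)]

chained_ops = opnames(chained)
expanded_ops = opnames(expanded)

if __name__ == "__main__":
    # Print both disassemblies for a side-by-side read.
    dis.dis(chained)
    dis.dis(expanded)
```

Both functions return the same result for any integer, so the only difference to look for is in the instruction sequences.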
`uuid.*` functions #128150
📚 Documentation preview 📚: https://cpython-previews--128151.org.readthedocs.build/