Support UUIDv6, UUIDv7, and UUIDv8 from RFC 9562 #89083
Comments
Three new types of UUIDs have been proposed in the latest draft of the next version of RFC4122. Full text of that draft is in [1] (published 21 April 2021; draft period ends 21 Oct 2021). Support for these should be included in uuid.py for Python 3.11, with backport for 3.9 and 3.10. The timetable for Python 3.11 should fit with the end of the IETF draft period. Implementation should be similar to the existing UUID classes in uuid.py, the prototypes in [2], or even parts of my own uuid6 version [3]. [1] https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format |
It is a new feature, and we usually do not backport new features to old Python versions, so it can only be included in Python 3.11 (backports can be provided by third-party libraries). Do you want to create a PR? |
Is there anyone currently working on this? If not I'd like to have a look at implementing this. |
Note: the spec for UUIDv6 - UUIDv8 is still a draft and is still being revised: Therefore, it is too early to add this to the Python standard library. |
UUIDv6, UUIDv7, and UUIDv8 are now in a standards-track RFC: |
I'll make a PR for this (I'm interested in those versions). |
FYI - there are PyPI packages from people in the community attempting to come up with ways to use UUID v6-8 today:
What we'd be seeking to do within the stdlib is settle upon how these should fit as features into the standard library's existing |
Actually, I first tried an implementation based on those packages, but after reading the RFC again I wondered: "which is the best course of action for the standard library?" I therefore decided to pick the (only) possible variant of v6 where the implementation is RFC-compliant (and then I hit the issue with the fields...). For v7 and v8, I decided to first take the generic one (and made an alternative for v7 using monotonicity, as specified in the RFC alternatives). I did not decide anything on v8 since a discussion should happen first. Note that oittaa's v7 is more or less like #120650 (non-monotonic sub-sec v7) since it follows the basic RFC, but Simmons' v7 seems to follow the alternative (Method 3) combined with Method 1, §6.2 (Fixed Bit-Length Dedicated Counter), whereas #120830 is Method 3 combined with Method 2, §6.2 (Monotonic Random). I say "seems to" because it's not really clear whether the RFC allows mixing Method 1 & Method 3 (Method 1 forces the counter to immediately follow the 48-bit timestamp part, but Method 3 says that the sub-second precision should be at that place, so...). Method 2 explicitly tells me that I need to use the last 62 bits to make whatever I need, so it's closer to RFC compliance. Actually, there are more prototypes that I found last week: https://github.com/uuid6/prototypes, and they tend to differ in the implementation of v7 and v8... For v6, the implementation is RFC-decided, so we don't need to bother with a discussion, just the other issue on the fields. For v7/v8, do you think we need a Discourse thread (different from https://discuss.python.org/t/add-uuid7-in-uuid-module-in-standard-library/44390/7) and perhaps a PEP? There's also https://github.com/uuid-rs/uuid, which uses the same techniques that I presented in the first PR (namely, UUIDv7 has 80-bit security and UUIDv8 has custom chunks). |
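To illustrate why v6 leaves no room for design choices, here is a minimal sketch of the field reordering that RFC 9562 prescribes: the 60-bit v1 timestamp is stored most-significant-bits first, while the variant, clock sequence and node are carried over unchanged. The helper name `uuid1_to_uuid6` is hypothetical and is not taken from any of the linked PRs.

```python
import uuid


def uuid1_to_uuid6(u1: uuid.UUID) -> uuid.UUID:
    """Reorder the timestamp of a version 1 UUID into the version 6 layout.

    Hypothetical helper for illustration only; not the stdlib implementation.
    """
    if u1.version != 1:
        raise ValueError("expected a version 1 UUID")
    ts = u1.time                                        # 60-bit Gregorian timestamp
    time_high_and_mid = (ts >> 12) & 0xFFFF_FFFF_FFFF   # most significant 48 bits
    time_low = ts & 0x0FFF                              # least significant 12 bits
    low64 = u1.int & 0xFFFF_FFFF_FFFF_FFFF              # variant, clock_seq, node
    value = (time_high_and_mid << 80) | (0x6 << 76) | (time_low << 64) | low64
    return uuid.UUID(int=value)


# Test vector pair from RFC 9562, Appendix A (v1 and v6 of the same instant).
v1 = uuid.UUID("c232ab00-9414-11ec-b3c8-9f6bdeced846")
print(uuid1_to_uuid6(v1))
```

Because the timestamp bits now run from most to least significant, v6 values sort by creation time when compared as plain 128-bit integers, which v1 values do not.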
I've opened https://discuss.python.org/t/rfc-4122-9562-uuid-version-7-and-8-implementation/56725 to discuss the implementations more in detail. |
Edited my original comment to feature a note about |
From my understanding of the v7 spec, it's possible to have both sub-millisecond precision AND a counter. Also, the sub-millisecond part DOESN'T need to be at least 12 bits; that is only recommended since it's the size of the I was planning on doing my own implementation (I can share it if you are interested in including it), and my idea was to have a |
When using submillisecond precision, I advise you to use the whole timestamp as a counter if the timestamps for consecutive UUIDs have not increased. This will eliminate the need for a separate counter, and therefore it will be possible to preserve a sufficiently long random segment (at least 32 bits) to make attacks difficult by sequential brute force of UUID values. As for the length of the submillisecond part, it is necessary to take into account not only the available precision of time sources in operating systems, but also the maximum performance of recording in the DBMS, which does not require nanosecond accuracy. |
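The advice above could be sketched roughly as follows. This is only one possible reading, under stated assumptions: a 48-bit millisecond timestamp plus a 12-bit sub-millisecond fraction form a single 60-bit value, and that value is bumped by one whenever the clock has not advanced. The names `uuid7_timestamp_counter`, `_now_ts60` and `_last_ts60` are hypothetical, and the 12-bit fraction width is an assumption, not something the comment specifies.

```python
import secrets
import threading
import time
import uuid

_lock = threading.Lock()
_last_ts60 = -1  # last 60-bit (milliseconds + 12-bit fraction) value handed out


def _now_ts60() -> int:
    """Current time as 48 bits of milliseconds followed by a 12-bit fraction."""
    ms, rem_ns = divmod(time.time_ns(), 1_000_000)
    frac = (rem_ns * 4096) // 1_000_000          # scale 0..999_999 ns to 0..4095
    return ((ms & 0xFFFF_FFFF_FFFF) << 12) | frac


def uuid7_timestamp_counter() -> uuid.UUID:
    """UUIDv7 sketch where the whole timestamp doubles as the counter."""
    global _last_ts60
    with _lock:
        ts = _now_ts60()
        if ts <= _last_ts60:
            ts = _last_ts60 + 1                  # clock did not advance: bump it
        _last_ts60 = ts
    unix_ts_ms = (ts >> 12) & 0xFFFF_FFFF_FFFF
    sub_ms = ts & 0xFFF
    rand_b = secrets.randbits(62)                # the random segment stays 62 bits
    value = (unix_ts_ms << 80) | (0x7 << 76) | (sub_ms << 64) | (0b10 << 62) | rand_b
    return uuid.UUID(int=value)
```

No separate counter field is reserved, so the full 62-bit tail remains random, which is the point made about resisting sequential guessing.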
You're right that the RFC does not give a lower bound but it gives an upper bound (emphasis mine):
That being said, I chose to have a non-modularizable implementation of v7 for the standard library as a first draft. It's always possible to make it extensible in the future, but I think the standard library should propose one way (if you want more, then I think 3rd-party libs should be used). The issue with specifying a flexible counter bit length is that we need to keep track of multiple global variables (for instance, all UUIDv7 objects generated with a counter of, say, 7 bits will have their own global timestamp and counter for synchronization). This is quite easy since you would create a dictionary entry with its state, each entry being a possible configuration. But I don't really want to go there. What could be done, however, is to create a factory of UUID factories. In other words, you specify a configuration for your UUIDv7 algorithm and you get an object that creates UUIDv7 objects according to that specific configuration. The factory object would have a single method, namely This would probably be the easiest way to have a flexible UUIDv7 implementation included in the standard library. The standard library would, however, expose by default the UUIDv7 implementation using the RFC-recommended methods. |
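A rough sketch of what such a configurable factory might look like, assuming a dedicated counter placed right after the timestamp (Method 1 in the RFC). The class name `UUID7Factory`, the method name `generate` and the `counter_bits` parameter are all hypothetical; the method name intended in the comment above is not preserved in this thread. Each factory keeps its own timestamp and counter, so no global per-configuration state is needed.

```python
import secrets
import threading
import time
import uuid


class UUID7Factory:
    """Hypothetical per-configuration UUIDv7 generator (illustration only)."""

    def __init__(self, counter_bits: int = 12) -> None:
        if not 0 <= counter_bits <= 12:
            raise ValueError("counter_bits must be between 0 and 12")
        self._counter_bits = counter_bits
        self._lock = threading.Lock()
        self._last_ms = -1
        self._counter = 0

    def generate(self) -> uuid.UUID:
        with self._lock:
            now_ms = time.time_ns() // 1_000_000
            if now_ms > self._last_ms:
                # New millisecond: restart the counter (a real implementation
                # might seed it randomly instead of starting at zero).
                self._last_ms = now_ms
                self._counter = 0
            else:
                self._counter += 1
                if self._counter >> self._counter_bits:
                    # Counter exhausted: move on to the next millisecond.
                    self._last_ms += 1
                    self._counter = 0
            ms, counter = self._last_ms, self._counter
        pad_bits = 12 - self._counter_bits       # unused rand_a bits stay random
        filler = secrets.randbits(pad_bits) if pad_bits else 0
        rand_a = (counter << pad_bits) | filler
        value = (
            ((ms & 0xFFFF_FFFF_FFFF) << 80)
            | (0x7 << 76)
            | (rand_a << 64)
            | (0b10 << 62)
            | secrets.randbits(62)
        )
        return uuid.UUID(int=value)
```

Two factories with different configurations, for example `UUID7Factory(counter_bits=7)` and `UUID7Factory(counter_bits=12)`, never share state, which sidesteps the dictionary of global states mentioned above.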
My comments were about making the v7 implementation follow the spec, but I'm OK with having base support with just the 48 timestamp bits and no counters or already-enabled opt-in features, as long as the API is designed to allow adding and enabling those features in upcoming versions. That said, I don't want the implementation to be opinion-based but spec-based: I don't want the 12 bits of sub-millisecond precision to be all or nothing when the spec allows them to be variable, or to be forced to choose between sub-millisecond precision and a counter when the spec allows both. Call me a purist if you want. |
I'll work on designing a separate factory for that. What I want to more or less ensure is to be consistent with other languages if possible (e.g., PHP/Rust, which can both be used for microservices and/or backends) rather than having Python follow its own rules.
I won't call you that because that's what I would personally do for my personal projects. However, for the stdlib we sometimes need to make design and implementation choices. But part of me does like a flexible implementation (especially if it is desired by the community). Now, another example of this is actually the |
I have created a full implementation of the latest frozen UUIDv7 spec (RFC 9562) at https://github.com/piranna/UUIDv7, with 100% test coverage. The most complex part was understanding that, in practice, having a monotonic random makes the counter act as a sort of guard, since the spec text is confusing when explaining the relationship between Methods 1 & 2. After that, the implementation was easy-ish, and I think I ended up with a very simple API that is complete and at the same time unopinionated. I have done it with the intention of having it considered for inclusion as a built-in library in the Python batteries. Besides adding (more and better) documentation, what else would I need to get it included? |
Thank you for that, but I have already (and completely) implemented v7. I'll have a look at yours, though. We can update my PR and avoid having two different PRs, but not today (I'm currently travelling). The issue here is not really the API but rather which standard implementation to choose by default. I followed what other languages decided to do (and will update it accordingly), but we will probably have another interface for more versatility. In general, the Python library does not like having a single function with lots of parameters that change the implementation, and prefers having different functions with different names (but I'm not sure if this principle applies here; it did for the (yet to be accepted) fnmatch.filterfalse function). This is also the reason why I first wanted a parameterless uuidv7 function as a first implementation. |
Mine works without arguments by default, since all of them are optional, and in that case it just provides a 48-bit timestamp + 74 random bits. Later you can provide arguments to enable the different methods, tune them, or define explicit values for each of the fields. |
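For reference, the parameterless layout described here (48-bit timestamp + 74 random bits, the remaining 6 bits being the version and variant) can be sketched in a few lines. The function name `uuid7_basic` is hypothetical; this is neither the code from the linked repository nor from the PRs.

```python
import secrets
import time
import uuid


def uuid7_basic() -> uuid.UUID:
    """Minimal UUIDv7 sketch: millisecond timestamp plus 74 random bits."""
    unix_ts_ms = (time.time_ns() // 1_000_000) & 0xFFFF_FFFF_FFFF  # 48 bits
    rand_a = secrets.randbits(12)   # 12 random bits after the version nibble
    rand_b = secrets.randbits(62)   # 62 random bits after the variant bits
    value = (
        (unix_ts_ms << 80)
        | (0x7 << 76)               # version 7
        | (rand_a << 64)
        | (0b10 << 62)              # RFC 9562 variant
        | rand_b
    )
    return uuid.UUID(int=value)
```

UUIDs produced this way sort by millisecond, but two values created within the same millisecond are ordered randomly, which is what the counter and sub-millisecond methods discussed above try to address.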
Actually this is what we try to avoid. One reason is that it makes maintenance harder (if any) and optimization harder as well (you create if-branches due to that). Finally, the fact that there are assertions / checks for checking whether the parameters combinations are good or not is something I would like to avoid (I think it's easier to make multiple functions rather than a single one; but we can make a single class with multiple class methods acting as a factory, which is what I originally had in mind). I had a quick look at the API and I think we'll need to rethink the UUID class itself and the class itself should only be a view and not be responsible for generating the value. What I can suggest is: we first decide on a default implementation that is really parameterless, namely |
I also think the UUID API needs to be rethought. It seems like it was designed for uuid1 and later refitted to support the other versions. A UUID base class and several UUIDx child classes with their own properties would be better. A parameterless version of UUID would only create the current timestamp + random bits, which can work pretty much as a replacement for uuid4. I think we can start with that. In my use case I also needed to set the timestamp explicitly, and while I was there I wanted to go the extra (ten :-P) miles :-D And I'm glad you liked that I did it :-) |
I have a separate issue for tracking the UUID interface itself (buried in this huge conversation): #120878 (I added it to the issue; should have done that earlier...). It's only about the time fields, but this is roughly one thing that is annoying (namely, some attributes are not supported or have different meanings depending on the version). |
… column This would start a new convention with the v2 APIs and models in order to have consistent, clear naming, particularly when it comes to FK references. We currently have the `uuid` field as the self-referential FK column on the `Workspace` model. More details around the impetus for changing the naming around IDs can be found in RedHatInsights#1257. These changes offer an alternate approach, since we have no data in stage/production, where we no longer use the `uuid` as the `lookup_field` in Django, but rather use a `uuid` as the `id` format. The rationale for not doing this, and having an explicit `uuid`, was primarily to have sequential integers as the PK/FK relations. However, UUID7 is a time-ordered UUID, eliminating index issues and solving the need for having distributed ID values across our services. We're using `uuid-utils` [1], which is a compliant implementation using Rust's UUID library. There's also an open proposal [2,3] to add it to Python's standard library. This updates the model, view and serializer. In order to move the `id` from int to uuid, we need two migrations: - one to move the current `id` column and the `parent` column (because of the FK ref), as well as making the current `uuid` column the PK - a second to then rename the `uuid` column to `id` and add the `parent` FK ref/column back [1] https://github.com/aminalaee/uuid-utils [2] python/cpython#89083 [3] https://discuss.python.org/t/add-uuid7-in-uuid-module-in-standard-library/44390
Co-authored-by: Hugo van Kemenade <[email protected]>
Change 03924b5 added |
improve UUIDv8 uniqueness tests
Related
fields and time_* properties must not be used on UUIDs that are time-agnostic. #120878