Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support UUIDv6, UUIDv7, and UUIDv8 from RFC 9562 #89083

Open
stevesimmons mannequin opened this issue Aug 15, 2021 · 42 comments
Open

Support UUIDv6, UUIDv7, and UUIDv8 from RFC 9562 #89083

stevesimmons mannequin opened this issue Aug 15, 2021 · 42 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@stevesimmons
Copy link
Mannequin

stevesimmons mannequin commented Aug 15, 2021

BPO 44920
Nosy @serhiy-storchaka, @loganasherjones, @stevesimmons

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-08-15.17:26:05.956>
labels = ['type-feature', 'library', '3.11']
title = 'Support UUIDv6, UUIDv7, and UUIDv8 from the new version of RFC4122'
updated_at = <Date 2021-08-17.00:36:53.563>
user = 'https://github.com/stevesimmons'

bugs.python.org fields:

activity = <Date 2021-08-17.00:36:53.563>
actor = 'loganasherjones'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2021-08-15.17:26:05.956>
creator = 'stevesimmons'
dependencies = []
files = []
hgrepos = []
issue_num = 44920
keywords = []
message_count = 2.0
messages = ['399624', '399647']
nosy_count = 3.0
nosy_names = ['serhiy.storchaka', 'loganasherjones', 'stevesimmons']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue44920'
versions = ['Python 3.11']

Linked PRs

Related

@stevesimmons
Copy link
Mannequin Author

stevesimmons mannequin commented Aug 15, 2021

Three new types of UUIDs have been proposed in the latest draft of the next version of RFC4122. Full text of that draft is in [1] (published 21 April 2021; draft period ends 21 Oct 2021).

Support for these should be included in uuid.py for Python 3.11, with backport for 3.9 and 3.10. The timetable for Python 3.11 should fit with the end of the IETF draft period.

Implementation should be similar to the existing UUID classes in uuid.py, the prototypes in [2], or even parts of my own uuid6 version [3].

[1] https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format
[2] https://github.com/uuid6/prototypes/tree/main/python
[3] https://github.com/stevesimmons/pyuuid6/blob/main/uuid6.py

@stevesimmons stevesimmons mannequin added 3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Aug 15, 2021
@serhiy-storchaka
Copy link
Member

It is a new feature, and we usually do not backport new features to old Python versions, so it can only be included in Python 3.11 (backports can be provided by third-party libraries). Do you want to create a PR?

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@freundTech
Copy link
Contributor

Is there anyone currently working on this?

If not I'd like to have a look at implementing this.

@AlexWaygood AlexWaygood added 3.12 bugs and security fixes and removed 3.11 only security fixes labels Sep 8, 2022
@ambv
Copy link
Contributor

ambv commented Oct 15, 2023

Note: the spec for UUIDv5 - UUIDv8 is still a draft, it's still being revised:
https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis

Therefore, it is too early to add this to the Python standard library.

@erlend-aasland erlend-aasland added 3.13 bugs and security fixes and removed 3.12 bugs and security fixes labels Jan 5, 2024
@shane-ns1
Copy link

UUIDv6, UUIDv7, and UUIDv8 are now in a standards-track RFC:

https://www.rfc-editor.org/rfc/rfc9562

@picnixz
Copy link
Member

picnixz commented Jun 17, 2024

I'll make a PR for this (I'm interested in those versions).

@hugovk hugovk added 3.14 new features, bugs and security fixes and removed 3.13 bugs and security fixes labels Jun 17, 2024
@hugovk hugovk changed the title Support UUIDv6, UUIDv7, and UUIDv8 from the new version of RFC4122 Support UUIDv6, UUIDv7, and UUIDv8 from RFC 9562 Jun 17, 2024
@picnixz

This comment was marked as resolved.

@gpshead
Copy link
Member

gpshead commented Jun 22, 2024

FYI - there are PyPI packages from people in the community attempting to come up with ways to use UUID v6-8 today:

What we'd be seeking to do within the stdlib is settle upon how these should fit as features into the standard library's existing uuid APIs. (here and with your other fields related issue)

@picnixz
Copy link
Member

picnixz commented Jun 22, 2024

Actually, I first tried an implementation based on those packages but after reading the RFC again, I was wondering: "which is the best course of action for the standard library?" and thus I decided to pick the (only) possible variant of v6 where the implementation is RFC-compliant (and then I hit the issue with the fields...) and for v7 and v8, I decided to first take the generic one (and made an alternative for v7 using monotonicity as specified in the RFC alternatives). I did not decide anything on v8 since discussion should first be done.

Note that oittaa's v7 is more or less like #120650 (non-monotonous sub-sec v7) since it follows the basic RFC but Simmons' v7 seems to follow the alternative (Method 3) combined with Method 1, §6.2 (Fixed Bit-Length Dedicated Counter) whereas #120830 is Method 3 combined with Method 2, §6.2 (Monotonic Random). I say "seems to" because it's not really clear whether the RFC allows mixing Method 1 & Method 3 (Method 1 forces the counter to immediately follow the 48-bit timestamp part but Method 3 says that the sub-seconds precision should be at that place so...). Method 2 explicitly tells me that I need to use the last 62 bits to make whatever I need so it's closer to RFC compliance.

Actually, there are more prototypes that I found last week: https://github.com/uuid6/prototypes, and they like to differ in the implementation of v7 and v8... For v6, the implementation is RFC-decided so we don't need to bother with a discussion, just the other issue on the fields. For v7/v8, do you think we need a Discourse (different from https://discuss.python.org/t/add-uuid7-in-uuid-module-in-standard-library/44390/7) & a PEP perhaps?

There's also https://github.com/uuid-rs/uuid which uses the same techniques that I presented in the first PR (namely, UUIDv7 has 80-bit security and UUIDv8 has custom chunks).

@picnixz
Copy link
Member

picnixz commented Jun 26, 2024

I've opened https://discuss.python.org/t/rfc-4122-9562-uuid-version-7-and-8-implementation/56725 to discuss the implementations more in detail.

@kyzer-davis
Copy link

Edited my original comment to feature a note about uuid.RFC_4122 in the code and some suggestions for handling that.

@piranna
Copy link

piranna commented Sep 6, 2024

From my understand of the v7 spec, it's possible to have both sub-millisecond precission AND a counter, and also the sub-millisecond DON'T need to be at least 12 bits, just only recommended since it's the size of the rand_a field.

I was planning on doing my own implementation (I can share it if you are interested to include it), and my idea was to have a sub_ms_bits flag to specify how much bits do you want to use for the sub-millisecond field (0-16, that's the most we can get with time.time_ns() function, or True to enable it with the suggested 12 bits), and a counter_bits int flag to indicate how much bits to use for the counter (I think the spec says up to 48, to ensure there's at least 10-14 pure-random bits at the end, but maybe I misunderstood). Later we can have a third and final flag to define what method to use for the counter bits. This way we could fully implement the UUID v7 spec in a flexible way, by allowing to opt-in to the extended precision and counter bits and define their length.

@sergeyprokhorenko
Copy link

sergeyprokhorenko commented Sep 7, 2024

@piranna

When using submillisecond precision, I advise you to use the whole timestamp as a counter if the timestamps for consecutive UUIDs have not increased. This will eliminate the need for a separate counter, and therefore it will be possible to preserve a sufficiently long random segment (at least 32 bits) to make attacks difficult by sequential brute force of UUID values.

As for the length of the submillisecond part, it is necessary to take into account not only the available precision of time sources in operating systems, but also the maximum performance of recording in the DBMS, which does not require nanosecond accuracy.

@picnixz
Copy link
Member

picnixz commented Sep 7, 2024

You're right that the RFC does not give a lower bound but it gives an upper bound (emphasis mine):

An OPTIONAL sub-millisecond timestamp fraction (12 bits at maximum) as per Section 6.2 (Method 3).

That being said, I chose to have a non-modularizable implemention of v7 for the standard library as a first draft. It's always possible to make it extensible in the future but I think the standard library should propose one way (if you want more, then I think 3rd-party libs should be used).

The issue with specifying a flexible counter bit length is that we need to keep track of multiple global variables (for instance, all UUIDv7 objects generated with a counter with say 7 bits will have their own global timestamp and counter for synchronization). This is quite easy since you would create a dictionary entry with its state, each entry being a possible configuration. But I don't really want to go there.

What could be done, however, is to create a factory of UUID factories. In other words, you specify a configuration for your UUIDv7 algorithm and you'll get an object that would create UUIDv7 objects according to that specific configuration. The factory object would have a single method, namely new() which would just create a new UUIDv7.

This would probably the easiest way to have a flexible UUIDv7 implementation included in the standard library. The standard library would however expose by default the UUIDv7 implementation using the RFC recommended methods.

@piranna
Copy link

piranna commented Sep 7, 2024

My comments were about made the v7 implementation according to the spec, but i'm ok with having base support with just 48 timestamp bits and no counters or already enabled opt-in featured, as far as API is designes to allow add and enabled the featured in upcoming versions. It's said, i don't want implentation to be opinión based but be spec based, like 12 bits submilliseconds being all or nothing when spec allows It to be variable, or force to choose between submilliseconds precission or counter when spec allows both. Call me purist if you want.

@picnixz
Copy link
Member

picnixz commented Sep 8, 2024

I'll work on designing a separate factory for that. What I want to more or less ensure is to be consistent with other languages if possible (e.g., PHP/Rust that can be both used for microservices and/or backends) rather than having Python follows its own rules.

Call me purist if you want.

I won't call you that because that's what I would personally do for my personal projects. However, for the stdlib we sometimes need to make design and implementation choices. But part of me do like a flexible implementation (especially if it is desired by the community).

Now, another of example of this is actually the random module which features standard random techniques (namely MT-19937 based generators) but allows you to specify your own PRG algorithm if you want to. I think having a similar interface could be beneficial (I'll work on it a bit later, just to suggest a PoC).

@sergeyprokhorenko
Copy link

@picnixz

What I want to more or less ensure is to be consistent with other languages if possible (e.g., PHP/Rust that can be both used for microservices and/or backends) rather than having Python follows its own rules.

See uuid-rs/uuid#767 (comment)

@piranna
Copy link

piranna commented Sep 23, 2024

I have create a full implementation of the UUIDv7 latest frozen spec (RFC 9562) at https://github.com/piranna/UUIDv7, with 100% tests coverage. The most complex part was to understand that in practice, having a monotonic random makes the counter as a sort of guard, since spec text is confusing explaining the relationship between methods 1 & 2. After that, implementation was easy-ish and I think got to get a very simple API, although complete and and at the same time unopinionated.

I have done it with the intention of being considered to be included as a built-in library in the Python batteries. Besides adding (more and better) documentation, what else would I need to get it included?

@picnixz
Copy link
Member

picnixz commented Sep 23, 2024

Thank you for that but I already (and completely) implemented the v7 but I'll have a look at yours. We can update my PR and avoid having two different PRs but not today (I'm currently travelling)

The issue here is not really the API but rather which standard implementation to choose by default. I followed what other languages decided to do (and will update it accordingly) but we will probably have another interface for more versatility.

In general, the Python library does not like having a single function with lots of parameters which make the implementation different and rather like having different functions with different names (but I'm not sure if this principle applies here; it did for the (yet to be accepted) fnmatch.filterfalse function). This is also the reason why I first wanted a parameterless uuidv7 function as a first implementation.

@piranna
Copy link

piranna commented Sep 23, 2024

My own one by default works without arguments, all of them are optional, and that just provides a 48 bits timestamp + 74 random bits. Later you can provide the arguments to enable the different methods, or tune Up, or define the explicit values for each one of the fields.

@picnixz
Copy link
Member

picnixz commented Sep 25, 2024

My own one by default works without arguments, all of them are optional

Actually this is what we try to avoid. One reason is that it makes maintenance harder (if any) and optimization harder as well (you create if-branches due to that). Finally, the fact that there are assertions / checks for checking whether the parameters combinations are good or not is something I would like to avoid (I think it's easier to make multiple functions rather than a single one; but we can make a single class with multiple class methods acting as a factory, which is what I originally had in mind).

I had a quick look at the API and I think we'll need to rethink the UUID class itself and the class itself should only be a view and not be responsible for generating the value.


What I can suggest is: we first decide on a default implementation that is really parameterless, namely def uuid7(): ... as in my PR and then we can add the features one by one. While I may seem critizing I am grateful that you took the time to look at the RFC and provide a versatile implementation.

@picnixz picnixz self-assigned this Sep 25, 2024
@piranna
Copy link

piranna commented Sep 25, 2024

I also think UUID API needs to be rethink, It seems like It was a uuid1 that later was refitted to allow support for the other versions. A UUID base class and several UUIDx chikd classes with their own properties would be better.

A parameterless version of UUID would just only create current timestamp + random, that can work pretty much as replacement of uuid4. I think we can start with that, just only in my use case i needed to set the timestamp explicitly too, just only meanwhile i was there, i wanted to go the extra (ten :-P) miles :-D And i'm glad you liked i did It :-)

@picnixz
Copy link
Member

picnixz commented Sep 25, 2024

I have a separate issue for tracking the UUID interface itself (burried in this huge conversation): #120878 (I added it to the issue; should have done that earlier...). It's only about the time fields but this is roughly one thing that is annoying (namely some attributes are not supported or have different meanings depending on the version).

coderbydesign added a commit to coderbydesign/insights-rbac that referenced this issue Oct 25, 2024
… column

This would start a new convention with the v2 APIs and models in order to have
consistent, clear naming, particularly when it comes to FK references.

We currently have the `uuid` field as the self-referencial FK column on the `Workspace`
model. More details around the impetus for changing the naming around IDs can be
found in RedHatInsights#1257.

These changes offer an alternate approach, since we have no data in stage/production,
where we no longer use the `uuid` as the `lookup_field` in Django, but rather
use a `uuid` as the `id` format. The rationale for not doing this, and having an
explicit `uuid` was primarily for having sequential integers as the PK/FK relations.
However, UUID7 is a time-ordered UUID, eliminating index issues and solving the
need for having distributed ID values across our services.

We're using `uuid-utils` [1] which is a compliant implementation using Rust's
UUID library. There's also an open proposal [2,3] to add it to Python's standard
library.

This updates the model, view and serializer. In order to move the `id` from int
to uuid, we need two migrations:
- one to move the current `id` column, and the `parent` column (because of the FK ref)
  as well as making the current `uuid` column the PK
- a second to then rename the `uuid` column to `id` and add the `parent` FK ref/column back

[1] https://github.com/aminalaee/uuid-utils
[2] python/cpython#89083
[3] https://discuss.python.org/t/add-uuid7-in-uuid-module-in-standard-library/44390
vstinner pushed a commit that referenced this issue Nov 12, 2024
@vstinner
Copy link
Member

Change 03924b5 added uuid.uuid8().

picnixz added a commit to picnixz/cpython that referenced this issue Dec 8, 2024
picnixz added a commit to picnixz/cpython that referenced this issue Dec 8, 2024
encukou pushed a commit that referenced this issue Jan 3, 2025
WolframAlph pushed a commit to WolframAlph/cpython that referenced this issue Jan 4, 2025
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this issue Jan 8, 2025
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests