Version metadata field size limits #14965

Open
fschulze opened this issue Nov 24, 2023 · 3 comments

Comments

@fschulze
fschulze commented Nov 24, 2023

Are there currently any enforced limits on the version string in metadata? I looked through the code, but couldn't find anything.

Without a limit, denial-of-service attacks are possible. Only with Python 3.11 is this mitigated to some extent: see https://docs.python.org/3/library/stdtypes.html#int-max-str-digits, which in practice applies a limit of 4300 digits per numeric element of a version.
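To illustrate the mitigation mentioned above, here is a minimal sketch that probes the interpreter's integer string conversion limit. The helper names `digit_limit` and `parses_within_limit` are hypothetical and not part of warehouse, devpi, or packaging; `sys.get_int_max_str_digits` is only assumed to exist on interpreters that ship the limit.

```python
import sys


def digit_limit():
    """Return the int <-> str digit limit, or None if absent or disabled."""
    getter = getattr(sys, "get_int_max_str_digits", None)
    if getter is None:
        return None  # interpreter predates the limit
    return getter() or None  # 0 means the limit is disabled


def parses_within_limit(component: str) -> bool:
    """Check whether one numeric version component can be converted to int."""
    try:
        int(component)
    except ValueError:
        # Raised for non-numeric input and for components over the limit.
        return False
    return True


limit = digit_limit()
if limit is not None:
    # A component one digit over the limit is rejected by int().
    print(limit, parses_within_limit("9" * (limit + 1)))
```

On interpreters without the limit, nothing stops an arbitrarily long numeric component from being parsed, which is the concern raised here.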

My initial motivation was database-side sorting in devpi. It is possible to construct comparable version strings, but they need to encode the order of magnitude of each number (see https://stackoverflow.com/a/30752452/3748142), and without limits this isn't possible. Also see the currently inefficient ordering in warehouse:

    # TODO: We need a better solution to this than to just do it inline inside
    # this method. Ideally the version field would just be sortable, but
    # at least this should be some sort of hook or trigger.
    releases = (
        request.db.query(Release)
        .filter(Release.project == project)
        .options(
            orm.load_only(Release.project_id, Release.version, Release._pypi_ordering)
        )
        .all()
    )
    for i, r in enumerate(
        sorted(releases, key=lambda x: packaging_legacy.version.parse(x.version))
    ):
        r._pypi_ordering = i

fschulze added the "feature request" and "requires triaging" labels Nov 24, 2023
miketheman added the "data quality" label and removed the "requires triaging" label Nov 24, 2023
@miketheman
Member

There's no current limit. See #12483 for a previous report of this issue, and the conclusion that we should probably adhere to a PEP standard, but none exists yet.

With regard to the parsing and sorting inefficiency, what do you have in mind?

@fschulze
Author

Currently the packaging Version implementation uses a tuple for its sort key. That tuple could be encoded as a string in a way which makes it sortable.

To make numbers sortable in that scenario one has to prefix them with their encoded length.

Some examples:

number  encoded
1       A1
12      B12
123     C123
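The length prefix from the examples above can be sketched in a few lines. This is only an illustration of the idea, not the draft implementation; `LENGTH_CODES` and `encode_number` are hypothetical names, and the A-Z then a-z alphabet is assumed.

```python
import string

# Assumed alphabet: A-Z for lengths 1-26, a-z for lengths 27-52.
LENGTH_CODES = string.ascii_uppercase + string.ascii_lowercase


def encode_number(n: int) -> str:
    """Prefix the decimal digits of n with a character encoding their count."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    digits = str(n)
    if len(digits) > len(LENGTH_CODES):
        raise ValueError("too many digits to encode")
    return LENGTH_CODES[len(digits) - 1] + digits


numbers = [123, 9, 20, 100, 11, 0, 1]
encoded = sorted(encode_number(n) for n in numbers)
# String order of the encoded values matches numeric order of the inputs.
print(encoded == [encode_number(n) for n in sorted(numbers)])
```

Because a longer number always gets a larger prefix character, comparing the encoded strings byte by byte gives the same result as comparing the integers, as long as the digit count fits the alphabet.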

So a version string like 1.20.11.b1 has a key tuple of (0, (1, 20, 11), ('b', 1), -Infinity, Infinity, -Infinity). I still have to look at some details, but a possible encoding could work like this:

  • -Infinity becomes ! (ascii 33)
  • Infinity becomes ~ (ascii 126)
  • the length of the integers would be encoded with A-Z for 1 - 26 and a-z for 27 - 52 (or we allow [\]^_` and use chr(length + 64))
  • strings would be prefixed with @

So the result would be A0A1B20B11@bA1!~!.

For 1.20.11.b1.post1 the key is (0, (1, 20, 11), ('b', 1), ('post', 1), Infinity, -Infinity) which results in A0A1B20B11@bA1@postA1~!.

These sort correctly as strings:

>>> "A0A1B20B11@bA1!~!" > "A0A1B20B11@bA1@postA1~!"
False
>>> "A0A1B20B11@bA1!~!" < "A0A1B20B11@bA1@postA1~!"
True
>>> parse_version("1.20.11.b1") > parse_version("1.20.11.b1.post1")
False
>>> parse_version("1.20.11.b1") < parse_version("1.20.11.b1.post1")
True
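The encoding described above can be sketched as follows. This is a rough illustration under stated assumptions, not the linked draft: `NEG_INF` and `POS_INF` are stand-ins for the infinity sentinels in the key tuple, `encode_part` and `encode_key` are hypothetical names, and the key tuples are written out by hand rather than taken from packaging.

```python
import string

# Assumed alphabet: A-Z for lengths 1-26, a-z for lengths 27-52.
LENGTH_CODES = string.ascii_uppercase + string.ascii_lowercase

# Stand-ins for the -Infinity / Infinity sentinels in the key tuple.
NEG_INF = object()
POS_INF = object()


def encode_part(part) -> str:
    """Encode one element of a version key tuple as a sortable string."""
    if part is NEG_INF:
        return "!"  # ascii 33, sorts before every other marker used here
    if part is POS_INF:
        return "~"  # ascii 126, sorts after every other marker used here
    if isinstance(part, tuple):
        return "".join(encode_part(p) for p in part)
    if isinstance(part, int):
        if part < 0:
            raise ValueError("negative integers are not handled in this sketch")
        digits = str(part)
        return LENGTH_CODES[len(digits) - 1] + digits
    if isinstance(part, str):
        return "@" + part
    raise TypeError(f"unsupported key element: {part!r}")


def encode_key(key: tuple) -> str:
    return encode_part(key)


pre = encode_key((0, (1, 20, 11), ("b", 1), NEG_INF, POS_INF, NEG_INF))
post = encode_key((0, (1, 20, 11), ("b", 1), ("post", 1), POS_INF, NEG_INF))
print(pre)   # A0A1B20B11@bA1!~!
print(post)  # A0A1B20B11@bA1@postA1~!
print(pre < post)  # True
```

The sketch only reproduces the two examples from this comment; whether the plain concatenation sorts correctly for every valid version is one of the details that still needs checking.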

In devpi and on pypi.org, LegacyVersion from packaging-legacy needs to be supported, but that should work very similarly. The - from the -1 in the first tuple element would work as is. Something like 2017i (pytz style) has the key (-1, ('00002017', '*i', '*final')), which would become -A1@00002017@*i@*final.

An open question is what to do with integers that have more digits than can be encoded (52-61). Judging from the linked tickets, there don't seem to be any relevant cases, so we could store such a version as a string prefixed by !. That would mess up the sort order in those cases, but it would be similar to legacy versions, which are also sorted before all others.
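A minimal sketch of that fallback, assuming a 52-digit limit (A-Z plus a-z). `MAX_ENCODABLE_DIGITS`, `has_oversized_number` and `fallback_key` are hypothetical names, and splitting the version into digit runs with a regular expression is a simplification of real version parsing.

```python
import re

# Assumption: lengths 1-52 are encodable (A-Z, then a-z).
MAX_ENCODABLE_DIGITS = 52


def has_oversized_number(version: str) -> bool:
    """Return True if any numeric run has more digits than can be encoded."""
    for run in re.findall(r"[0-9]+", version):
        significant = run.lstrip("0") or "0"  # leading zeros vanish in int()
        if len(significant) > MAX_ENCODABLE_DIGITS:
            return True
    return False


def fallback_key(version: str) -> str:
    """Store the raw version behind '!' so it sorts before encoded keys."""
    return "!" + version


print(has_oversized_number("1.20.11.b1"))     # False
print(has_oversized_number("1." + "9" * 53))  # True
print(fallback_key("1.0") < "A0A1A0")         # True
```

Since ! is ascii 33, such keys compare lower than any key that starts with a length character, so the affected versions end up grouped at the front instead of in their true position.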

@fschulze
Author

I made a first draft implementation: fschulze/devpi@9448085
