[BUG] tool.setuptools.license-files results in invalid metadata #4759

Open
dnicolodi opened this issue Nov 26, 2024 · 30 comments
Labels
bug Needs Triage Issues that need to be evaluated for severity and status.

Comments

@dnicolodi
Contributor

setuptools version

setuptools==74.1.2

Python version

Python 3.13

OS

any

Additional environment information

No response

Description

If any of the glob patterns specified in tool.setuptools.license-files matches a file in the package, setuptools generates invalid metadata: it includes a License-File field while specifying Metadata-Version 2.1. This is invalid, and packaging raises an exception while parsing the metadata. As a result, the distributions are likely to be rejected by PyPI.

Because tool.setuptools.license-files has a default value of ['LICEN[CS]E*', 'COPYING*', 'NOTICE*', 'AUTHORS*'], the problem can also be encountered in packages that do not explicitly set this field in pyproject.toml but happen to have a file matching one of the default glob patterns.
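
To check whether a project is affected without setting the field, one can test which file names match the default patterns. This is a minimal sketch, assuming the defaults quoted above and matching only plain file names in the project root; `matching_license_files` is a hypothetical helper, and setuptools' own matching logic may differ in details.

```python
from fnmatch import fnmatchcase

# Default patterns as quoted in the issue description.
DEFAULT_LICENSE_GLOBS = ["LICEN[CS]E*", "COPYING*", "NOTICE*", "AUTHORS*"]


def matching_license_files(filenames, patterns=DEFAULT_LICENSE_GLOBS):
    """Return the file names matched by at least one glob pattern."""
    return sorted(
        name
        for name in filenames
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    )
```

Calling it with the names returned by `os.listdir(".")` in the project root gives the files that would trigger the License-File field under these assumptions.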

Expected behavior

Either do not emit the License-File field, or emit it and specify Metadata-Version: 2.4 as per PEP 639.

How to Reproduce

Here is a short reproducer:

$ cat >pyproject.toml <<EOF
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
[project]
name = "foo"
version = "1.2.3"
[tool.setuptools]
license-files = ["LICENSE"]
EOF
$ touch LICENSE
$ uv build .
$ tar xf dist/foo-1.2.3.tar.gz -C dist
$ python
>>> import pathlib
>>> import packaging.metadata
>>> packaging.metadata.Metadata.from_email(pathlib.Path('dist/foo-1.2.3/PKG-INFO').read_text())

Output

  + Exception Group Traceback (most recent call last):
  |   File "<python-input-3>", line 1, in <module>
  |     packaging.metadata.Metadata.from_email(pathlib.Path('dist/foo-1.2.3/PKG-INFO').read_text())
  |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/Users/daniele/src/twine/.venv/lib/python3.13/site-packages/packaging/metadata.py", line 781, in from_email
  |     raise ExceptionGroup(
  |         "invalid or unparsed metadata", exc_group.exceptions
  |     ) from None
  | ExceptionGroup: invalid or unparsed metadata (1 sub-exception)
  +-+---------------- 1 ----------------
    | packaging.metadata.InvalidMetadata: license-file introduced in metadata version 2.4, not 2.1
    +------------------------------------
>>>
@dnicolodi dnicolodi added bug Needs Triage Issues that need to be evaluated for severity and status. labels Nov 26, 2024
@abravalheri
Contributor

abravalheri commented Nov 26, 2024

This is a well-known case of an early implementation of a PEP 639 draft. Right now tools are equipped to accept this variation.

In time it will be fixed. But not immediately, due to effort constraints and release scheduling (we are in the process of implementing previous versions of metadata first).

Probably we can close this as a "kind of duplicate" of the request to implement PEP 639.

@dnicolodi
Contributor Author

This is a well-known case of an early implementation of a PEP 639 draft. Right now tools are equipped to accept this variation.

What I am trying to say is that tools are not equipped to accept this variation. packaging is the PyPA library for parsing distribution metadata and, as shown above, it does not handle this variation. Which other tools dealing with package metadata, and handling this correctly, do you have in mind?

@cdce8p
Contributor

cdce8p commented Nov 27, 2024

@dnicolodi is right here. The License-File metadata generated by setuptools causes an error during validation with packaging==24.2 since the metadata version isn't >=2.4. This hasn't been an issue so far as twine doesn't support version 2.4 and thus License-File isn't included in the form fields during upload.

What I'm uncertain about is the solution. Sure, we can remove the License-File field from the generated metadata, and should probably do so anyway as it isn't spec compliant even with 2.4. However, that wouldn't resolve the issue for all packages which pin older setuptools versions. As it was added here quite a while ago, I suspect the impact will be quite large. This might be something which needs to be handled upstream (twine / warehouse) too, e.g. only push the License-File field if the metadata version is >=2.4 and ignore it for older versions.

dnicolodi added a commit to dnicolodi/twine that referenced this issue Nov 27, 2024
@abravalheri
Contributor

abravalheri commented Nov 27, 2024

Which other tools dealing with packages metadata that handle this correctly do you have in mind?

I was thinking that twine/PyPI/pip have been accepting License-File with metadata version 2.2 for a while.

Just to emphasize that we are going to tackle this problem in time (btw, thanks @cdce8p for the PRs). But before that we will implement metadata version 2.3 (I just got a review recently on one of the PRs necessary for 2.3, but this week I don't have the time to delve into it).

Meanwhile, my strong preference is to not change anything that has been in place for the last 2 or 3 years.

If you need to process core metadata using packaging before we proceed with the updates in setuptools from 2.2, through 2.3 and all the way to 2.4, maybe you can consider deleting/ignoring the License-File field before the validation?
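
The suggestion above could look roughly like the following. This is a minimal sketch that uses only the standard library `email` module on the PKG-INFO text; `strip_early_license_files` is a hypothetical helper, not part of setuptools, twine, or packaging. One caveat of this approach: re-serializing through `email` may re-encode non-ASCII header values, so a real implementation would more likely drop the field from already-parsed metadata.

```python
from email import message_from_string


def _version_tuple(version):
    """Turn '2.1' into (2, 1) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def strip_early_license_files(pkg_info_text):
    """Drop License-File headers when Metadata-Version is older than 2.4."""
    msg = message_from_string(pkg_info_text)
    version = msg.get("Metadata-Version", "1.0")
    if _version_tuple(version) < (2, 4):
        # Deleting by name removes every header with that name (the lookup
        # is case-insensitive) and does nothing if the header is absent.
        del msg["License-File"]
    return msg.as_string()
```

The cleaned text can then be handed to a strict parser for validation.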

@dnicolodi
Contributor Author

I was thinking that twine/PyPI/pip have been accepting License-File with metadata version 2.2 for a while.

I am not very familiar with all the consumers of package metadata. I have the impression that there are several tools using different approaches for metadata parsing and validation. Only relatively recently packaging implemented metadata validation. I hope there is going to be a convergence toward more strict parsing of metadata, either via packaging or via equivalent tools.

Indeed, I bumped into the issue with License-File because I would like to move twine from using a semi-maintained library for extracting metadata from distribution files to a solution based on packaging (pypa/twine#1180). This comes with stricter metadata validation, which IMHO is a good thing. I have now implemented a "work-around" for the metadata emitted by existing setuptools versions as there is nothing we can do about those.

I was under the impression that PyPI implements validation of the distribution files metadata, but if it does, the validation is not very strict. What is validated strictly is the form data that is sent alongside the distribution files. That can be tweaked as needed to support the metadata emitted by existing setuptools releases.

pip is most likely the most permissive consumer of metadata, thus I don't expect it to do any validation.

I don't see any reason to change setuptools to fix this issue straight away. I filed this issue to make sure that you are aware of it. Because, as I wrote above, metadata parsing has not been very strict so far, it could have been that you are not aware of it.

@abravalheri
Contributor

Thank you very much @dnicolodi.
Hopefully we can have it soon and start to converge towards packaging.

One thing that would help a lot is if packaging itself could provide functionality to emit metadata.
This way we could use a single and consistent library for both emitting and parsing metadata.
It is probably already being tracked by pypa/packaging#570.

@dnicolodi
Contributor Author

meson-python and scikit-build-core (and possibly other build backends) use pyproject-metadata (https://github.com/pypa/pyproject-metadata/) for translating pyproject.toml into an RFC822 metadata file (I am one of the maintainers of meson-python).

There is some work toward incorporating pyproject-metadata into packaging (pypa/packaging#846, pypa/packaging#847).

On the other hand, I like that there is more than one implementation of the standard: that makes it easier to ensure that a single implementation does not diverge from the standard, and it prevents such an implementation, bugs included, from becoming the de facto standard. We were in that situation before PEP 517 and PEP 621, and we are still cleaning up the mess... 🙂

@cdce8p
Contributor

cdce8p commented Nov 27, 2024

I was thinking that twine/PyPI/pip have been accepting License-File with metadata version 2.2 for a while.

PyPI uses packaging.metadata to validate the submitted form fields. Unknown fields are just ignored. With 24.2 license_files was defined as being added in 2.4 thus it would start to fail now when submitted with older version.

Saw the commit from @dnicolodi on the packaging PR: dnicolodi/twine@ab3bf7d. All considered, that's probably the most practical solution here.

Just to emphasize that we are going to tackle this problem in time (btw, thanks @cdce8p for the PRs). But before that we will implement metadata version 2.3.

Meanwhile, my strong preference is to not change anything that has been in place for the last 2 or 3 years.

Sorry in advance if I'm a bit annoying here. I can understand your position, but wouldn't fully agree with it. We can do more even before we implement metadata version 2.3.

Why am I pushing for these changes? For Home Assistant we use a script to try to validate the licenses of all requirements and tbh it's just a mess. Some packages use the outdated classifiers, some custom license strings and others the full license in the metadata. I'm prepared to open PRs for some of these dependencies but it only really makes sense if I can use the final pyproject.toml metadata. Otherwise I'd have to do that all over again once setuptools is able to emit valid 2.4 metadata. With the changes above, it only needs a new release with an updated setuptools version and everything would be fine.

@dnicolodi
Contributor Author

PyPI uses packaging.metadata to validate the submitted form fields. Unknown fields are just ignored. With 24.2 license_files was defined as being added in 2.4 thus it would start to fail now when submitted with older version.

AFAIU warehouse (the software powering PyPI) does not ignore unknown fields. This should be the validation code: https://github.com/pypi/warehouse/blob/d57082ee37327bc1e8a28f96470b00ed226c0f87/warehouse/forklift/metadata.py#L263-L348

However, twine submits only the fields it knows about, and it does not yet know about license_files (which should appear as multiple license_file entries in the form data, but this is a minor technical detail).
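
The two points above, submitting only known fields and repeating license_file once per file, can be sketched as follows. The function name and the way known fields are passed in are illustrative assumptions; this is not twine's actual code.

```python
def flatten_for_upload(metadata, known_fields):
    """Flatten a metadata mapping into (field, value) pairs for form data."""
    pairs = []
    for key, value in metadata.items():
        # Hypothetical naming rule: the plural key is sent as a repeated
        # singular form field.
        field = "license_file" if key == "license_files" else key
        if field not in known_fields:
            continue  # fields the client does not know about are not sent
        values = value if isinstance(value, (list, tuple)) else [value]
        pairs.extend((field, item) for item in values)
    return pairs
```

With a known-field set that lacks license_file, the license files are silently left out of the form data, which is why the invalid metadata went unnoticed on upload.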

@dnicolodi
Contributor Author

hatchling and poetry [...] are also blocked on the twine update and can't yet move to 2.4.

Can you point me to more information regarding this? I was under the impression that hatchling users use the upload mechanism provided by the hatch framework, and same for poetry. Isn't this the case? One of the PyPI maintainers on Discourse said that there are PyPI uploads conforming to metadata 2.4. I thought those were hatchling projects.

@cdce8p
Contributor

cdce8p commented Nov 27, 2024

PyPI uses packaging.metadata to validate the submitted form fields. Unknown fields are just ignored. With 24.2 license_files was defined as being added in 2.4 thus it would start to fail now when submitted with older version.

AFAIU warehouse (the software powering PyPI) does not ignore unknown fields. This should be the validation code: https://github.com/pypi/warehouse/blob/d57082ee37327bc1e8a28f96470b00ed226c0f87/warehouse/forklift/metadata.py#L263-L348

You're right. I only checked the packaging call Metadata.from_raw(...) which ignores unknown fields. Didn't know warehouse implemented additional checks.

hatchling and poetry [...] are also blocked on the twine update and can't yet move to 2.4.

Can you point me to more information regarding this? I was under the impression that hatchling users use the upload mechanism provided by the hatch framework, and same for poetry. Isn't this the case?

For hatchling see pypa/hatch#1828 (comment). The same would likely apply to poetry as well, although they haven't yet started implementing PEP 639 to begin with (python-poetry/poetry#9670). The comment for poetry was more referencing the fact that they already allow project.license = MIT in pyproject.toml and just back-populate it to the License field instead of License-Expression.

One of the PyPI maintainers on Discourse said that there are PyPI uploads conforming to metadata 2.4. I thought those were hatchling projects.

AFAIK you can overwrite the Metadata-Version with hatchling and at some point (before PyPI added official support for it I believe) this was even the default. It has since been reverted though.

The last comment on discuss I saw was

No further uploads have occurred since that first attempt.

@abravalheri
Contributor

Hi guys, I understand the urgency of the topic, but I am also very conscious that we need to move very carefully to avoid breaking the ecosystem all at once. In the past we had a lot of bad experiences with botched releases, so I am trying to take the most conservative approach possible (and even with that it is possible there will be problems).

The good thing is that we have most of the pieces already in place (thanks again for the PRs). We now need to coordinate to release them, collect feedback and fix if things are broken.

My preference is to do things step by step and wait one week or so between steps to receive feedback from early adopters on edge cases (of course respecting the holiday season coming ahead and the time of all collaborators, so probably longer than that). I suggest the following:

  1. Setuptools to finish testing Metadata 2.2 implementation (followed by release + feedback + bug fixing)
  2. Setuptools to bump Metadata 2.3 - should be trivial - (followed by release + feedback + bug fixing)
  3. Setuptools to implement PEP 639 and Metadata 2.4 - (followed by release + feedback + bug fixing)
  4. Twine and other downstream consumers to support strict Metadata 2.4 [1]

I think that step 2 is going to be trivial to support, once we concede that we don't have to worry too much about pypa/packaging#845 and treat it as a temporary bug.

Footnotes

  1. I am not sure twine will be able to fully drop lenient parsing for Metadata < 2.4 (many users do pin the version of their build dependencies, so there may be some backlash from the community). But this is a decision for the twine team to take.

@cdce8p
Contributor

cdce8p commented Nov 27, 2024

One last post from me and then I'll shut up and respect your decision :)

we need to move very carefully to avoid breaking the ecosystem all at once. In the past we had a lot of bad experiences with botched releases, so I am trying to take the most conservative approach possible (and even with that it is possible there will be problems).

Absolutely! The only thing our opinions differ on, I believe, is the risk these PRs actually pose. What I'm trying to say is that it's quite small and we can safely do it now. Let me explain

Doing these now would provide valuable feedback way before we'd consider moving to 2.4. That's the only thing I would change about your proposed timeline.

@dnicolodi
Contributor Author

I don't want to influence the pace of introduction of new features in any way (egoistically I would like support for metadata 2.4 to land as soon as possible to be able to use it in my projects, but it would only be something nice to have) so this should not be read as supporting one position or the other, but I would like to point out that pyproject-metadata and thus meson-python have taken the approach of emitting metadata where the metadata version is set to what is required to represent faithfully the user input, namely the content of pyproject.toml.

meson-python does not support any dynamic metadata fields, thus for it this means either metadata version 2.2 or 2.4. The latter is used only when pyproject.toml contains a PEP 639 style license declaration, namely a string value for project.license or a non-empty project.license-files.
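
The selection rule described above can be sketched as follows. This is a minimal illustration that takes the already-parsed [project] table as a dict; `required_metadata_version` is a hypothetical name, not the pyproject-metadata or meson-python API.

```python
def required_metadata_version(project):
    """Pick the metadata version needed to represent the [project] table."""
    # PEP 639 style: project.license is a plain string (an SPDX expression)
    # or project.license-files is a non-empty list.
    uses_pep639 = isinstance(project.get("license"), str) or bool(
        project.get("license-files")
    )
    return "2.4" if uses_pep639 else "2.2"
```

An older table form such as license = {text = "MIT"} parses to a dict, not a string, so it stays on the lower version under this rule.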

I think this is the only way to proceed as pre PEP 639 and post PEP 639 license declaration formats are not compatible with each other, thus emitting metadata version 2.4 with a pyproject.toml using an older metadata format is at minimum very tricky.

These PRs have more details: pypa/pyproject-metadata#132, pypa/pyproject-metadata#206

@cdce8p
Contributor

cdce8p commented Nov 27, 2024

I think this is the only way to proceed as pre PEP 639 and post PEP 639 license declaration formats are not compatible with each other, thus emitting metadata version 2.4 with a pyproject.toml using an older metadata format is at minimum very tricky.

I'm proposing the reverse: Emitting the current core metadata 2.1 with a post PEP 639 pyproject.toml. Even if that means some fields wouldn't be written even though the data in the project table is there (namely License-Expression).

That approach works fine for hatchling. I've also opened PRs to add the same to flit.

@dnicolodi
Contributor Author

I'm proposing the reverse: Emitting the current core metadata 2.1 with a post PEP 639 pyproject.toml.

I don't understand what the advantage of doing this is. IIUC, your goal is to move as fast as possible to have packages with PEP 639 metadata. However, with this approach, having packages with PEP 639 license metadata will require two releases of the packages involved: one that updates the metadata fields in pyproject.toml to the ones specified in PEP 639, and one to rebuild the package metadata shipped in the distribution files with a future setuptools release.

The cost of doing this is not only the two releases, but also that between the two releases the involved packages will not have clear license information displayed on PyPI. IIRC, PEP 639 forbids having classifiers indicating the package license while using the License-Expression metadata field. Because PyPI currently takes the license information displayed on the package pages from the classifiers, the packages will not show clear license information.

@cdce8p
Contributor

cdce8p commented Nov 29, 2024

However, with this approach, having packages with PEP 639 license metadata will require two releases of the packages involved: one that updates the metadata fields in pyproject.toml to the ones specified in PEP 639, and one to rebuild the package metadata shipped in the distribution files with a future setuptools release.

Correct.

I don't understand what the advantage of doing this is. IIUC, your goal is to move as fast as possible to have packages with PEP 639 metadata.

Updating packages always takes time. What can be optimized though is developer time. I can either fix / convert all wrong licenses to SPDX expressions now and do a second pass to convert {text = "..."} to the new syntax, or just do the update once, knowing that eventually a new release will just use the License-Expression instead. Given that it often takes >1 month for smaller projects to merge PRs, doing it twice would just be exhausting. So for now, I've paused updating the license strings until the new syntax in whatever form is supported.

To give an example, poetry already supports the license = "MIT" syntax while still missing PEP 639 support. So these aren't an issue.

Just for Home Assistant, I currently track over 650 packages with inaccurate license data, no SPDX expression in either the License-Expression or the License field.

The cost of doing this is not only the two releases, but also that between the two releases the involved packages will not have clear license information displayed on PyPI.

Not entirely. PyPI can handle all cases. This is with both classifier and (old) license metadata [1]:

Screenshot 2024-11-29 at 20 13 23

and here is an example with only the (old) license metadata, but still a valid SPDX expression [2]:

Screenshot 2024-11-29 at 20 14 27

IIRC, PEP 639 forbids having classifiers indicating the package license while using the License-Expression metadata field. Because PyPI currently takes the license information displayed on the package pages from the classifiers, the packages will not show clear license information.

No, build tools MAY raise an error if a license classifier is present [3]. PyPI must only reject uploads with both License and License-Expression fields, but those don't collide here [4].
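
The two rules as stated in this comment can be sketched as a small check. This is an illustration only; `license_metadata_problems` and the shape of the `fields` mapping are assumptions, not code from any build tool or from warehouse.

```python
LICENSE_CLASSIFIER_PREFIX = "License ::"


def license_metadata_problems(fields):
    """Return (errors, warnings) for a mapping of core metadata fields."""
    errors, warnings = [], []
    has_expression = bool(fields.get("License-Expression"))
    if has_expression and fields.get("License"):
        # Per the comment above, an index must reject this combination.
        errors.append("both License and License-Expression are set")
    if has_expression and any(
        classifier.startswith(LICENSE_CLASSIFIER_PREFIX)
        for classifier in fields.get("Classifier", [])
    ):
        # Build tools MAY treat this as an error; here it is only a warning.
        warnings.append("license classifier used together with License-Expression")
    return errors, warnings
```

Metadata that carries only the old License field plus a license classifier yields neither an error nor a warning, which matches the point that the two do not collide.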

Footnotes

  1. https://pypi.org/project/homeassistant/

  2. https://pypi.org/project/AEMET-OpenData/

  3. https://peps.python.org/pep-0639/#deprecate-license-classifiers

  4. https://peps.python.org/pep-0639/#deprecate-license-field

dnicolodi added a commit to dnicolodi/twine that referenced this issue Nov 30, 2024
dnicolodi added a commit to dnicolodi/twine that referenced this issue Dec 1, 2024
nijel added a commit to WeblateOrg/wllegal that referenced this issue Dec 16, 2024
That seems to be the cleanest approach to pypa/setuptools#4759
nijel added a commit to WeblateOrg/language-data that referenced this issue Dec 16, 2024
dnicolodi added a commit to dnicolodi/twine that referenced this issue Dec 16, 2024
@hauntsaninja
Contributor

If we need to be really conservative about a new metadata version, can we revert whatever it is that is putting License-File into 2.1 metadata, since PyPI hard rejects that? Currently this metadata is hard broken so can't really break it further ;-)

@hauntsaninja
Contributor

hauntsaninja commented Dec 17, 2024

My user story:

  • At some point I added license = {file = "LICENSE"} to some project. I think I got this from packaging.python.org
  • Later, I try to upload to PyPI. The upload fails with 400 license-file introduced in metadata version 2.4, not 2.1. See https://packaging.python.org/specifications/core-metadata for more information.
  • As far as I can tell, I haven't done anything at all special in my packaging. I quickly shake a fist at Python packaging, then Google the error
  • There are only two hits on Google: one is this uv issue (metadata version missmatch in uv 0.5.5, astral-sh/uv#9513), the other is some Mastodon thing. I get from there to here
  • If there's a workaround setuptools would like to encourage, it would be nice to post here!

@dnicolodi
Contributor Author

  • At some point I added license = {file = "LICENSE"} to some project

This is correct, and it is correctly handled by setuptools. It is not the cause of your issue.

The error you encounter is caused by the default value of the tool.setuptools.license-files field. To get rid of it, you can reset its value to an empty list. Add

[tool.setuptools]
license-files = []

to the pyproject.toml of your package.

I speculate that you are using uv publish to upload your package to PyPI. If you use twine to upload your package, you do not encounter the issue, because the current version of twine does not transmit the License-File metadata field, and future versions will filter it out when reading metadata that declares a metadata version < 2.4, to work around this setuptools issue.

@hauntsaninja
Contributor

hauntsaninja commented Dec 17, 2024

Thanks for explaining that! Yeah, I ended up just using twine (before I saw your post and konstin's post here explaining that PyPI validates formdata, not the uploaded METADATA)... so I guess I just uploaded a wheel with invalid METADATA that will fail packaging.metadata.Metadata validation. At least I have a lot of company (IIUC ~every setuptools wheel in the last few years? Looks like a supermajority in a venv of 800 packages I have on hand). I guess off-topic for this tracker, but if it's so widespread maybe there's a case packaging.metadata should special case this one thing...
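
A rough way to repeat that count on any environment is to scan the installed distributions with the standard library. This is a sketch under the assumption that "affected" means a License-File field together with a Metadata-Version below 2.4; both function names are hypothetical, and malformed version strings are not handled.

```python
from importlib.metadata import distributions


def has_early_license_file(metadata_version, license_files):
    """True if License-File is present but Metadata-Version is below 2.4."""
    if not license_files or not metadata_version:
        return False
    version = tuple(int(part) for part in metadata_version.split("."))
    return version < (2, 4)


def count_affected():
    """Count installed distributions whose METADATA has the mismatch."""
    total = affected = 0
    for dist in distributions():
        meta = dist.metadata
        if meta is None:
            continue  # no readable metadata for this distribution
        total += 1
        if has_early_license_file(
            meta.get("Metadata-Version"), meta.get_all("License-File")
        ):
            affected += 1
    return affected, total
```

Running `count_affected()` inside a virtual environment returns the number of affected distributions and the total number scanned.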

sigmavirus24 pushed a commit to pypa/twine that referenced this issue Dec 17, 2024

* Remove "content" from set of specially handled metadata fields

The "content" field is always added to the form data after the package
metadata has been flattened, thus it is not needed to handle it in the
flattening method. Remove the associated test.

This will allow to tighten typing in a successive commit.

* Remove "attestations" from the set of specially handled metadata fields

The "attestations" field is a string: strings do not need flattening.

* Refactor code a tiny bit

Avoid looking a key up into a set of one element and remove an
indirection through a module global variable. This will make it a bit
easier to extend the flattening logic in successive commits.

* Switch from pkginfo to packaging for parsing distribution metadata

The packaging package is maintained by the PyPA and it is the de-facto
reference implementation for the packaging standards. Using packaging
for parsing metadata guarantees support for the latest metadata
versions.

warehouse, the Python package index implementation used by PyPI, also
uses packaging for parsing metadata. This guarantees that metadata
parsing is the same on the client and server side, for the most
prominent index.

* Enable some more mypy checks

* Move monkeypatching of metadata 2.0 support to a more proper place

It was done in the support code for the wheel file format but it
affects metadata loading from all supported distribution types. Move
it to generic code.

* Accommodate for invalid metadata produced by setuptools

See pypa/setuptools#4759.
akrabat added a commit to akrabat/rst2pdf that referenced this issue Dec 24, 2024
Set license-files to an empty array to work around issues releasing to
PyPI. See:

- astral-sh/uv#9513
- pypa/setuptools#4759
akrabat added a commit to rst2pdf/rst2pdf that referenced this issue Dec 24, 2024
@abravalheri
Contributor

abravalheri commented Jan 8, 2025

At least I have a lot of company (IIUC ~every setuptools wheel in the last few years? Looks like a supermajority in a venv of 800 packages I have on hand). I guess off-topic for this tracker, but if it's so widespread maybe there's a case packaging.metadata should special case this one thing...

Yeah, there is a growing pain that we discovered in the community once PEP 639 was finally accepted... Probably because of the very long time the community took to finalise it, and the way the core metadata version follows a strictly monotonic model.

An earlier version of license-files was implemented a very long while ago because the community had some appetite for that (I guess... I was not involved at that point in time).

Since all the tools available were lenient in validating that field and happily accepted it, it kind of became a "de facto" standard, with widespread usage.

I think it would be an error to start strictly validating old versions of metadata regarding license-files as it would cause too much havoc.

In my opinion, the best way forward is to be backward compatible with the existing tools behaviour, and be lenient when validating license-files for metadata-version < 2.4 (it would be OK to emit warnings).

It should not be too difficult to implement, as one can simply delete the license-files key of the Metadata before running the validation.

The implementation plan for PEP 639 in setuptools was previously mentioned in #4759 (comment). It is:

  • Implement and release Dynamic field, bump metadata version to 2.2
  • Bump metadata version to 2.3 and release using an up-to-date version of packaging. There is an open issue (Inconsistent extra normalisation in Requirement.__str__, pypa/packaging#845) that we would need to have fixed before we release if we want to be strict about it; alternatively, we can release anyway and consider it a well-known packaging bug.
  • Implement the final version of PEP 639 and release.

I believe that keeping backwards compatibility is fundamental in the ecosystem because:

  1. There are many deployments that pin old versions of dependencies
  2. There are many wheels out there released in the last half-decade which contain such invalid metadata.
  3. People maintaining bug fixes in packages that need old/removed setuptools functionality would suffer.

@pfmoore
Member

pfmoore commented Jan 8, 2025

Yeah, there is a growing pain that we discovered in the community once PEP 639 was finally accepted... Probably because of the very long time the community took to finalise it, and the way the core metadata version follows a strictly monotonic model.

The strictly monotonic approach could be revisited. It certainly has its downsides. But as usual, someone would need to put together a proposal, write a PEP, and push it through to approval. I'm not sure if there's the community bandwidth for that - apart from this situation, there haven't been that many problems with the current scheme.

An earlier version of license-files was implemented a very long while ago because the community had some appetite for that (I guess... I was not involved at that point in time).

I wasn't involved in that decision, either, but I believe it was ill-advised. Even if the situation with license information was a problem, adding fields not defined in the spec is not allowed. The core metadata spec says that it should be considered "complete" - which I view as meaning "only these fields are allowed". I'd be happy to have the wording clarified if people don't think it's sufficiently obvious that this is the case.

But debating the history isn't productive. Setuptools produced bad data in the past, and this is something we need to deal with now.

One thing I'm not clear on - is setuptools still producing bad metadata at the moment, and if so, do you intend to fix that before starting on the plan to migrate to metadata 2.4? Because I'd be less inclined to support loosening the validation if it means that setuptools will continue to publish bad data...

I think it would be an error to start strictly validating old versions of metadata regarding license-file as it would cause too much havoc.

I think this is a bit strong. I think we probably have to allow invalid license-files metadata, just because not doing so would cause too much disruption. But I don't think it's an "error" to do so. It's not possible to change every place where tools validate metadata. Nor is it possible to do anything about tools that ignore license-file if the metadata version is below 2.4. The best we can hope for is to ask commonly-used validation libraries (packaging, and probably uv as well) to add an exception for this case.

There's no good answer here, unfortunately.

I believe that keeping backwards compatibility is fundamental in the ecosystem because:

Absolutely. But part of the approach to backward compatibility is to prohibit arbitrary fields being added. Unfortunately bugs happen, and I agree that we have to handle them, but we shouldn't make life impossible for ourselves by assuming that we can't rely on the rules we set.

I'm not 100% sure what the actual next step is here. Medium term, setuptools will move to metadata 2.4. But in the short term, I see various possible actions:

  1. Setuptools stops emitting License-File for metadata < 2.4. This would presumably be a loss of functionality for users relying on the existing behaviour, and while I'm fine with that, I imagine the setuptools maintainers aren't...
  2. Someone tries to persuade packaging (and maybe uv, if they validate) to loosen the validation to allow License-File for metadata < 2.4. That feels to me like a permanent fix for a temporary problem, but I can't think of anything better.
  3. Do we need twine and/or PyPI to allow License-File on metadata < 2.4, or is that already happening (or covered by packaging allowing it)?

Did I miss any?

@abravalheri
Contributor

I think this is a bit strong. I think we probably have to allow invalid license-files metadata, just because not doing so would cause too much disruption. But I don't think it's an "error" to do so. It's not possible to change every place where tools validate metadata. Nor is it possible to do anything about tools that ignore license-file if the metadata version is below 2.4. The best we can hope for is to ask commonly-used validation libraries (packaging, and probably uv as well) to add an exception for this case.

You are correct, apologies for the bad choice of words.

I'm not 100% sure what the actual next step is here. Medium term, setuptools will move to metadata 2.4. But in the short term, I see various possible actions:

  1. Setuptools stops emitting License-File for metadata < 2.4. This would presumably be a loss of functionality for users relying on the existing behaviour, and while I'm fine with that, I imagine the setuptools maintainers aren't...

Yesterday I released 7.5.8, which bumped the metadata version to 2.2, so I don't think we are that far off now (as long as we compromise and overlook the inconsistent extra normalisation in packaging), so my preference would be to continue working to catch up with metadata 2.4.

This preference is motivated by the fact that License-File does not only affect code, but may also have other implications (e.g. consider automation tools that scan metadata for legal compliance). But I am not a specialist in legal subjects, so that preference is very subjective.

Do we need twine and/or PyPI to allow License-File on metadata < 2.4, or is that already happening (or covered by packaging allowing it)?

Currently, I believe PyPI, twine and pip gracefully ignore License-File in that scenario.

@pfmoore
Member

pfmoore commented Jan 9, 2025

Yesterday I released 7.5.8, which bumped the metadata version to 2.2, so I don't think we are that far off now (as long as we compromise and overlook pypa/packaging#845), so my preference would be to continue working to catch up with metadata 2.4.

OK. It may be worth checking back on this if there are unforeseen delays.

But I am not a specialist in legal subjects, so that preference is very subjective.

I'm not a specialist either. My concern was that tools could ignore License-File if the metadata version is < 2.4, resulting in silent errors. But we simply don't know.

Currently, I believe PyPI, twine and pip gracefully ignore License-File in that scenario.

The OP's issue was with packaging. Maybe the answer is simply to point out that packaging.metadata.Metadata validates by design, and if you want to parse potentially-invalid metadata (which this is) then you need to use packaging.metadata.parse_email and handle the invalid cases yourself. If you want it to be treated as valid, that would need to be raised with the packaging project - and I'd have some sympathy with them if they said that they wanted the standard changed before they changed their validation.

If we do need a spec change, I'd propose that it could be something like the following:

License-File (multiple use)

New in version 2.4.

Each entry is a string representation of the path of a license-related file. The path is located within the project source tree, relative to the project root directory. For details see PEP 639.

Note: Due to a setuptools bug, this field may be present in metadata versions before 2.4. Tools should not reject the metadata in this case, but they are allowed to ignore the field unless the metadata version is 2.4 or greater.

This would need to be brought up for discussion on Discourse. As it affects software behaviour, it may require a PEP (although if the community agrees, it could be approved as a text-only change).
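A minimal sketch of how a consumer might follow the proposed note, assuming the metadata is read with the standard library's email parser rather than with packaging. The function name read_license_files and the sample inputs are hypothetical, and the version check assumes a plain MAJOR.MINOR value.

```python
from email.parser import Parser

# License-File is defined starting with Metadata-Version 2.4 (PEP 639).
LICENSE_FILE_SINCE = (2, 4)


def read_license_files(metadata_text: str) -> list[str]:
    """Hypothetical helper: return License-File entries, or [] for older metadata.

    Older metadata that carries License-File is accepted, not rejected;
    the field is simply ignored, as the proposed note allows.
    """
    msg = Parser().parsestr(metadata_text)
    # Assumption: Metadata-Version is a simple dotted number such as "2.1".
    declared = tuple(int(part) for part in msg.get("Metadata-Version", "0").split("."))
    if declared < LICENSE_FILE_SINCE:
        return []
    return msg.get_all("License-File") or []


old_style = "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\nLicense-File: LICENSE\n"
new_style = "Metadata-Version: 2.4\nName: demo\nVersion: 1.0\nLicense-File: LICENSE\n"

print(read_license_files(old_style))  # field ignored, metadata still accepted
print(read_license_files(new_style))  # field honoured
```

Returning an empty list for older metadata is only one reading of "allowed to ignore"; a tool could equally choose to honour the field in that case.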

@abravalheri
Contributor

abravalheri commented Jan 9, 2025

OK. It may be worth checking back on this if there are unforeseen delays.

I agree. I want to bump the version step by step and give a couple of days between them so that we can receive feedback, but we can revisit this.

The OP's issue was with packaging. Maybe the answer is simply to point out that packaging.metadata.Metadata validates by design, and if you want to parse potentially-invalid metadata (which this is) then you need to use packaging.metadata.parse_email and handle the invalid cases yourself. If you want it to be treated as valid, that would need to be raised with the packaging project - and I'd have some sympathy with them if they said that they wanted the standard changed before they changed their validation.

I agree, the design in packaging already allows for more lenient parsing, but it does need the developers to opt in. Maybe we should include an example in the docs? It should not be super complicated to do, this is a rough example:

import warnings

from packaging.metadata import Metadata, parse_email


example = """\
Metadata-Version: 2.1
Name: hello-world
Version: 0.42
License-File: MIT.txt
License-File: BSD.txt
"""

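# parse_email does not validate: it returns the recognised fields
# plus a dict of anything it could not parse.
raw, unparsed = parse_email(example)
if unparsed:
    raise ValueError(f"Invalid metadata fields: {unparsed!r}")

# Rough check: the metadata versions are compared as plain strings here.
# Drop License-File entries when the declared version does not define them.
if raw.get("metadata_version", "0") < "2.4" and raw.pop("license_files", None):
    warnings.warn("License-File is not supported for Metadata-Version < 2.4")

# Validate whatever is left.
print(Metadata.from_raw(raw))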

Note: Due to a setuptools bug, this field may be present in metadata versions before 2.4. Tools should not reject the metadata in this case, but they are allowed to ignore the field unless the metadata version is 2.4 or greater.

I would like to point out that this is not exclusive to the current and old versions of setuptools. If we consider the whole spectrum of already existing and published wheels, old versions of other build tools also show similar behaviour, for example: https://inspector.pypi.io/project/hatch/1.10.0/packages/24/cc/d4ff74c07e7aa12525aabe96dcb3e78068483f17423ef610894808aca9b0/hatch-1.10.0-py3-none-any.whl/hatch-1.10.0.dist-info/METADATA#line.12

@pfmoore
Member

pfmoore commented Jan 9, 2025

It should not be super complicated to do, this is a rough example

Yeah, technically you should convert Metadata-Version to a Version and do a proper version comparison. Depends on how much you care, which is sort of the point after all 🙂
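For illustration, a small sketch of the difference, using packaging.version.Version. The helper name supports_license_file is made up for this example, and "2.10" is a hypothetical future metadata version used only to show where string comparison goes wrong.

```python
from packaging.version import Version

# License-File is defined starting with Metadata-Version 2.4 (PEP 639).
LICENSE_FILE_SINCE = Version("2.4")


def supports_license_file(metadata_version: str) -> bool:
    """Hypothetical helper: True if this metadata version defines License-File."""
    return Version(metadata_version) >= LICENSE_FILE_SINCE


# Plain string comparison is lexicographic, so "2.10" sorts before "2.4".
print("2.10" < "2.4")                 # True
# Comparing parsed versions orders the same values numerically.
print(supports_license_file("2.10"))  # True
print(supports_license_file("2.1"))   # False
```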

I would like to point out that this is not exclusive to the current and old versions of setuptools.

Hmm, has anyone flagged this up to hatch? I assumed it was purely a setuptools issue - my apologies. I do want to be clear it's a bug, though, so would you be OK with "Due to a bug in some build backends..."? If you're amenable, I can see advantages to adding "(including setuptools)" as the number of packages that use setuptools is what makes this such a significant problem, but I'd understand if you didn't want to see setuptools called out explicitly like that.

@abravalheri
Contributor

Hmm, has anyone flagged this up to hatch?

In terms of actionable items, hatch probably does not have much to do, as I imagine they have already updated to 2.4 after the PEP was approved. There is almost nothing that can be done for the already existing/published wheels and/or projects that pin an old version of the build-backend in pyproject.toml.

Setuptools is lagging behind in this aspect because it was complicated to implement version 2.2 of the metadata spec.

@cdce8p
Contributor

cdce8p commented Jan 9, 2025

I wasn't involved in that decision, either, but I believe it was ill-advised. Even if the situation with license information was a problem, adding fields not defined in the spec is not allowed.

I originally added License-File to the metadata in #2645 some years ago. At the time I didn't know that extra keys aren't allowed, and publishing tools just ignored unspecified fields. So we didn't see, or at least I didn't see, any reports of it breaking something.

If all had gone to plan, the PEP would have been finished shortly thereafter and the implementation finalized, so nobody would have cared. But alas.

Someone tries to persuade packaging (and maybe uv, if they validate) to loosen the validation to allow License-File for metadata < 2.4. That feels to me like a permanent fix for a temporary problem, but I can't think of anything better.
[...] Do we need twine and/or PyPI to allow License-File on metadata < 2.4, or is that already happening (or covered by packaging allowing it)?

Yes, validating the current metadata with packaging would fail. That's unlikely to be an actual issue for end users, though. It has likely been this way since the change was added here, and there weren't any reports.

The exception here is the latest twine change. As they rewrote which fields they submit to PyPI (basically all, instead of only a whitelist), this became an issue. Therefore a workaround was added, basically dropping the License-File metadata if the metadata version is less than 2.4.
https://github.com/pypa/twine/blob/2386ca5300cd7bde59432834d362c07de61e9a53/twine/package.py#L225-L234

Maybe uv is affected by this as well. Haven't looked into it yet.

If we do need a spec change, I'd propose that it could be something like the following:

License-File (multiple use)

New in version 2.4.
Each entry is a string representation of the path of a license-related file. The path is located within the project source tree, relative to the project root directory. For details see PEP 639.

Note: Due to a setuptools bug, this field may be present in metadata versions before 2.4. Tools should not reject the metadata in this case, but they are allowed to ignore the field unless the metadata version is 2.4 or greater.

This sounds reasonable to me.

Setuptools stops emitting License-File for metadata < 2.4. This would presumably be a loss of functionality for users relying on the existing behaviour, and while I'm fine with that, I imagine the setuptools maintainers aren't...

The current License-File metadata is broken anyway, as the license files are stored in .dist-info and not in a licenses subdirectory. Additionally, the folder structure isn't kept, resulting in potential file overwrites. Overall I'd be surprised if anyone relied on it. The initial implementation was better than nothing, but not much more.
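To make the overwrite risk concrete, here is a sketch that assumes license files are copied into .dist-info by file name only, and compares that with the layout PEP 639 specifies (relative paths kept under .dist-info/licenses/). The function names and example paths are hypothetical; this is not setuptools' actual code.

```python
from pathlib import PurePosixPath

DIST_INFO = "foo-1.2.3.dist-info"  # example name, matching the reproducer's project


def flattened_target(license_path: str) -> str:
    """Assumed legacy behaviour: keep only the file name, directly in .dist-info."""
    return f"{DIST_INFO}/{PurePosixPath(license_path).name}"


def pep639_target(license_path: str) -> str:
    """PEP 639 layout: keep the relative path under .dist-info/licenses/."""
    return f"{DIST_INFO}/licenses/{license_path}"


license_files = ["LICENSE", "vendor/bar/LICENSE"]

flat = [flattened_target(path) for path in license_files]
kept = [pep639_target(path) for path in license_files]

# Both sources map to the same flattened destination, so one overwrites the other.
print(len(set(flat)))  # 1
# Keeping the directory structure gives each file its own destination.
print(len(set(kept)))  # 2
```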

We could simply remove the field, although as tools already need to handle it, I don't see much reason for it. A better approach IMO would be to move forward towards full PEP 639 support. A first step could be #4728 which would at least make the License-File field and the file location spec compliant. This could potentially be adopted now and wouldn't need to wait for anything.
