Normalize for qualifiers #121

CannedFish · 2023-06-06T08:28:31Z

Hello.

As the example of purl-spec, urls in qualifiers should be normalized as below:

pkg:maven/org.apache.xmlgraphics/[email protected]?repository_url=repo.spring.io%2Frelease

But in packageurl-python, when I set qualifiers to {"repository_url": "repo.spring.io/release"}, and the to_string will return as below:

pkg:maven/org.apache.xmlgraphics/[email protected]?repository_url=repo.spring.io/release

If I set qualifiers to {"repository_url": "repo.spring.io%2Frelease"}, the to_string will return as below:

pkg:maven/org.apache.xmlgraphics/[email protected]?repository_url=repo.spring.io%252Frelease

Which means the letter % is normalized to %25.

Is that a bug?

The text was updated successfully, but these errors were encountered:

matt-phylum · 2023-11-08T17:01:46Z

According to the PURL test suite, the correct behavior is that / in a qualifier value should not be escaped (packageurl-python is correct). I've noticed the examples in the package types spec are sometimes improperly canonicalized or even wrong.

tichavskym · 2024-02-27T11:37:26Z

@matt-phylum so are you saying that the library intentionally doesn't adhere to the package-url/purl-spec standard or am I missing something?

Because it says in the Rules for each component section that

qualifiers:

For each pair of key = value

A value must be a percent-encoded string

I want PURLs produced by this library to be processed by some other tooling, so adhering to some standard for data exchange is quite important to me, and it seems like percent encoded qualifiers are the more standard approach.

Either way, if I created a merge request that would just add an optional argument to to_string method that would encode those qualifiers, would you accept it?

tichavskym · 2024-02-27T11:50:39Z

Looking into it, the URL encoding isn't specific only for qualifiers section, there's an open PR #123 that is trying to encode / in name and version sections of PURLs, but according to the package-url/purl-spec, these sections should be URL encoded as well (meaning not only / should be encoded).

matt-phylum · 2024-02-27T14:11:37Z

URL encoding, aka percent encoding, really only describes how to decode a string. It specifies how to encode individual characters, but it doesn't specify which characters are supposed to be encoded. The PURL spec specifies which characters are to be encoded, and / is not one of them: https://github.com/package-url/purl-spec/blob/master/PURL-SPECIFICATION.rst#character-encoding

However, the PURL spec is probably not explicit enough. It seems like most readers interpret "percent-encoded" to mean some variant of x-www-formurlencoded, which is never mentioned once in the PURL spec. This results in incorrect canonicalization, and, more critically, it sometimes results in incorrect handling of spaces and plus signs.

Besides the section on character encoding, which could be much more explicit, the PURL spec has a test suite which contains the expected canonicalizations for some of the cases. This is one of the cases which is covered. / should not be escaped in qualifiers.

{
  "description": "MLflow model tracked in Azure Databricks (case insensitive)",
  "purl": "pkg:mlflow/CreditFraud@3?repository_url=https://adb-5245952564735461.0.azuredatabricks.net/api/2.0/mlflow",
  "canonical_purl": "pkg:mlflow/creditfraud@3?repository_url=https://adb-5245952564735461.0.azuredatabricks.net/api/2.0/mlflow",
  "type": "mlflow",
  "namespace": null,
  "name": "creditfraud",
  "version": "3",
  "qualifiers": {"repository_url": "https://adb-5245952564735461.0.azuredatabricks.net/api/2.0/mlflow"},
  "subpath": null,
  "is_invalid": false
}

https://github.com/package-url/purl-spec/blob/346589846130317464b677bc4eab30bf5040183a/test-suite-data.json#L506-L517

tichavskym · 2024-02-27T16:30:44Z

You're right, thanks a lot for detailed clarification! I assumed the percent encoding I know from web forms is some sort of a standard.

tichavskym mentioned this issue Feb 27, 2024

fix: escape / in names and versions #123

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Normalize for qualifiers #121

Normalize for qualifiers #121

CannedFish commented Jun 6, 2023

matt-phylum commented Nov 8, 2023

tichavskym commented Feb 27, 2024

tichavskym commented Feb 27, 2024

matt-phylum commented Feb 27, 2024

tichavskym commented Feb 27, 2024

Normalize for qualifiers #121

Normalize for qualifiers #121

Comments

CannedFish commented Jun 6, 2023

matt-phylum commented Nov 8, 2023

tichavskym commented Feb 27, 2024

tichavskym commented Feb 27, 2024

matt-phylum commented Feb 27, 2024

tichavskym commented Feb 27, 2024