Normalize for qualifiers #121
Comments
According to the PURL test suite, the correct behavior is that |
@matt-phylum so are you saying that the library intentionally doesn't adhere to the package-url/purl-spec standard or am I missing something? Because it says in the Rules for each component section that
I want PURLs produced by this library to be processed by some other tooling, so adhering to some standard for data exchange is quite important to me, and it seems like percent encoded qualifiers are the more standard approach. Either way, if I created a merge request that would just add an optional argument to |
Looking into it, the URL encoding isn't specific to the qualifiers section; there's an open PR #123 that is trying to encode |
URL encoding, aka percent encoding, really only describes how to decode a string. It specifies how to encode individual characters, but it doesn't specify which characters are supposed to be encoded. The PURL spec specifies which characters are to be encoded.
However, the PURL spec is probably not explicit enough. It seems like most readers interpret "percent-encoded" to mean some variant of x-www-form-urlencoded, which is never mentioned once in the PURL spec. This results in incorrect canonicalization and, more critically, sometimes in incorrect handling of spaces and plus signs.
Besides the section on character encoding, which could be much more explicit, the PURL spec has a test suite which contains the expected canonicalizations for some of the cases. This is one of the cases it covers:
{
"description": "MLflow model tracked in Azure Databricks (case insensitive)",
"purl": "pkg:mlflow/CreditFraud@3?repository_url=https://adb-5245952564735461.0.azuredatabricks.net/api/2.0/mlflow",
"canonical_purl": "pkg:mlflow/creditfraud@3?repository_url=https://adb-5245952564735461.0.azuredatabricks.net/api/2.0/mlflow",
"type": "mlflow",
"namespace": null,
"name": "creditfraud",
"version": "3",
"qualifiers": {"repository_url": "https://adb-5245952564735461.0.azuredatabricks.net/api/2.0/mlflow"},
"subpath": null,
"is_invalid": false
} |
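To make the distinction above concrete, here is a minimal sketch using only Python's standard `urllib.parse`. It is not packageurl-python's implementation, and the `safe=":/"` set is an assumption chosen only because it reproduces the `repository_url` value in the test case above; the PURL spec remains the authority on which characters are encoded.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "a b+c"

# Plain percent-encoding: space becomes %20, "+" becomes %2B.
percent_encoded = quote(value, safe="")
# Form encoding (x-www-form-urlencoded): space becomes "+".
form_encoded = quote_plus(value)

# A string containing literal plus signs is read differently by each decoder.
plain_decoded = unquote("c++")       # plus signs are kept
form_decoded = unquote_plus("c++")   # plus signs are turned into spaces

# Which characters get encoded is a policy choice, expressed here via `safe`.
url = "https://adb-5245952564735461.0.azuredatabricks.net/api/2.0/mlflow"
encode_everything = quote(url, safe="")
keep_colon_slash = quote(url, safe=":/")  # assumed safe set, matches the test case

print(percent_encoded)
print(form_encoded)
print(repr(plain_decoded), repr(form_decoded))
print(encode_everything)
print(keep_colon_slash)
```

Both encoders produce valid percent-escapes, but they disagree on spaces and plus signs, and the `safe` argument alone decides whether `:` and `/` survive unencoded.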
You're right, thanks a lot for the detailed clarification! I assumed the percent encoding I know from web forms was some sort of standard. |
Hello.
As in the purl-spec example, URLs in qualifiers should be normalized as below:
But in packageurl-python, when I set qualifiers to {"repository_url": "repo.spring.io/release"}, to_string returns the following:
If I set qualifiers to {"repository_url": "repo.spring.io%2Frelease"}, to_string returns the following:
This means the character % is normalized to %25. Is that a bug?
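The `%` to `%25` effect described here is what any percent-encoder does when it is handed a value that was already encoded. The sketch below illustrates that with the standard library's `urllib.parse.quote`; it is an illustration of double encoding in general, not a reproduction of packageurl-python's actual `to_string` output.

```python
from urllib.parse import quote, unquote

raw = "repo.spring.io/release"
pre_encoded = "repo.spring.io%2Frelease"

# Encoding the raw value once escapes "/" as %2F.
encoded_once = quote(raw, safe="")

# Encoding an already-encoded value escapes its "%" as %25.
encoded_twice = quote(pre_encoded, safe="")

print(encoded_once)
print(encoded_twice)

# Decoding once recovers exactly what the caller passed in.
print(unquote(encoded_twice))
```

Seen this way, an encoder that treats its input as a raw, unencoded value has to escape `%` so that decoding returns the caller's original string.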