Creation of PackageURL objects for invalid PURLs / Possible improper encoding of colons (":") in PURL fields #152

Open
njv299 opened this issue Mar 27, 2024 · 2 comments

Comments

njv299 commented Mar 27, 2024

It is possible to create PackageURL objects that contain invalid fields, specifically by using the PackageURL kwarg constructor and passing in values that contain colons.

Simple example:

>>> from packageurl import PackageURL
>>> p = PackageURL(type="generic", name="Foo: <Bar>", version="1.2.3")
>>> p.to_string()
'pkg:generic/Foo:%20%3CBar%3E@1.2.3'
>>> PackageURL.from_string(p.to_string())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/vossn/finitestate/finite-state-sip/venv/lib/python3.10/site-packages/packageurl/__init__.py", line 514, in from_string
    raise ValueError(msg)
ValueError: Invalid purl 'pkg:generic/Foo:%20%3CBar%3E@1.2.3' cannot contain a "user:pass@host:port" URL Authority component: ''.

On closer inspection, it looks like the problem might be that colons (:) are not being percent-encoded. I would expect the colon in the name to be encoded as %3A, but to_string() leaves it as a literal colon:

>>> p = PackageURL(type="generic", name="Foo: <Bar>", version="1.2.3")
>>> p
PackageURL(type='generic', namespace=None, name='Foo: <Bar>', version='1.2.3', qualifiers={}, subpath=None)
>>> p.to_string()
'pkg:generic/Foo:%20%3CBar%3E@1.2.3'

I'm not sure I'm interpreting the PURL spec correctly with regard to colon characters, but on the surface it sounds like any colon appearing within a field value should be percent-encoded whenever that field is required to be percent-encoded.
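
For comparison, here is a minimal sketch of the encoding expected above, using only the standard library's urllib.parse.quote rather than packageurl itself. The purl string is assembled by hand purely for illustration; this is an assumption about what a fully encoded result could look like, not output produced by the library.

```python
from urllib.parse import quote, unquote

name = "Foo: <Bar>"

# safe="" exempts no reserved characters, so ":" is encoded as %3A.
encoded_name = quote(name, safe="")
print(encoded_name)  # Foo%3A%20%3CBar%3E

# Hand-assembled purl string (illustrative only, not PackageURL.to_string()).
purl = f"pkg:generic/{encoded_name}@1.2.3"
print(purl)  # pkg:generic/Foo%3A%20%3CBar%3E@1.2.3

# Decoding the field recovers the original name.
assert unquote(encoded_name) == name
```

With the colon encoded, the only literal ":" left in the string is the one after the "pkg" scheme.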

njv299 changed the title from "PackageURL objects can represent invalid PURLs." to "Creation of PackageURL objects for invalid PURLs / Possible improper encoding of colons (":") in PURL fields" on Mar 27, 2024
@itookyourboo

Totally agree. Quote from the PURL specification:

A name must be a percent-encoded string

mprpic commented Jul 25, 2024

Colons are explicitly not encoded for an unknown reason:

https://github.com/package-url/packageurl-python/blob/f98abf0f3c295873e18f968ebd00138a02d63b25/src/packageurl/__init__.py#L71C40-L71C40

This line was added as part of commit d7be020, but there is no explanation as to why. This applies to all fields, not just name, fwiw.
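
To make the behavior concrete, here is a rough sketch of the pattern described: percent-encode the value, then turn "%3A" back into a literal colon. The helper names quote_keeping_colons and quote_strict are hypothetical and written for this comment; they are not copied from packageurl and may differ from the linked line in detail.

```python
from urllib.parse import quote as _quote


def quote_keeping_colons(value: str) -> str:
    """Percent-encode value, then restore literal colons (hypothetical helper)."""
    return _quote(value.encode("utf-8")).replace("%3A", ":")


def quote_strict(value: str) -> str:
    """Percent-encode value and leave colons encoded as %3A (hypothetical helper)."""
    return _quote(value.encode("utf-8"))


print(quote_keeping_colons("Foo: <Bar>"))  # Foo:%20%3CBar%3E
print(quote_strict("Foo: <Bar>"))          # Foo%3A%20%3CBar%3E
```

The first result matches the name segment seen in the to_string() output from the original report, which is consistent with colons being restored after encoding.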

@pombredanne as the author of this particular piece of the code, can you share why colons are treated specially in this context? :-)
