Non-ASCII hostnames are not validated and so allow "@" and other reserved gen-delims characters #955
Comments
Nope, that's not correct. Instead, we should perhaps just reject IDNA 2003 support? I'm not versed in IDNA RFCs so it'll take me some time to figure this out. @asvetlov: can you weigh in on this? Why do we need to be able to encode host strings using IDNA 2003 if IDNA 2008 encoding fails?
I have now found and read the Further Notes section of the Python
If this is the reason, then we'll need to explicitly handle non-compliant characters in the host separately. Alternatively, we could decide to reject non-IDNA 2008 hostnames.
Oh boi. We should replace that with something adequate. It's a resource praising the r*ssian colonialism and related imperialist propaganda 🤮. I opened it just to make sure and immediately got a panic attack that will take days to recover from :( |
These problems are probably why requests appears to have just rejected it: Although, as that issue states, several TLDs choose not to enforce IDNA 2008 for their domain registries (and anybody could use them for subdomains), so it's not exactly forbidden globally either.
This change doesn't edit the test semantics, nor does it make any functional changes. It removes references to the ruscist culture that are as triggering to a lot of people as using "slave/master", "blacklist/whitelist" terminology. In particular, this gets rid of a link to the website that exists to justify ethnic cleansing in Crimea and genocides in Ukraine, being sponsored by the government of the terrorist state of muscovy. Ref #955 (comment)
Describe the bug
For #880 we fixed handling of ASCII hostnames, rejecting hostnames that contain characters or sequences that are explicitly excluded (see RFC3986, section 3.2.2, the `reg-name` grammar rule).

When implementing the PR for this I had tested `idna.encode(host, uts46=True)` and verified that it correctly rejects hostnames that use ASCII characters outside of the `reg-name` rule, not realising that `_idna_encode()` catches the exception raised for this and then falls back to `host.encode('idna')`, which doesn't reject such hostnames.

The exception handling is there to allow for IDNA 2003 / 2008 compatibility (see #152). I suspect that we need to use `idna.encode(host)` instead of `host.encode('idna')` here to ensure that invalid characters are still rejected.

Because there is no validation it is trivial to create invalid URLs, e.g. by passing in a non-ASCII authority value to `host` plus a user or password value.

To Reproduce
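A minimal sketch of the fallback behaviour described above, using only the standard-library `idna` codec (this is an illustration, not necessarily the original reproduction, and it does not call the library's own `_idna_encode()`). It shows that the codec accepts a host containing `:` and `@` and passes them through to the encoded result:

```python
# Sketch: the standard-library "idna" codec (IDNA 2003) encodes a host
# label by label and does not apply the RFC 3986 reg-name restrictions.
host = "user:pass@историк.рф"

encoded = host.encode("idna")

# Punycode copies ASCII code points through unchanged, so the reserved
# characters survive in the encoded host instead of raising an error.
print(encoded)
print(b":" in encoded, b"@" in encoded)
```

Any URL assembled from this encoded value would therefore carry what looks like user-info inside the host component.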
Expected behavior
The `user:pass@историк.рф` host value contains invalid characters, `:` and `@`, and so a `ValueError` should have been raised.
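One way the expected rejection could look, as a rough sketch: check the host for gen-delims characters (RFC 3986, section 2.2) before any IDNA encoding is attempted. The `check_host` helper and `GEN_DELIMS` constant are hypothetical names used for illustration only, not part of the library, and the check covers `reg-name` hosts only (IP literals such as `[::1]` would need separate handling):

```python
# gen-delims from RFC 3986, section 2.2. None of them is allowed by the
# reg-name grammar rule (unreserved / pct-encoded / sub-delims).
GEN_DELIMS = frozenset(":/?#[]@")


def check_host(host: str) -> str:
    """Hypothetical helper: reject reg-name hosts containing gen-delims."""
    invalid = sorted(GEN_DELIMS.intersection(host))
    if invalid:
        raise ValueError(
            f"Host {host!r} contains reserved characters: {''.join(invalid)!r}"
        )
    return host


check_host("историк.рф")  # no reserved characters, returned unchanged

try:
    check_host("user:pass@историк.рф")
except ValueError as exc:
    print(exc)
```

Running the check on the raw string, before encoding, means it applies equally whether the IDNA 2008 path or the IDNA 2003 fallback ends up producing the encoded host.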