Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic host validation in Uri and add port validation to the Uri constructor #196

Draft
wants to merge 2 commits into
base: 3.4.x
Choose a base branch
from

Conversation

Xerkus
Copy link
Member

@Xerkus Xerkus commented Sep 11, 2024

Q A
Documentation no
Bugfix yes
BC Break yes/no
New Feature no
RFC yes
QA no

Description

This change introduces validation for the uri host as parse_url() does not validate host subcomponent permitting outright forbidden characters to be used.

This PR introduces basic host subcomponent validation that would reject forbidden characters, validate IPv6 address and wrap it the square brackets.
This validation does not ensure hostname is actually valid. For example, URI allows various sub-delimeters and percent encoding which would be invalid in the context of the http urls. This validation does not prevent them and it does not validate percent encoding is actually valid.

See RFC3986 3.2.2

Port validation was added to the Uri constructor as parse_url() would allow port 0 which is a wildcard port not valid for the Uri.

@Xerkus Xerkus added Bug Something isn't working Unit Test Needed RFC labels Sep 11, 2024
@Xerkus Xerkus added this to the 3.4.0 milestone Sep 11, 2024
@Xerkus Xerkus linked an issue Sep 11, 2024 that may be closed by this pull request
Signed-off-by: Aleksei Khudiakov <[email protected]>
@Xerkus Xerkus modified the milestones: 3.4.0, 3.5.0 Sep 11, 2024
@Xerkus
Copy link
Member Author

Xerkus commented Sep 11, 2024

@laminas/technical-steering-committee this one is tricky. Needs feedback.

Interop integration tests do not establish how any of this should be treated. Other implementations allow any crap to be set as host and as a result produce junk url.

php > var_export(parse_url('http://(:1'));
array (
  'scheme' => 'http',
  'host' => '(',
  'port' => 1,
)
php > var_export(parse_url('http://,:1'));
array (
  'scheme' => 'http',
  'host' => ',',
  'port' => 1,
)
php > var_export(parse_url('http://[:1'));
array (
  'scheme' => 'http',
  'host' => '[',
  'port' => 1,
)
php > var_export(parse_url('http://]:1'));
array (
  'scheme' => 'http',
  'host' => ']',
  'port' => 1,
)
php > var_export(parse_url('http://][:1'));
array (
  'scheme' => 'http',
  'host' => '][',
  'port' => 1,
)
php > var_export(parse_url('http://;:1'));
array (
  'scheme' => 'http',
  'host' => ';',
  'port' => 1,
)
php > var_export(parse_url('http://::1'));
array (
  'scheme' => 'http',
  'host' => ':',
  'port' => 1,
)
php > var_export(parse_url('http:// :1'));
array (
  'scheme' => 'http',
  'host' => ' ',
  'port' => 1,
)
php > var_export(parse_url('http:// '));
array (
  'scheme' => 'http',
  'host' => ' ',
)

url_parse handles malformed IPv6 that is not enclosed in brackets but the last segment is treated as a port.

Malformed: https://user:pass@fe80::200:5aee:feaa:20a2:3001/foo?bar=baz#quz
Valid: https://user:pass@[fe80::200:5aee:feaa:20a2]:3001/foo?bar=baz#quz

Since this malformed format is accepted I feel we should allow to set IPv6 address as a host using withHost() but enclose it in brackets ourselves.
Alternative would be to reject IPv6 address that is not enclosed with brackets in all cases. It will be safer and frankly the proper approach but it also can be considered a breaking change.

For the remaining filtering introduced in this PR it only explicitly forbids general delimeters but allows anything else to run rampant.

I believe validation needs to be clamped down to only allow IPv6, IPv4 and valid for DNS hostnames. Probably pass it through validate domain filter. Any other URI variants are not suitable for http message context. Something along those lines:

filter_var(idn_to_ascii($host), FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME)

@Xerkus Xerkus marked this pull request as draft September 11, 2024 22:34
@gsteel gsteel removed this from the 3.5.0 milestone Oct 14, 2024
@nyamsprod
Copy link
Contributor

AFAIK the following

var_export(parse_url('http:// '));

should fail already because RFC3986 does not allow spaces in the URI the test should then be on

var_export(parse_url('http://%20'));

But perhaps this is another issue which is not related to the current ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working RFC Unit Test Needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Uri::__toString() can yield malformed URIs
3 participants