Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decode path parameters and encode URLRouter paths. #35

Conversation

vnayar
Copy link
Contributor

@vnayar vnayar commented Mar 24, 2024

There are two changes in this MR that work together:

  • Remove logic to abort routing if a path segment contains a path separator, e.g. '%2F'.
  • URL path parameters and decoded before being passed to route handlers.
  • To still permit matching URL routes that are not URL encoded, e.g. '/foo bar', encode these paths before adding them to route matching.

See: vibe-d/vibe-core#396
See: #34

There are two changes in this MR that work together:
* Remove logic to abort routing if a path segment contains a path separator, e.g. '%2F'.
* URL path parameters and decoded before being passed to route handlers.
* To still permit matching URL routes that are not URL encoded, e.g. '/foo bar', encode
  these paths before adding them to route matching.

See: vibe-d#34
@s-ludwig
Copy link
Member

Does this still work for matching GET /tes%74 against get("/test")? I'm not completely sure from the context, but it seems like this changes the semantics outside of the slash issue somewhat, which could cause subtle breakage in existing projects.

When going this route, as opposed to the replacement character approach, AFAICS we'd have to insert a kind of encoding normalization step for requestPath (i.e. something like path = InetPath(req.requestPath.bySegment.map!(s => InetPath.Segment(s.toString()))).toString()) - the problem being that this normalization must not be done unless it's actually necessary, in order to avoid additional memory allocations.

@vnayar
Copy link
Contributor Author

vnayar commented Mar 26, 2024

When going this route, as opposed to the replacement character approach, AFAICS we'd have to insert a kind of encoding normalization step for requestPath (i.e. something like path = InetPath(req.requestPath.bySegment.map!(s => InetPath.Segment(s.toString()))).toString()) - the problem being that this normalization must not be done unless it's actually necessary, in order to avoid additional memory allocations.

Just for clarification, inserting routes is a 1-time operation, thus any memory allocation done during the normalization of a route would only happen a single time but not during request handling, which happens in much higher volume throughout the lifetime of the program. As I dig in further, should avoiding memory allocation when adding a route be considered a priority or is it a 1-time small cost that can be safely ignored?

@vnayar
Copy link
Contributor Author

vnayar commented Mar 27, 2024

I added some changes that, without memory allocation, decode individual characters that are encoded and refer to a non-reserved character. Essentially both routing paths and request paths are converted into the same form, URL-encoded with only reserved-characters being encoded.

@s-ludwig
Copy link
Member

This is a nice approach and the allocations during route adding can indeed be ignored in terms of performance (rebuilding the match graph will be the far bigger issue, in case somebody actually dynamically adds routes while the server is running).

Now the only issues appears to be to get the semantics fully nailed down:

  1. The paths that are passed to match() - are they supposed to be given in Posix or Inet format? The former wouldn't allow to specify a route like /foo%2Fbar/, so, while we are at it, it might make more sense to assume Inet format here and use something similar to the expression given in Decode path parameters and encode URLRouter paths. #35 (comment) to normalize
  2. The internal representation of reserved characters must match exactly between the code that normalizes in match() and the logic in nextMatchChar (which is not the case right now, e.g. for "="). Maybe we should only encode "/" in the internal representation and pass all other chars through unencoded. Those other reserved characters aren't really relevant in this context and this would be the simplest way to ensure a correct match, plus result in the most efficient representation.

@vnayar
Copy link
Contributor Author

vnayar commented Apr 9, 2024

  1. The internal representation of reserved characters must match exactly between the code that normalizes in match() and the logic in nextMatchChar (which is not the case right now, e.g. for "="). Maybe we should only encode "/" in the internal representation and pass all other chars through unencoded. Those other reserved characters aren't really relevant in this context and this would be the simplest way to ensure a correct match, plus result in the most efficient representation.

To clarify, = is a reserved character, thus, it would not be considered a valid character in a URL path. Currently the code looks like this in match(...):

		// Perform URL-encoding on the path before adding it as a route.
		string iPath = (cast(InetPath) PosixPath(path)).toString();
		m_routes.addTerminal(iPath, Route(method, iPath, handlerDelegate(handler)));

An = character that is passed in would get URL encoded as %3D, and the code in handleRequest would match it if the incoming request also contains a %3D, but would not match it if the incoming request contains = directly, which is the expected behavior, because this would not be a valid URL.

  1. The paths that are passed to match() - are they supposed to be given in Posix or Inet format? The former wouldn't allow to specify a route like /foo%2Fbar/, so, while we are at it, it might make more sense to assume Inet format here and use something similar to the expression given in Decode path parameters and encode URLRouter paths. #35 (comment) to normalize

The current code in match uses the code above, namely (cast(InetPath) PosixPath(path)).toString();. The current code does indeed accept paths such as "/hawaii toast", which is expected to match incoming requests like "/hawaii%20toast". If I were to change this method to only accept Inet format, that would be a backwards incompatible change, no?

@s-ludwig
Copy link
Member

  1. The internal representation of reserved characters must match exactly between the code that normalizes in match() and the logic in nextMatchChar (which is not the case right now, e.g. for "="). Maybe we should only encode "/" in the internal representation and pass all other chars through unencoded. Those other reserved characters aren't really relevant in this context and this would be the simplest way to ensure a correct match, plus result in the most efficient representation.

To clarify, = is a reserved character, thus, it would not be considered a valid character in a URL path. Currently the code looks like this in match(...):

		// Perform URL-encoding on the path before adding it as a route.
		string iPath = (cast(InetPath) PosixPath(path)).toString();
		m_routes.addTerminal(iPath, Route(method, iPath, handlerDelegate(handler)));

An = character that is passed in would get URL encoded as %3D, and the code in handleRequest would match it if the incoming request also contains a %3D, but would not match it if the incoming request contains = directly, which is the expected behavior, because this would not be a valid URL.

You are right, I skimmed over the code a bit too quickly and mistook the list of special characters to be the reserved characters instead of the unreserved ones. It should be noted that since requestPath is an InetPath, it is already validated at this point and the client would have gotten a "bad request" reply before getting here, though, so this kind of validation is not strictly necessary here and reserved/disallowed characters other than '/' could still be represented unencoded.

  1. The paths that are passed to match() - are they supposed to be given in Posix or Inet format? The former wouldn't allow to specify a route like /foo%2Fbar/, so, while we are at it, it might make more sense to assume Inet format here and use something similar to the expression given in Decode path parameters and encode URLRouter paths. #35 (comment) to normalize

The current code in match uses the code above, namely (cast(InetPath) PosixPath(path)).toString();. The current code does indeed accept paths such as "/hawaii toast", which is expected to match incoming requests like "/hawaii%20toast". If I were to change this method to only accept Inet format, that would be a backwards incompatible change, no?

I meant the case of "/"->"%2F" specifically here, so with this transformation it is impossible to define a path segment with an interior slash.

@vnayar
Copy link
Contributor Author

vnayar commented Apr 18, 2024

The current code in match uses the code above, namely (cast(InetPath) PosixPath(path)).toString();. The current code does indeed accept paths such as "/hawaii toast", which is expected to match incoming requests like "/hawaii%20toast". If I were to change this method to only accept Inet format, that would be a backwards incompatible change, no?

I meant the case of "/"->"%2F" specifically here, so with this transformation it is impossible to define a path segment with an interior slash.

I got a bit of time today to take another look. Indeed, if the interface allows match to be called with a PosixPath, then you can enter things like "/hawaii toast", however, it does make entering things like "/hawaii%2Ftoast" no longer possible, because there is no way to know if the user meant an escapted '%' then a '2' then a 'F' or if they meant a URL escaped '%2F', i.e. '/'.

The current code assumes Posix paths and has this problem. The original unmodified code for adding a path is m_routes.addTerminal(path, Route(method, path, handlerDelegate(handler)));, which simply adds a path character-by-character which will be used for matching (which leads to the original bug).

It can be made more consistent by changing the interface to always assume that mach will be given an Inet path string, but this will break compatibility with anyone who was doing things like having spaces in the path given to match and expecting the callers to know to escape it as '%20' when making HTTP requests. Is that acceptable to change?

@vnayar
Copy link
Contributor Author

vnayar commented Apr 23, 2024

I added changes to change the router path to be InetPath format. This has the following impacts:

  • Previously accepted paths with unescaped characters, like "/bob cat" instead of "/bob%20cat" will no longer be accepted.
  • URL-escaped characters can now be used in router paths, e.g. "/f%2Fg", and matched by requests.
    • In master this was possible because no escaping was performed on the path given to the router.
    • In the prior commit to this one, the path was converted from Posix to Inet, meaning that '%' itself was being escaped as '%25', thus '%2F' became '%252F'.

Now, the code can route and match paths with URL-escaped characters, and include path parameters that have such characters as well.

While looking through the code, I had the following observation:

  • URL encoding is not unique, e.g. %2F and %2f both encode to /.
    Therefore, matching based on URL-escaped forms require extra checks for case.
  • During route matching, no conversions or extra computational logic should be performed.
    Matching done on URL-decoded paths, would require decoding the URL path, an extra computation.
  • Therefore, one MUST perform an extra computation during route matching, either:
    1. Match capital/lower cases of URL-escaped character, e.g. %2F or %2f.
    2. OR perform URL-decoding of characters in the request path and compare them to the already decoded route path.

The code I have here goes down the pathway of matching URL-encoded paths, thus this requires making sure URL-encoding is applied consistently, e.g. only URL-encoding reserved characters. I believe this path achieves the end-goal with as few changes as possible.

The alternative pathway would be to only accept InetPath format paths in the router, and decode them when adding them to the match-graph. But this would require decoding the request paths as well. Because a decoded path may contain additional / characters, the match graph algorithm would also have to be changed to match segment at a time, rather than character at a time, otherwise, it would have no way to tell apart a decoded / inside a path segment vs. a / which was the original segment delimiter. This pathway could potentially lead to better performance, because segment-based matching could take advantage of SIMD CPU instructions, using either string comparison for normal segments or a wildcard match for path-variable segments.

@vnayar
Copy link
Contributor Author

vnayar commented Apr 23, 2024

Doing some local testing, another issue was discovered. There mere existence of routes added by calling serveStaticFiles causes an error during route matching whenever an incoming GET request arrives containing a URL-encoded / as %2F.

A snippet of the relevant stack trace:

[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/fileserver.d:58 @safe void vibe.http.fileserver.serveStaticFiles(vibe.core.path.GenericPath!(vibe.core.path.PosixPathFormat).GenericPath, vibe.http.fileserver.HTTPFileServerSettings).callback(scope vibe.http.server.HTTPServerRequest, vibe.http.server.HTTPServerResponse) [0x5f5597a3f127]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:239 @safe bool vibe.http.router.URLRouter.handleRequest(vibe.http.server.HTTPServerRequest, vibe.http.server.HTTPServerResponse).__lambda6!(ulong, immutable(char)[][]).__lambda6(ulong, scope immutable(char)[][]) [0x5f55979b47ae]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:764 const @safe bool vibe.http.router.MatchTree!(vibe.http.router.Route).MatchTree.doMatch(immutable(char)[], scope bool delegate(ulong, scope immutable(char)[][]) @safe) [0x5f55979b7104]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:697 @safe bool vibe.http.router.MatchTree!(vibe.http.router.Route).MatchTree.match(immutable(char)[], scope bool delegate(ulong, scope immutable(char)[][]) @safe) [0x5f55979b6947]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:227 @safe void vibe.http.router.URLRouter.handleRequest(vibe.http.server.HTTPServerRequest, vibe.http.server.HTTPServerResponse) [0x5f55979b4440]

@vnayar
Copy link
Contributor Author

vnayar commented Apr 23, 2024

Doing some local testing, another issue was discovered. There mere existence of routes added by calling serveStaticFiles causes an error during route matching whenever an incoming GET request arrives containing a URL-encoded / as %2F.

A snippet of the relevant stack trace:

[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/fileserver.d:58 @safe void vibe.http.fileserver.serveStaticFiles(vibe.core.path.GenericPath!(vibe.core.path.PosixPathFormat).GenericPath, vibe.http.fileserver.HTTPFileServerSettings).callback(scope vibe.http.server.HTTPServerRequest, vibe.http.server.HTTPServerResponse) [0x5f5597a3f127]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:239 @safe bool vibe.http.router.URLRouter.handleRequest(vibe.http.server.HTTPServerRequest, vibe.http.server.HTTPServerResponse).__lambda6!(ulong, immutable(char)[][]).__lambda6(ulong, scope immutable(char)[][]) [0x5f55979b47ae]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:764 const @safe bool vibe.http.router.MatchTree!(vibe.http.router.Route).MatchTree.doMatch(immutable(char)[], scope bool delegate(ulong, scope immutable(char)[][]) @safe) [0x5f55979b7104]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:697 @safe bool vibe.http.router.MatchTree!(vibe.http.router.Route).MatchTree.match(immutable(char)[], scope bool delegate(ulong, scope immutable(char)[][]) @safe) [0x5f55979b6947]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:227 @safe void vibe.http.router.URLRouter.handleRequest(vibe.http.server.HTTPServerRequest, vibe.http.server.HTTPServerResponse) [0x5f55979b4440]

Seems to be a red-herring. In my local setup, the serveStaticFiles path was set to "*", and for some reason, that code was being called despite a different path also matching. I had assumed that router matches were ordered, and that earlier routes would take priority over later routes, but maybe that's not the case. Removing the overlap of responsible URL paths makes the problem go away.

@s-ludwig
Copy link
Member

I had assumed that router matches were ordered, and that earlier routes would take priority over later routes, but maybe that's not the case.

This is indeed the case, but requests will fall-through to the next matching route if no a response is written and no exception is thrown.

Previously accepted paths with unescaped characters, like "/bob cat" instead of "/bob%20cat" will no longer be accepted.

We can't afford to make any changes to behavior that was previously accepted, as that would mean a high risk of breaking production code. But I think we are actually fine here, except for the InetPath(path).toString() part of the route registration, where we'd need something like this instead:

PosixPath(path)
  .bySegment
  .map!(s => InetPath.Segment(urlDecode(s.toString())))
  .InetPath
  .toString

That would URL decode each segment of the incoming string path as if it were an Inet path segment, but without checking for reserved characters. Constructing an InetPath.Segment from the result will then re-encode properly before converting to the finally registered string.

@vnayar vnayar force-pushed the vibe-http-34_decode-path-params-encode-router-paths branch from 37f8fbd to 886fffa Compare April 23, 2024 20:04
@vnayar
Copy link
Contributor Author

vnayar commented Apr 23, 2024

PosixPath(path)
  .bySegment
  .map!(s => InetPath.Segment(urlDecode(s.toString())))
  .InetPath
  .toString

Clever solution, I was worried that things had worked themselves into a corner. The solution worked with only minor modifications. The Segment separator had to be preserved in order to distinguish paths like /a/:test from /a/:test/.

Copy link
Member

@s-ludwig s-ludwig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch with the trailing slash. Looks good to merge now, as far as I can see!

@s-ludwig s-ludwig merged commit fcebb30 into vibe-d:master Apr 24, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants