Decode path parameters and encode URLRouter paths. #35

vnayar · 2024-03-24T17:36:32Z

There are two changes in this MR that work together:

Remove logic to abort routing if a path segment contains a path separator, e.g. '%2F'.
URL path parameters and decoded before being passed to route handlers.
To still permit matching URL routes that are not URL encoded, e.g. '/foo bar', encode these paths before adding them to route matching.

There are two changes in this MR that work together: * Remove logic to abort routing if a path segment contains a path separator, e.g. '%2F'. * URL path parameters and decoded before being passed to route handlers. * To still permit matching URL routes that are not URL encoded, e.g. '/foo bar', encode these paths before adding them to route matching. See: vibe-d#34

s-ludwig · 2024-03-25T10:09:27Z

Does this still work for matching GET /tes%74 against get("/test")? I'm not completely sure from the context, but it seems like this changes the semantics outside of the slash issue somewhat, which could cause subtle breakage in existing projects.

When going this route, as opposed to the replacement character approach, AFAICS we'd have to insert a kind of encoding normalization step for requestPath (i.e. something like path = InetPath(req.requestPath.bySegment.map!(s => InetPath.Segment(s.toString()))).toString()) - the problem being that this normalization must not be done unless it's actually necessary, in order to avoid additional memory allocations.

vnayar · 2024-03-26T19:46:28Z

When going this route, as opposed to the replacement character approach, AFAICS we'd have to insert a kind of encoding normalization step for requestPath (i.e. something like path = InetPath(req.requestPath.bySegment.map!(s => InetPath.Segment(s.toString()))).toString()) - the problem being that this normalization must not be done unless it's actually necessary, in order to avoid additional memory allocations.

Just for clarification, inserting routes is a 1-time operation, thus any memory allocation done during the normalization of a route would only happen a single time but not during request handling, which happens in much higher volume throughout the lifetime of the program. As I dig in further, should avoiding memory allocation when adding a route be considered a priority or is it a 1-time small cost that can be safely ignored?

…path matching.

vnayar · 2024-03-27T07:52:12Z

I added some changes that, without memory allocation, decode individual characters that are encoded and refer to a non-reserved character. Essentially both routing paths and request paths are converted into the same form, URL-encoded with only reserved-characters being encoded.

s-ludwig · 2024-03-28T07:44:18Z

This is a nice approach and the allocations during route adding can indeed be ignored in terms of performance (rebuilding the match graph will be the far bigger issue, in case somebody actually dynamically adds routes while the server is running).

Now the only issues appears to be to get the semantics fully nailed down:

The paths that are passed to match() - are they supposed to be given in Posix or Inet format? The former wouldn't allow to specify a route like /foo%2Fbar/, so, while we are at it, it might make more sense to assume Inet format here and use something similar to the expression given in Decode path parameters and encode URLRouter paths. #35 (comment) to normalize
The internal representation of reserved characters must match exactly between the code that normalizes in match() and the logic in nextMatchChar (which is not the case right now, e.g. for "="). Maybe we should only encode "/" in the internal representation and pass all other chars through unencoded. Those other reserved characters aren't really relevant in this context and this would be the simplest way to ensure a correct match, plus result in the most efficient representation.

vnayar · 2024-04-09T20:24:02Z

The internal representation of reserved characters must match exactly between the code that normalizes in match() and the logic in nextMatchChar (which is not the case right now, e.g. for "="). Maybe we should only encode "/" in the internal representation and pass all other chars through unencoded. Those other reserved characters aren't really relevant in this context and this would be the simplest way to ensure a correct match, plus result in the most efficient representation.

To clarify, = is a reserved character, thus, it would not be considered a valid character in a URL path. Currently the code looks like this in match(...):

		// Perform URL-encoding on the path before adding it as a route.
		string iPath = (cast(InetPath) PosixPath(path)).toString();
		m_routes.addTerminal(iPath, Route(method, iPath, handlerDelegate(handler)));

An = character that is passed in would get URL encoded as %3D, and the code in handleRequest would match it if the incoming request also contains a %3D, but would not match it if the incoming request contains = directly, which is the expected behavior, because this would not be a valid URL.

The paths that are passed to match() - are they supposed to be given in Posix or Inet format? The former wouldn't allow to specify a route like /foo%2Fbar/, so, while we are at it, it might make more sense to assume Inet format here and use something similar to the expression given in Decode path parameters and encode URLRouter paths. #35 (comment) to normalize

The current code in match uses the code above, namely (cast(InetPath) PosixPath(path)).toString();. The current code does indeed accept paths such as "/hawaii toast", which is expected to match incoming requests like "/hawaii%20toast". If I were to change this method to only accept Inet format, that would be a backwards incompatible change, no?

s-ludwig · 2024-04-10T08:11:17Z

The internal representation of reserved characters must match exactly between the code that normalizes in match() and the logic in nextMatchChar (which is not the case right now, e.g. for "="). Maybe we should only encode "/" in the internal representation and pass all other chars through unencoded. Those other reserved characters aren't really relevant in this context and this would be the simplest way to ensure a correct match, plus result in the most efficient representation.

To clarify, = is a reserved character, thus, it would not be considered a valid character in a URL path. Currently the code looks like this in match(...):
		// Perform URL-encoding on the path before adding it as a route.
		string iPath = (cast(InetPath) PosixPath(path)).toString();
		m_routes.addTerminal(iPath, Route(method, iPath, handlerDelegate(handler)));
An = character that is passed in would get URL encoded as %3D, and the code in handleRequest would match it if the incoming request also contains a %3D, but would not match it if the incoming request contains = directly, which is the expected behavior, because this would not be a valid URL.

You are right, I skimmed over the code a bit too quickly and mistook the list of special characters to be the reserved characters instead of the unreserved ones. It should be noted that since requestPath is an InetPath, it is already validated at this point and the client would have gotten a "bad request" reply before getting here, though, so this kind of validation is not strictly necessary here and reserved/disallowed characters other than '/' could still be represented unencoded.

The paths that are passed to match() - are they supposed to be given in Posix or Inet format? The former wouldn't allow to specify a route like /foo%2Fbar/, so, while we are at it, it might make more sense to assume Inet format here and use something similar to the expression given in Decode path parameters and encode URLRouter paths. #35 (comment) to normalize

The current code in match uses the code above, namely (cast(InetPath) PosixPath(path)).toString();. The current code does indeed accept paths such as "/hawaii toast", which is expected to match incoming requests like "/hawaii%20toast". If I were to change this method to only accept Inet format, that would be a backwards incompatible change, no?

I meant the case of "/"->"%2F" specifically here, so with this transformation it is impossible to define a path segment with an interior slash.

vnayar · 2024-04-18T19:58:51Z

The current code in match uses the code above, namely (cast(InetPath) PosixPath(path)).toString();. The current code does indeed accept paths such as "/hawaii toast", which is expected to match incoming requests like "/hawaii%20toast". If I were to change this method to only accept Inet format, that would be a backwards incompatible change, no?

I meant the case of "/"->"%2F" specifically here, so with this transformation it is impossible to define a path segment with an interior slash.

I got a bit of time today to take another look. Indeed, if the interface allows match to be called with a PosixPath, then you can enter things like "/hawaii toast", however, it does make entering things like "/hawaii%2Ftoast" no longer possible, because there is no way to know if the user meant an escapted '%' then a '2' then a 'F' or if they meant a URL escaped '%2F', i.e. '/'.

The current code assumes Posix paths and has this problem. The original unmodified code for adding a path is m_routes.addTerminal(path, Route(method, path, handlerDelegate(handler)));, which simply adds a path character-by-character which will be used for matching (which leads to the original bug).

It can be made more consistent by changing the interface to always assume that mach will be given an Inet path string, but this will break compatibility with anyone who was doing things like having spaces in the path given to match and expecting the callers to know to escape it as '%20' when making HTTP requests. Is that acceptable to change?

vnayar · 2024-04-23T10:17:04Z

I added changes to change the router path to be InetPath format. This has the following impacts:

Previously accepted paths with unescaped characters, like "/bob cat" instead of "/bob%20cat" will no longer be accepted.
URL-escaped characters can now be used in router paths, e.g. "/f%2Fg", and matched by requests.
- In master this was possible because no escaping was performed on the path given to the router.
- In the prior commit to this one, the path was converted from Posix to Inet, meaning that '%' itself was being escaped as '%25', thus '%2F' became '%252F'.

Now, the code can route and match paths with URL-escaped characters, and include path parameters that have such characters as well.

While looking through the code, I had the following observation:

URL encoding is not unique, e.g. %2F and %2f both encode to /.
Therefore, matching based on URL-escaped forms require extra checks for case.
During route matching, no conversions or extra computational logic should be performed.
Matching done on URL-decoded paths, would require decoding the URL path, an extra computation.
Therefore, one MUST perform an extra computation during route matching, either:
1. Match capital/lower cases of URL-escaped character, e.g. %2F or %2f.
2. OR perform URL-decoding of characters in the request path and compare them to the already decoded route path.

The code I have here goes down the pathway of matching URL-encoded paths, thus this requires making sure URL-encoding is applied consistently, e.g. only URL-encoding reserved characters. I believe this path achieves the end-goal with as few changes as possible.

The alternative pathway would be to only accept InetPath format paths in the router, and decode them when adding them to the match-graph. But this would require decoding the request paths as well. Because a decoded path may contain additional / characters, the match graph algorithm would also have to be changed to match segment at a time, rather than character at a time, otherwise, it would have no way to tell apart a decoded / inside a path segment vs. a / which was the original segment delimiter. This pathway could potentially lead to better performance, because segment-based matching could take advantage of SIMD CPU instructions, using either string comparison for normal segments or a wildcard match for path-variable segments.

vnayar · 2024-04-23T11:58:01Z

Doing some local testing, another issue was discovered. There mere existence of routes added by calling serveStaticFiles causes an error during route matching whenever an incoming GET request arrives containing a URL-encoded / as %2F.

A snippet of the relevant stack trace:

[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/fileserver.d:58 @safe void vibe.http.fileserver.serveStaticFiles(vibe.core.path.GenericPath!(vibe.core.path.PosixPathFormat).GenericPath, vibe.http.fileserver.HTTPFileServerSettings).callback(scope vibe.http.server.HTTPServerRequest, vibe.http.server.HTTPServerResponse) [0x5f5597a3f127]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:239 @safe bool vibe.http.router.URLRouter.handleRequest(vibe.http.server.HTTPServerRequest, vibe.http.server.HTTPServerResponse).__lambda6!(ulong, immutable(char)[][]).__lambda6(ulong, scope immutable(char)[][]) [0x5f55979b47ae]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:764 const @safe bool vibe.http.router.MatchTree!(vibe.http.router.Route).MatchTree.doMatch(immutable(char)[], scope bool delegate(ulong, scope immutable(char)[][]) @safe) [0x5f55979b7104]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:697 @safe bool vibe.http.router.MatchTree!(vibe.http.router.Route).MatchTree.match(immutable(char)[], scope bool delegate(ulong, scope immutable(char)[][]) @safe) [0x5f55979b6947]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:227 @safe void vibe.http.router.URLRouter.handleRequest(vibe.http.server.HTTPServerRequest, vibe.http.server.HTTPServerResponse) [0x5f55979b4440]

vnayar · 2024-04-23T15:29:10Z

Doing some local testing, another issue was discovered. There mere existence of routes added by calling serveStaticFiles causes an error during route matching whenever an incoming GET request arrives containing a URL-encoded / as %2F.

A snippet of the relevant stack trace:

[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/fileserver.d:58 @safe void vibe.http.fileserver.serveStaticFiles(vibe.core.path.GenericPath!(vibe.core.path.PosixPathFormat).GenericPath, vibe.http.fileserver.HTTPFileServerSettings).callback(scope vibe.http.server.HTTPServerRequest, vibe.http.server.HTTPServerResponse) [0x5f5597a3f127]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:239 @safe bool vibe.http.router.URLRouter.handleRequest(vibe.http.server.HTTPServerRequest, vibe.http.server.HTTPServerResponse).__lambda6!(ulong, immutable(char)[][]).__lambda6(ulong, scope immutable(char)[][]) [0x5f55979b47ae]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:764 const @safe bool vibe.http.router.MatchTree!(vibe.http.router.Route).MatchTree.doMatch(immutable(char)[], scope bool delegate(ulong, scope immutable(char)[][]) @safe) [0x5f55979b7104]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:697 @safe bool vibe.http.router.MatchTree!(vibe.http.router.Route).MatchTree.match(immutable(char)[], scope bool delegate(ulong, scope immutable(char)[][]) @safe) [0x5f55979b6947]
[main(HrWZ) dbg] ../../.dub/packages/vibe-http/37f8fbda98fff60471ea9a2ba5f8ea3cb2bf5c00/vibe-http/source/vibe/http/router.d:227 @safe void vibe.http.router.URLRouter.handleRequest(vibe.http.server.HTTPServerRequest, vibe.http.server.HTTPServerResponse) [0x5f55979b4440]

Seems to be a red-herring. In my local setup, the serveStaticFiles path was set to "*", and for some reason, that code was being called despite a different path also matching. I had assumed that router matches were ordered, and that earlier routes would take priority over later routes, but maybe that's not the case. Removing the overlap of responsible URL paths makes the problem go away.

s-ludwig · 2024-04-23T17:01:15Z

I had assumed that router matches were ordered, and that earlier routes would take priority over later routes, but maybe that's not the case.

This is indeed the case, but requests will fall-through to the next matching route if no a response is written and no exception is thrown.

Previously accepted paths with unescaped characters, like "/bob cat" instead of "/bob%20cat" will no longer be accepted.

We can't afford to make any changes to behavior that was previously accepted, as that would mean a high risk of breaking production code. But I think we are actually fine here, except for the InetPath(path).toString() part of the route registration, where we'd need something like this instead:

PosixPath(path)
  .bySegment
  .map!(s => InetPath.Segment(urlDecode(s.toString())))
  .InetPath
  .toString

That would URL decode each segment of the incoming string path as if it were an Inet path segment, but without checking for reserved characters. Constructing an InetPath.Segment from the result will then re-encode properly before converting to the finally registered string.

…th characters.

…the router.

vnayar · 2024-04-23T20:06:29Z

PosixPath(path)
  .bySegment
  .map!(s => InetPath.Segment(urlDecode(s.toString())))
  .InetPath
  .toString

Clever solution, I was worried that things had worked themselves into a corner. The solution worked with only minor modifications. The Segment separator had to be preserved in order to distinguish paths like /a/:test from /a/:test/.

s-ludwig

Good catch with the trailing slash. Looks good to merge now, as far as I can see!

This was referenced Mar 24, 2024

Escape Character %2F in Path Parameter of RestInterface Leads to 404 #34

Closed

Add utility to build GenericPaths from unencoded strings. vibe-d/vibe-core#396

Closed

Replace use of with .

92a559d

Add logic to decode percent-encoded but unreserved characters during …

1d7abe2

…path matching.

vnayar added 2 commits April 23, 2024 21:37

Change router path format from Posix to Inet to permit URL-escaped pa…

fef4345

…th characters.

Apply s-ludwig's suggested algorithm to insert normalized paths into …

886fffa

…the router.

vnayar force-pushed the vibe-http-34_decode-path-params-encode-router-paths branch from 37f8fbd to 886fffa Compare April 23, 2024 20:04

Code cleanup.

1d24e18

s-ludwig approved these changes Apr 24, 2024

View reviewed changes

s-ludwig merged commit fcebb30 into vibe-d:master Apr 24, 2024
22 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decode path parameters and encode URLRouter paths. #35

Decode path parameters and encode URLRouter paths. #35

vnayar commented Mar 24, 2024

s-ludwig commented Mar 25, 2024

vnayar commented Mar 26, 2024

vnayar commented Mar 27, 2024

s-ludwig commented Mar 28, 2024

vnayar commented Apr 9, 2024 •

edited

Loading

s-ludwig commented Apr 10, 2024

vnayar commented Apr 18, 2024

vnayar commented Apr 23, 2024

vnayar commented Apr 23, 2024

vnayar commented Apr 23, 2024

s-ludwig commented Apr 23, 2024

vnayar commented Apr 23, 2024

s-ludwig left a comment

Decode path parameters and encode URLRouter paths. #35

Decode path parameters and encode URLRouter paths. #35

Conversation

vnayar commented Mar 24, 2024

s-ludwig commented Mar 25, 2024

vnayar commented Mar 26, 2024

vnayar commented Mar 27, 2024

s-ludwig commented Mar 28, 2024

vnayar commented Apr 9, 2024 • edited Loading

s-ludwig commented Apr 10, 2024

vnayar commented Apr 18, 2024

vnayar commented Apr 23, 2024

vnayar commented Apr 23, 2024

vnayar commented Apr 23, 2024

s-ludwig commented Apr 23, 2024

vnayar commented Apr 23, 2024

s-ludwig left a comment

Choose a reason for hiding this comment

vnayar commented Apr 9, 2024 •

edited

Loading