-
-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Trie implementation when different path variables used in same path location #273
base: main
Are you sure you want to change the base?
Conversation
This is a fantastic PR 👏 I looked at this issue a couple years ago and wasn't able to solve it due to time constraints. I originally tried to keep a stack of each opportunity to branch and then just keep trying each branch from the stack when there was a routing miss. I like your solution a lot better. Well done! 🥂 |
@adam12 Thanks! We considered using a search algorithm like that, but in addition to the speed issue, you would have to maintain a parallel stack with the params collected on-the-fly (and prune it as needed). That was starting to feel complicated, so we looked for different solutions that would let us keep the speed a trie offers. Cheers! |
Following is a discussion of the two tests currently broken by the proposed changes. First Broken TestThe first broken test is here, in describe "relative variable with permissive constraint" do
let(:router) do
described_class.new do
get ":test", as: :regex, test: /.*/, to: RecognitionTestCase.endpoint("regex")
end
end
it "recognizes route(s)" do
runner.run!([
[:regex, "/test/", {test: "test"}]
])
end
end There are several issues with this test. The first is that the route specified in the test setup does not include a leading The fact that this test passed previously appears to be a quirk of the current router implementation. The test is essentially designed to illustrate that quirk. If this test accurately implemented the route > m = Mustermann.new(':test', type: :rails, version: "5.0", capture: {test: /.*/})
=> #<Mustermann::Rails:":test">
> m.match("/test/").named_captures
=> {"test"=>"/test/"} Even if a leading > m = Mustermann.new('/:test', type: :rails, version: "5.0", capture: {test: /.*/})
=> #<Mustermann::Rails:":test">
> m.match("/test/").named_captures
=> {"test"=>"test/"} The bottom line is that this type of catch-all route is not possible with either the current router implementation or the proposed change. Both implementations assume the ability to split incoming paths based on the presence of Neither implementation has the ability to introspect any route constraints that would conflict with this expectation of being able to split incoming paths. Therefore, such practice should be discouraged as long as the current router algorithm is used (that is, a routing trie). I recommend deleting this test and documenting the reasons at this time. Second Broken TestThe second breaking test is here, in it "allows interpolation" do
router = described_class.new do
scope ":locale" do
root to: ->(env) { [200, {}, ["Root (#{env.dig('router.params', :locale)})!"]] }
get "/trees", to: ->(env) { [200, {}, ["Trees (#{env.dig('router.params', :locale)})!"]] }
end
end
app = Rack::MockRequest.new(router)
expect(app.request("GET", "/it", lint: true).body).to eq("Root (it)!")
expect(app.request("GET", "/it/", lint: true).body).to eq("Root (it)!")
expect(app.request("GET", "/it/trees", lint: true).body).to eq("Trees (it)!")
end This test is a little different. It is designed to test the usage of "scopes" (similar to path namespaces). The breakage of this test may expose a different bug in the router. Both of the routes defined in this test are fixed and, therefore, should never exercise the routing trie, which only handles routes with variable segments and only searches for incoming paths that don't match an existing fixed route. So why should any changes to the Trie class implementation break this test? Without looking, I suspect that the problem may lie in the treatment of GET "/it" # passing
GET "/it/" # failing
GET "/it/trees" # passing I would argue:
The fact that the opposite behavior is occurring is a sign that the router is not handling the edge case of scope root routes correctly at present. For example, following is the current Router handling of a normal root route: > Hanami::Router.new do
> root to: ->(env) { [200, {}, ["Root!"]] }
> end
=> #<Hanami::Router:0x00007898bff1b690
@base_url="http://localhost",
@blk=#<Proc:0x00007898bfbe7e60 (pry):7>,
@block_context=nil,
@fixed=
{"GET"=>{"/"=>#<Proc:0x00007898bfbe7de8 (pry):8 (lambda)>}
"HEAD"=>{"/"=>#<Proc:0x00007898bfbe7de8 (pry):8 (lambda)>}},
@globs_and_mounts=[],
@inspector=nil,
@name_prefix=#<Hanami::Router::Prefix:0x00007898bfbe81a8 @prefix="">,
@not_allowed=#<Proc:0x00007898c039c1d8 /home/damian/.gem/ruby/3.2.5/bundler/gems/router-66668b309034/lib/hanami/router.rb:760 (lambda)>,
@not_found=#<Proc:0x00007898c039c1b0 /home/damian/.gem/ruby/3.2.5/bundler/gems/router-66668b309034/lib/hanami/router.rb:775 (lambda)>,
@path_prefix=#<Hanami::Router::Prefix:0x00007898bfbe81d0 @prefix="/">,
@resolver=#<Proc:0x00007898c039c4d0 /home/damian/.gem/ruby/3.2.5/bundler/gems/router-66668b309034/lib/hanami/router.rb:706 (lambda)>,
@url_helpers=
#<Hanami::Router::UrlHelpers:0x00007898bfbe8180
@base_url=#<URI::HTTP http://localhost>,
@named={:root=>#<Mustermann::Rails:"/">},
@prefix=#<Hanami::Router::Prefix:0x00007898bfbe7f50 @prefix="/">>,
@variable={}> You can see that a fixed route is created for Compare to the route created for a scoped root (using the routes from the breaking test): > Hanami::Router.new do
> scope "it" do
> root to: ->(env) { [200, {}, ["Root (it)!"]] }
> get "/trees", to: ->(env) { [200, {}, ["Trees (it)!"]] }
> end
> end
=> #<Hanami::Router:0x00007898c0202bd8
@base_url="http://localhost",
@blk=#<Proc:0x00007898bffc89f8 (pry):1>,
@block_context=nil,
@fixed=
{"GET"=>{"/it"=>#<Proc:0x00007898bffc8430 (pry):3 (lambda)>, "/it/trees"=>#<Proc:0x00007898bffda608 (pry):4 (lambda)>},
"HEAD"=>{"/it"=>#<Proc:0x00007898bffc8430 (pry):3 (lambda)>, "/it/trees"=>#<Proc:0x00007898bffda608 (pry):4 (lambda)>}},
@globs_and_mounts=[],
@inspector=nil,
@name_prefix=#<Hanami::Router::Prefix:0x00007898bffc9448 @prefix="">,
@not_allowed=#<Proc:0x00007898c039c1d8 /home/damian/.gem/ruby/3.2.5/bundler/gems/router-66668b309034/lib/hanami/router.rb:760 (lambda)>,
@not_found=#<Proc:0x00007898c039c1b0 /home/damian/.gem/ruby/3.2.5/bundler/gems/router-66668b309034/lib/hanami/router.rb:775 (lambda)>,
@path_prefix=#<Hanami::Router::Prefix:0x00007898bffc94c0 @prefix="/">,
@resolver=#<Proc:0x00007898c039c4d0 /home/damian/.gem/ruby/3.2.5/bundler/gems/router-66668b309034/lib/hanami/router.rb:706 (lambda)>,
@url_helpers=
#<Hanami::Router::UrlHelpers:0x00007898bffc93d0
@base_url=#<URI::HTTP http://localhost>,
@named={:it_root=>#<Mustermann::Rails:"/it">},
@prefix=#<Hanami::Router::Prefix:0x00007898bffc8b38 @prefix="/">>,
@variable={}> Here, a fixed route is created for I believe that the second expectation was probably passing previously due to a bug in the current Trie class implementation. Since the proposed implementation matches against a full route, instead of individual segments, this is no longer true and the test now fails. I recommend suspending this test at present to allow the pull request to pass. If you agree that scope root route handling should start at |
I see that the CI build is failing at the RSpec step, before any tests are run. I don't think this is caused by anything in this pull request. It looks like Note that the RSpec step will fail when CI runs correctly, due to the two broken integration tests mentioned above. |
Thanks for this amazing PR, folks! And I think it's so cool that the two of you worked together on this 😍 I want to give this proper attention, so it'll need some brain space. Right now I'm finishing off multiple gateway support for Hanami's DB layer. As soon as I'm done with that, attending to this PR will be my #1 priority. Speak again soon :) |
@kyleplump and I collaborated on this PR.
Summary
Router cannot differentiate between multiple defined routes that contain different path variable names in the same location. For example:
This PR changes the router behavior to group all variable slugs together for matching purposes and to lazily create and evaluate Mustermann matchers. Discussion below.
The above symptom is illustrated in other open issues, such as #255 (Routes with multiple variables not working in common nested route scenario) and #256 (Inconsistent trailing slash behaviour). Although trailing slashes are not a documented use case, they sometimes work (and sometimes don't) due to segment-by-segment matching.
The proposed implementation should close #255 , #256 , and #272 , by applying complete route matchers that provide consistent, predictable behavior, as documented in the current Hanami Guides.
We also suggest closing #247 (Failing edge cases for optional variables) as an undocumented use case that is not currently supported. Optional segments are not supported by either the current router or the proposed update.
Closes #247
Closes #255
Closes #256
Closes #272
Discussion
Current Router Behavior
Hanami Router 2.0 uses a trie data structure to build a representation of all routes that contain a "dynamic segment," a.k.a., a path variable. It should be noted that all routes that contain only fixed segments are stored in a separate Hash where they can be looked up directly.
Notable features of the current router:
Because the Mustermann matcher for each dynamic segment is used as the key for that node, different matchers will result in different nodes being created, even if the dynamic segments in question are in parallel locations in different routes.
That is, in the route example above, different matchers would be created for
:id
and:uuid
, since different path variable names are used. However, the same node would be have been used for both routes if the same path variable name had been used (say,:id
). It is worth noting also that different matchers (and therefore different nodes) would also be created if the same path variable names were used, but with different constraints.The problem arises when matching paths and collecting parameters on the fly, as illustrated below.
The following routes:
Would produce a trie like this:
Where the double circles represent potential path endpoints, so both
:id
andedit
are potential path endpoints.In this case, the router has no trouble matching either
/1234
or/1234/edit
to the correct routes.However, if the following routes are used:
A trie like this will be created:
Here,
:id
and:uuid
are different nodes, and:id
is no longer an endpoint. Now, when the router tries to match a path like "/9d5794e7-12ed-4f00-a6d1-a5be9cd4072e", it has no way to know that this is a UUID and incorrectly matches it to the:id
node. Crucially, the UUID is collected and stored in the params Hash under the key:id
. When the router then realizes that this path does not match any route at or below the:id
node, it has no way to backtrack, "forget" the incorrect:id
param, and search down a different node path. In fact, backtracking in this way is contrary to trie design and would essentially convert the trie to an ordinary tree data structure.Proposed Router Behavior
The proposed router behavior is to store a single variable node for all dynamic segments that are defined in parallel locations in an app's route table. Also, rather than attempting to parse and collect path variable params on the fly, the proposed router delays the creation and evaluation of Mustermann matchers until a potential full path/route match has been identified.
The following routes:
Would now create a trie like this:
And each endpoint would contain the information needed to create and evaluate a matcher for a proposed path that evaluates to that endpoint.
Tests
The proposed implementation adds a new class, Leaf, and refactors the Trie and Node classes. New unit tests are included for each of these classes. New integration tests are also included (in
recognition_spec.rb
). All of these tests are passing.The proposed implementation breaks two existing tests, here and here. These tests each break for reasons that merit further, separate discussion in another comment below, but we believe these tests exercise undocumented behavior and should be changed or deleted. In the alternative, if this behavior is desired, it should be documented and described and we would be happy to try to implement it accordingly.
Interestingly, there is also a commented-out test here that documents the current issue, and now passes with the proposed changes. 🤓
Conclusion
With guidance, we will delete or modify the breaking tests or, in the alternative, implement whatever documented behavior is desired for these use cases.
Finally, performance is important in this gem. We believe it is possible that the proposed implementation is faster than the current one, but we have not performance tested it. We have no reason to believe it will be slower. Memory usage is also not tested, but we believe fewer objects will be created with the proposed changes.
Thank you!