Improve Trie implementation when different path variables used in same path location #273

dcr8898 · 2024-09-04T14:31:09Z

@kyleplump and I collaborated on this PR.

Summary

Router cannot differentiate between multiple defined routes that contain different path variable names in the same location. For example:

get "/:id/edit"
get "/:uuid"

This PR changes the router behavior to group all variable slugs together for matching purposes and to lazily create and evaluate Mustermann matchers. Discussion below.

The above symptom is illustrated in other open issues, such as #255 (Routes with multiple variables not working in common nested route scenario) and #256 (Inconsistent trailing slash behaviour). Although trailing slashes are not a documented use case, they sometimes work (and sometimes don't) due to segment-by-segment matching.

The proposed implementation should close #255 , #256 , and #272 , by applying complete route matchers that provide consistent, predictable behavior, as documented in the current Hanami Guides.

We also suggest closing #247 (Failing edge cases for optional variables) as an undocumented use case that is not currently supported. Optional segments are not supported by either the current router or the proposed update.

Closes #247
Closes #255
Closes #256
Closes #272

Discussion

Current Router Behavior

Hanami Router 2.0 uses a trie data structure to build a representation of all routes that contain a "dynamic segment," a.k.a., a path variable. It should be noted that all routes that contain only fixed segments are stored in a separate Hash where they can be looked up directly.

Notable features of the current router:

Routes are stored in the trie as nodes based on each segment of the route (segments defined by the "/" delimiter in the route).
For path variables/dynamic segments, a Mustermann matcher is created and stored in the trie node, with the matcher used as the key for the node.
When matching an incoming path to a route, the trie evaluates each segment of the path and calculates and collects derived parameter values from each dynamic segment as it goes (this is important).

Because the Mustermann matcher for each dynamic segment is used as the key for that node, different matchers will result in different nodes being created, even if the dynamic segments in question are in parallel locations in different routes.

That is, in the route example above, different matchers would be created for :id and :uuid, since different path variable names are used. However, the same node would be have been used for both routes if the same path variable name had been used (say, :id). It is worth noting also that different matchers (and therefore different nodes) would also be created if the same path variable names were used, but with different constraints.

The problem arises when matching paths and collecting parameters on the fly, as illustrated below.

The following routes:

get "/:id/edit"
get "/:id"

Would produce a trie like this:

flowchart LR
  A((root))-->B(((":id")))
  B-->C((("edit")))

Where the double circles represent potential path endpoints, so both :id and edit are potential path endpoints.

In this case, the router has no trouble matching either /1234 or /1234/edit to the correct routes.

However, if the following routes are used:

get "/:id/edit"
get "/:uuid"

A trie like this will be created:

flowchart LR
  A((root))-->B((":id"))
  B-->C((("edit")))
  A-->D(((":uuid")))

Here, :id and :uuid are different nodes, and :id is no longer an endpoint. Now, when the router tries to match a path like "/9d5794e7-12ed-4f00-a6d1-a5be9cd4072e", it has no way to know that this is a UUID and incorrectly matches it to the :id node. Crucially, the UUID is collected and stored in the params Hash under the key :id. When the router then realizes that this path does not match any route at or below the :id node, it has no way to backtrack, "forget" the incorrect :id param, and search down a different node path. In fact, backtracking in this way is contrary to trie design and would essentially convert the trie to an ordinary tree data structure.

Proposed Router Behavior

The proposed router behavior is to store a single variable node for all dynamic segments that are defined in parallel locations in an app's route table. Also, rather than attempting to parse and collect path variable params on the fly, the proposed router delays the creation and evaluation of Mustermann matchers until a potential full path/route match has been identified.

The following routes:

get "/:id/edit"
get "/:uuid"

Would now create a trie like this:

flowchart LR
  A((root))-->B(((":*")))
  B-->C((("edit")))

And each endpoint would contain the information needed to create and evaluate a matcher for a proposed path that evaluates to that endpoint.

Tests

The proposed implementation adds a new class, Leaf, and refactors the Trie and Node classes. New unit tests are included for each of these classes. New integration tests are also included (in recognition_spec.rb). All of these tests are passing.

The proposed implementation breaks two existing tests, here and here. These tests each break for reasons that merit further, separate discussion in another comment below, but we believe these tests exercise undocumented behavior and should be changed or deleted. In the alternative, if this behavior is desired, it should be documented and described and we would be happy to try to implement it accordingly.

Interestingly, there is also a commented-out test here that documents the current issue, and now passes with the proposed changes. 🤓

Conclusion

With guidance, we will delete or modify the breaking tests or, in the alternative, implement whatever documented behavior is desired for these use cases.

Finally, performance is important in this gem. We believe it is possible that the proposed implementation is faster than the current one, but we have not performance tested it. We have no reason to believe it will be slower. Memory usage is also not tested, but we believe fewer objects will be created with the proposed changes.

Thank you!

adam12 · 2024-09-04T18:03:57Z

This is a fantastic PR 👏

I looked at this issue a couple years ago and wasn't able to solve it due to time constraints. I originally tried to keep a stack of each opportunity to branch and then just keep trying each branch from the stack when there was a routing miss. I like your solution a lot better.

Well done! 🥂

dcr8898 · 2024-09-04T20:16:35Z

@adam12 Thanks!

We considered using a search algorithm like that, but in addition to the speed issue, you would have to maintain a parallel stack with the params collected on-the-fly (and prune it as needed). That was starting to feel complicated, so we looked for different solutions that would let us keep the speed a trie offers.

Cheers!

dcr8898 · 2024-09-05T21:34:22Z

Following is a discussion of the two tests currently broken by the proposed changes.

First Broken Test

The first broken test is here, in integration/hanami/router/recognition_spec.rb.

    describe "relative variable with permissive constraint" do
      let(:router) do
        described_class.new do
          get ":test", as: :regex, test: /.*/, to: RecognitionTestCase.endpoint("regex")
        end
      end


      it "recognizes route(s)" do
        runner.run!([
          [:regex, "/test/", {test: "test"}]
        ])
      end
    end

There are several issues with this test. The first is that the route specified in the test setup does not include a leading /. The second is that the constraints for this single-segment route are permissive and allow it to match any characters, including /. In this case, that essentially means that this route should match all incoming paths, and the full path string itself should be assigned to the path variable test.

The fact that this test passed previously appears to be a quirk of the current router implementation. The test is essentially designed to illustrate that quirk. If this test accurately implemented the route get ":test", and then received the incoming path GET "/test/", the returned params should be {test: "/test/"}, not {test: "test"} as specified in the test.

> m = Mustermann.new(':test', type: :rails, version: "5.0", capture: {test: /.*/})
=> #<Mustermann::Rails:":test">

> m.match("/test/").named_captures
=> {"test"=>"/test/"}

Even if a leading / is added to the initial route, the outcome should still be different than the current test outcome.

> m = Mustermann.new('/:test', type: :rails, version: "5.0", capture: {test: /.*/})
=> #<Mustermann::Rails:":test">

> m.match("/test/").named_captures
=> {"test"=>"test/"}

The bottom line is that this type of catch-all route is not possible with either the current router implementation or the proposed change. Both implementations assume the ability to split incoming paths based on the presence of / characters and process the resulting path segments sequentially.

Neither implementation has the ability to introspect any route constraints that would conflict with this expectation of being able to split incoming paths. Therefore, such practice should be discouraged as long as the current router algorithm is used (that is, a routing trie).

I recommend deleting this test and documenting the reasons at this time.

Second Broken Test

The second breaking test is here, in integration/hanami/router/scope_spec.rb.

  it "allows interpolation" do
    router = described_class.new do
      scope ":locale" do
        root to: ->(env) { [200, {}, ["Root (#{env.dig('router.params', :locale)})!"]] }

        get "/trees", to: ->(env) { [200, {}, ["Trees (#{env.dig('router.params', :locale)})!"]] }
      end
    end


    app = Rack::MockRequest.new(router)


    expect(app.request("GET", "/it", lint: true).body).to eq("Root (it)!")
    expect(app.request("GET", "/it/", lint: true).body).to eq("Root (it)!")
    expect(app.request("GET", "/it/trees", lint: true).body).to eq("Trees (it)!")
  end

This test is a little different. It is designed to test the usage of "scopes" (similar to path namespaces). The breakage of this test may expose a different bug in the router.

Both of the routes defined in this test are fixed and, therefore, should never exercise the routing trie, which only handles routes with variable segments and only searches for incoming paths that don't match an existing fixed route. So why should any changes to the Trie class implementation break this test?

Without looking, I suspect that the problem may lie in the treatment of /, especially trailing slashes. In this test, there are three expectations:

GET "/it"       # passing
GET "/it/"      # failing
GET "/it/trees" # passing

I would argue:

the first path, /it, should fail, since this path falls outside the scope of /it/, and
the second path, /it/, should pass without resorting to the routing trie because it is the true root of the /it/ scope.

The fact that the opposite behavior is occurring is a sign that the router is not handling the edge case of scope root routes correctly at present.

For example, following is the current Router handling of a normal root route:

> Hanami::Router.new do
>   root to: ->(env) { [200, {}, ["Root!"]] }    
> end  

=> #<Hanami::Router:0x00007898bff1b690
 @base_url="http://localhost",
 @blk=#<Proc:0x00007898bfbe7e60 (pry):7>,
 @block_context=nil,
 @fixed=
  {"GET"=>{"/"=>#<Proc:0x00007898bfbe7de8 (pry):8 (lambda)>}
   "HEAD"=>{"/"=>#<Proc:0x00007898bfbe7de8 (pry):8 (lambda)>}},
 @globs_and_mounts=[],
 @inspector=nil,
 @name_prefix=#<Hanami::Router::Prefix:0x00007898bfbe81a8 @prefix="">,
 @not_allowed=#<Proc:0x00007898c039c1d8 /home/damian/.gem/ruby/3.2.5/bundler/gems/router-66668b309034/lib/hanami/router.rb:760 (lambda)>,
 @not_found=#<Proc:0x00007898c039c1b0 /home/damian/.gem/ruby/3.2.5/bundler/gems/router-66668b309034/lib/hanami/router.rb:775 (lambda)>,
 @path_prefix=#<Hanami::Router::Prefix:0x00007898bfbe81d0 @prefix="/">,
 @resolver=#<Proc:0x00007898c039c4d0 /home/damian/.gem/ruby/3.2.5/bundler/gems/router-66668b309034/lib/hanami/router.rb:706 (lambda)>,
 @url_helpers=
  #<Hanami::Router::UrlHelpers:0x00007898bfbe8180
   @base_url=#<URI::HTTP http://localhost>,
   @named={:root=>#<Mustermann::Rails:"/">},
   @prefix=#<Hanami::Router::Prefix:0x00007898bfbe7f50 @prefix="/">>,
 @variable={}>

You can see that a fixed route is created for /.

Compare to the route created for a scoped root (using the routes from the breaking test):

> Hanami::Router.new do
>   scope "it" do  
>     root to: ->(env) { [200, {}, ["Root (it)!"]] }
>     get "/trees", to: ->(env) { [200, {}, ["Trees (it)!"]] }
>   end  
> end  

=> #<Hanami::Router:0x00007898c0202bd8
 @base_url="http://localhost",
 @blk=#<Proc:0x00007898bffc89f8 (pry):1>,
 @block_context=nil,
 @fixed=
  {"GET"=>{"/it"=>#<Proc:0x00007898bffc8430 (pry):3 (lambda)>, "/it/trees"=>#<Proc:0x00007898bffda608 (pry):4 (lambda)>},
   "HEAD"=>{"/it"=>#<Proc:0x00007898bffc8430 (pry):3 (lambda)>, "/it/trees"=>#<Proc:0x00007898bffda608 (pry):4 (lambda)>}},
 @globs_and_mounts=[],
 @inspector=nil,
 @name_prefix=#<Hanami::Router::Prefix:0x00007898bffc9448 @prefix="">,
 @not_allowed=#<Proc:0x00007898c039c1d8 /home/damian/.gem/ruby/3.2.5/bundler/gems/router-66668b309034/lib/hanami/router.rb:760 (lambda)>,
 @not_found=#<Proc:0x00007898c039c1b0 /home/damian/.gem/ruby/3.2.5/bundler/gems/router-66668b309034/lib/hanami/router.rb:775 (lambda)>,
 @path_prefix=#<Hanami::Router::Prefix:0x00007898bffc94c0 @prefix="/">,
 @resolver=#<Proc:0x00007898c039c4d0 /home/damian/.gem/ruby/3.2.5/bundler/gems/router-66668b309034/lib/hanami/router.rb:706 (lambda)>,
 @url_helpers=
  #<Hanami::Router::UrlHelpers:0x00007898bffc93d0
   @base_url=#<URI::HTTP http://localhost>,
   @named={:it_root=>#<Mustermann::Rails:"/it">},
   @prefix=#<Hanami::Router::Prefix:0x00007898bffc8b38 @prefix="/">>,
 @variable={}>

Here, a fixed route is created for /it, but no route is created for /it/. Therefore, handling of this incoming path (/it/) falls through to the routing trie. As I said above, I think this behavior is backwards. In any event, it is not implemented in the way the test appears to specify.

I believe that the second expectation was probably passing previously due to a bug in the current Trie class implementation. Since the proposed implementation matches against a full route, instead of individual segments, this is no longer true and the test now fails.

I recommend suspending this test at present to allow the pull request to pass.

If you agree that scope root route handling should start at /it/, and not /it, then we will be happy to address this in a separate pull request. In the alternative, if you believe that a scoped root route should match both /it and /it/, then we can implement that behavior instead.

dcr8898 · 2024-09-05T21:36:57Z

I see that the CI build is failing at the RSpec step, before any tests are run.

I don't think this is caused by anything in this pull request. It looks like rexml is no longer in the standard library and may need to be added to a gemfile or gemspec (it's used in spec/support/fixtures.rb). Reference here. But I'm not sure this gem is the best place for it. Maybe hanami-devtools or hanami-utils?

Note that the RSpec step will fail when CI runs correctly, due to the two broken integration tests mentioned above.

timriley · 2024-09-12T10:46:10Z

Thanks for this amazing PR, folks! And I think it's so cool that the two of you worked together on this 😍

I want to give this proper attention, so it'll need some brain space. Right now I'm finishing off multiple gateway support for Hanami's DB layer. As soon as I'm done with that, attending to this PR will be my #1 priority. Speak again soon :)

dcr8898 and others added 5 commits August 28, 2024 21:32

add failing test case to document Issue hanami#272

b5b4c93

implement Leaf class with tests

4b82b83

new integration test case, created router/trie_spec

3eced52

refactor Node class with new tests

3618568

refactor Trie class with new tests

e5a4c0e

appease Rubocop

66668b3

rename Trie#split private method to Trie#segments_from

1c794bd

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve Trie implementation when different path variables used in same path location #273

Improve Trie implementation when different path variables used in same path location #273

dcr8898 commented Sep 4, 2024 •

edited

Loading

adam12 commented Sep 4, 2024

dcr8898 commented Sep 4, 2024

dcr8898 commented Sep 5, 2024 •

edited

Loading

dcr8898 commented Sep 5, 2024 •

edited

Loading

timriley commented Sep 12, 2024

Improve Trie implementation when different path variables used in same path location #273

Are you sure you want to change the base?

Improve Trie implementation when different path variables used in same path location #273

Conversation

dcr8898 commented Sep 4, 2024 • edited Loading

Summary

Discussion

Current Router Behavior

Proposed Router Behavior

Tests

Conclusion

adam12 commented Sep 4, 2024

dcr8898 commented Sep 4, 2024

dcr8898 commented Sep 5, 2024 • edited Loading

First Broken Test

Second Broken Test

dcr8898 commented Sep 5, 2024 • edited Loading

timriley commented Sep 12, 2024

dcr8898 commented Sep 4, 2024 •

edited

Loading

dcr8898 commented Sep 5, 2024 •

edited

Loading

dcr8898 commented Sep 5, 2024 •

edited

Loading