Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

req.url: fixed consistency with Fastly implementation #262

Merged
merged 4 commits into from
Mar 28, 2024

Conversation

akrainiouk
Copy link
Contributor

Fastly implementation of req.url, req.url.path, req.url.basename and req.url.ext returns the raw version of these values while falco returns urldecoded versions.
This PR fixes this inconsistency

@richardmarshall
Copy link
Collaborator

Could you please provide an example of how Falco deviates from fastly on this? A Fiddle vs a falco vcl test case for example.

@akrainiouk
Copy link
Contributor Author

akrainiouk commented Feb 19, 2024

This fiddle demonstrates fastly behavior: https://fiddle.fastly.dev/fiddle/86d69ef4
Request path value contains url encoded characters as well as literal '+' character: /d%69rname/ba%73e+name.extensi%6Fn
the code in vcl_error dumps the values of various req.url subfields into the response body as a plain text.
Here is the response:


    req.url:          '/d%69rname/ba%73e+name.extensi%6Fn'
    req.url.path:     '/d%69rname/ba%73e+name.extensi%6Fn'
    req.url.dirname:  '/d%69rname'
    req.url.basename: 'ba%73e+name.extensi%6Fn'
    req.url.ext:      'extensi%6Fn'

As you can see all req.url fields maintains the original encoding.
It is a little tricker to see falco's behavior as I couldn't find the way to see the synthetic responses in the simulator.
The easiest way is probably to save this fragment into the file (e.g. main.vcl):

sub vcl_recv {
  // @debugger
  return(lookup);
}

and run it in the debugger:

falco simulate -debug main.vcl

Send test http request:

curl -Ss 'http://localhost:3124/d%69rname/ba%73e+name.extensi%6Fn'

And then when the execution breaks into debugger just manually url components while watching the output:

> req.url
>> (STRING)/dirname/base+name.extension
> req.url.path
>> (STRING)/dirname/base+name.extension
> req.url.dirname
>> (STRING)/dirname
> req.url.basename
>> (STRING)base+name.extension
> req.url.ext
>> (STRING)extension

Note that as opposed to Fastly behavior all url components come out decoded.
'+' character is (correctly) preserved though.

Also falco behavior can be demonstrated with this test case (sorry for some reason I am unable to attach it as a file):

// @scope: recv
// @suite: test url variable representation
sub test_url {
  set req.url = {"/d%69rname/ba%73ename.extensi%6Fn"};
  assert.equal(req.url, {"/d%69rname/ba%73ename.extensi%6Fn"});
} 

Save it to 'main.test.vcl' file (no need for main.vcl as it never gets called) and then run:

falco test test/url-representation/main.test.vcl

The test fails with this message:

Running tests... Done.
 FAIL  /Users/akrainio/work/vcl-rule-compiler/test/url-representation/main.test.vcl
  ●  [RECV] test url variable representation

    Assertion Error: Assertion error: expect=/d%69rname/ba%73ename.extensi%6Fn, actual=/dirname/basename.extension
    Actual Value: /dirname/basename.extension

    4|   set req.url = {"/d%69rname/ba%73ename.extensi%6Fn"};
    5|   assert.equal(req.url, {"/d%69rname/ba%73ename.extensi%6Fn"});
    6| }  

0 passed, 1 failed, 1 total, 1 assertions

This test demonstrates that the decoding actually happens when the internal representation of url.path is accessed and converted to string.

@richardmarshall
Copy link
Collaborator

Thank you for the details. So this is an interesting case; these changes do fix a consistency issue but also break an edge case where we're currently compatible with Fastly.

Fastly allows you to set req.url to a value that is not escaped properly and will not escape the value when read back.

set req.url = "/a b";
log req.url; // yields `/a b` not `/a%20b`

Granted we're only compatible with this edge case by way of an accident but still worth maintaining. It should be pretty straightforward to retain compatibility by comparing the result of EscapedPath() against RawPath. If RawPath is not empty and is not equal to the escaped value return the RawPath instead of the escaped path.

Something along the lines of:

u := req.URL.EscapedPath()
if req.URL.RawPath != "" && u != req.url.RawPath {
    u = req.URL.RawPath
}

@akrainiouk
Copy link
Contributor Author

akrainiouk commented Feb 27, 2024

@richardmarshall thanks for your advice. I amended code responsible for accessing url raw path as you suggested and this took care of space character handling.

One case that is still not covered is when assigned URL value contains invalid URL encoding (and probably any malformed URL).
In Falco it leads to parsing error (because it attempts to pre-parse the value into URL object).
Fastly apparently just holds req.url as a string and only splits it into path ? query # fragment when necessary.

Fixing this in Falco will require deeper changes so I am holding for now.

Copy link
Collaborator

@richardmarshall richardmarshall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Just one typo and a linter issue that need fixing.

Regarding the invalid escape sequence issue I agree that it can be dealt with separately.

interpreter/variable/all.go Outdated Show resolved Hide resolved
interpreter/variable/all.go Outdated Show resolved Hide resolved
@akrainiouk
Copy link
Contributor Author

Regarding the invalid escape sequence issue I agree that it can be dealt with separately.

There is a similarity in behavior of req.url and querystring function family (see #264).

From what I can see from Fastly behavior req.url handling shares the same implementation as querystring functions, at least in the sense that they both avoid full URL parsing. Perhaps the same code can be used for both.

@ysugimoto
Copy link
Owner

Thanks both, I've just caught up on the implementation and discussions - understanding req.url handling.
sometimes we met some problems with the AST-based vs context parsing approach but we trust AST-based could be more robust (but I know it's hard to follow Fastly's spec).

@ysugimoto ysugimoto merged commit 13d2ef9 into ysugimoto:main Mar 28, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants