Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support NotSet string behavior in falco #248

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

ysugimoto
Copy link
Owner

@ysugimoto ysugimoto commented Jan 27, 2024

Fixes #235 #236 #237

I apologize this PR includes massive changes.

This PR implements NotSet string support which is described here. To support this behavior, I had to make fundamental changes to HTTP header manipulation:

Implement original HTTP-related struct

To simulate Fastly HTTP-related behavior conveniently, I implemented the following structs originally:

  • http.Request
  • http.Response
  • http.Header
  • cookie retrieving function

These are equivalent to Golang's net/http package and can convert each other through the utility function.
You can see the above structs as flchttp aliased import.

Particularly, http.Header implements Fastly header manipulation feature, colon-separated syntax, multiple header behavior, and adding (null) string to the actual HTTP request when NotSet string is assigned.

Implement more string value type

I implemented LenientString, which keeps assigned/concatenated values inside and can support to output (null) string if the value is entirely NotSet.

According to declare the new value type, I needed to change the entire expression evaluation stuff, assign, operator, built-in function, etc.

After this implementation, we need to consider whether the string type is either *value.String or *value.LenientiString, then we can use value.GetString(value.Value) as wrap them transparently.

Implement new transport package

This is not related to NotSet-related things but the actual origin requests feature handles this package.
This package includes creating Golang's http.Request pointer from flchttp.Request, making director, and so on.
Please look at the implementation.

This change should bump the major version I think.

Again, sorry for the huge changes, please take a look at the changesets with having the above perspective and perhaps you have unclear points, then feel free to put comments to diff, and let's discuss it, thanks.

@ysugimoto ysugimoto changed the base branch from main to develop January 27, 2024 21:43
@ysugimoto ysugimoto changed the base branch from develop to main January 27, 2024 21:44
@richardmarshall
Copy link
Collaborator

Definitely a lot to look through, will start reviewing the changes in the next day or two when I have a bigger block of free time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started looking through the PR this morning and the test assertions in this file stood out to me.

For example in this test:

// @scope: recv
// @suite: UNSET header(add-add-unset)
sub test_recv {
    set req.http.VALUE = req.http.NOTSET;
    assert.is_notset(req.http.VALUE);

    set req.http.MESSAGE = req.http.VALUE;
    assert.equal(req.http.MESSAGE, "(null)");
}

What seemed odd to me was the fact that assert.equal(req.http.MESSAGE, "(null)"); was passing when the header is unset. A condition like req.http.MESSAGE == "(null)" in an if statement for example should evaluate to false. So assert.equal should seemingly also reject this case.

Put together a fiddle that reproduces the sequence of sets from the test and inspecting the results.
https://fiddle.fastly.dev/fiddle/d1c67281

The log output of that fiddle is

req.http.VALUE
not set
logged value: (null)
req.http.MESSAGE
not set
logged value: (null)

Both the VALUE and MESSAGE headers are "not set" and don't equal "(null)" in the context of an if statement condition. But what I found interesting is that the logged value is "(null)". That fact started a whole other path of looking into the behavior of all this and I ran out of time to be able to get back to reviewing the PR.

I started putting together another fiddle with as many permutations of usage of unset header and local string variables I could think of to use as a basis for comparing the falco not set string behavior with what Fastly is doing. Will share that when I have some time to finish it and will get back to reading through more of the PR soon.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right - this looks strange to us but it simulates Fastly's behavior.
The NotSet string evaluation depends on the context, sometimes empty string, and sometimes "(null)", and unfortunately Fastly VCL developer also could not grab all cases that the NotSet string could evaluate as "(null)".

However, I'd understand the cases following:

  1. on log, synthetic, and synthetic_base64 statements, message string includes NotSet will evaluate as "(null)"
  2. on making actual HTTP headers that send to the origin, the header value string includes NotSet will evaluate as "(null)"
  3. on assigning into STRING local variable like var.SomeString will evaluate as empty string
  4. In other cases, as you pointed out if expression, NotSet string will be evaluated as falsy and any comparison operation should be treated like NULL.
  5. built-in function - some functions will be treated as "(null)" but nobody is figuring it out now.

Probably there are more cases where the NotSet string is evaluated as "(null)" hence this PR includes the features that I only understand.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't see this when writing up the comment with the test case fiddle/tests yesterday. I had missed the synthetic statements when thinking through various places where not set values would need to be considered.

You're right synthetic does behave like the log statement and always expands not set values to (null).

I'm not seeing the (null) behavior with synthetic.base64 though. The response body in a fiddle with synthetic.base64 req.http.unset seems to be empty. Likewise doing a string concat expression with that statement synthetic.base64 req.http.unset + "aGkK" results in the body hi without the (null) expansion for the unset header.

@richardmarshall
Copy link
Collaborator

Put together this fiddle / falco test file to evaluate as many not set conditions I could come up with in a reasonable amount of time.

https://fiddle.fastly.dev/fiddle/3f8ea76a

Running these tests against this PR branch shows a few discrepancies from what occurs in Fastly.

rgm@penguin:~/src/github.com/richardmarshall/falco$ git checkout feat/re-implementation-for-header-value
Switched to branch 'feat/re-implementation-for-header-value'
Your branch is up to date with 'upstream/feat/re-implementation-for-header-value'.
rgm@penguin:~/src/github.com/richardmarshall/falco$ make linux_amd64 && ./dist/falco test test_cases/not_set/main.test.vcl 
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build \
                 -ldflags "-X main.version=dev" \
                 -o dist/falco-linux-amd64 ./cmd/falco
cd ./dist/ && cp ./falco-linux-amd64 ./falco && tar cfz falco-linux-amd64.tar.gz ./falco
Running tests... Done.
 FAIL  /home/rgm/src/github.com/richardmarshall/falco/test_cases/not_set/main.test.vcl
  ✓ [RECV] Set STRING var the result of concatenation of an unset STRING var and unset STRING var
  ✓ [RECV] Set STRING var the result of concatenation of a string literal and unset STRING var
  ✓ [RECV] Set STRING var the result of concatenation of a string literal and unset IP var
  ✓ [RECV] Set STRING var the result of space concatenation of an unset STRING var and unset STRING var
  ✓ [RECV] Set STRING var the result of space concatenation of a string literal and unset STRING var
  ✓ [RECV] Set STRING var the result of space concatenation of a string literal and unset IP var
  ✓ [RECV] Set STRING var the result of concatenation of a string literal and unset header
  ✓ [RECV] Set STRING var the result of space concatenation of a string literal and unset header
  ●  [RECV] Set header the result of concatenation of a string literal and unset STRING var

    Assertion Error: got: empty string, expected: (null)
    Actual Value: 0

    225|     log "empty string";
    226|     assert.true(false, "got: empty string, expected: (null)");
    227|   } else if (req.http.unset_string == "left(null)") {

  ●  [RECV] Set header the result of concatenation of a string literal and unset IP var

    Assertion Error: got: empty string, expected: (null)
    Actual Value: 0

    254|     log "empty string";
    255|     assert.true(false, "got: empty string, expected: (null)");
    256|   } else if (req.http.unset_ip == "left(null)") {

  ●  [RECV] Set header the result of concatenation of a string literal and header

    Assertion Error: got: empty string, expected: (null)
    Actual Value: 0

    282|     log "empty string";
    283|     assert.true(false, "got: empty string, expected: (null)");
    284|   } else if (req.http.concat_unset_header == "left(null)") {

  ●  [RECV] Add header the result of concatenation of a string literal and unset header

    Assertion Error: got: empty string, expected: (null)
    Actual Value: 0

    310|     log "empty string";
    311|     assert.true(false, "got: empty string, expected: (null)");
    312|   } else if (req.http.add_concat_unset_header_left == "left(null)") {

  ●  [RECV] Add header the result of concatenation of a string literal and unset STRING var

    Assertion Error: got: empty string, expected: (null)
    Actual Value: 0

    339|     log "empty string";
    340|     assert.true(false, "got: empty string, expected: (null)");
    341|   } else if (req.http.add_concat_unset_string_left == "left(null)") {

  ●  [RECV] Add header the result of concatenation of a string literal and unset IP var

    Assertion Error: got: empty string, expected: (null)
    Actual Value: 0

    368|     log "empty string";
    369|     assert.true(false, "got: empty string, expected: (null)");
    370|   } else if (req.http.add_concat_unset_ip_left == "left(null)") {

  ✓ [RECV] Set header an unset STRING var
  ✓ [RECV] Set header an unset IP var
  ✓ [RECV] Set header an unset header
  ✓ [RECV] Add header an unset header
  ✓ [RECV] Add header an unset STRING
  ✓ [RECV] Add header an unset IP
  ✓ [RECV] Set STRING var an unset STRING var
  ✓ [RECV] Set STRING var an unset IP var
  ✓ [RECV] Set STRING var an unset header
  ✓ [RECV] Set STRING var result of json.escape with unset STRING argument
  ✓ [RECV] Set STRING var result of regsub with unset STRING replacement argument
  ✓ [RECV] Set STRING var result of json.escape with string concatenation argument containing an unset STRING
  ✓ [RECV] Use unset STRING variable as switch statement control
  ✓ [RECV] Use unset IP variable as switch statement control
  ●  [RECV] Use unset header as switch statement control

    Assertion Error: got: (null), expected: empty string
    Actual Value: 0

    674|       log "(null)";
    675|       assert.true(false, "got: (null), expected: empty string");
    676|       break;

  ✓ [RECV] Unset STRING variable in if statement condition
  ✓ [RECV] Unset header in if statement condition
  ✓ [RECV] Log statement with unset STRING variable
  ✓ [RECV] Log statement with unset IP variable
  ✓ [RECV] Log statement with unset header
27 passed, 7 failed, 34 total, 37 assertions

It seems the (null) insertion primarily occurs with the concatenation operator specifically when the target of the resulting assignment is a header. If a not set value is assigned directly to a header this does not occur but in the case of the log statement both these cases trigger (null) insertion.

I have not done an exhaustive set of testing of the built-in functions that accept strings but the ones I have tested all seem to handle not set STRING/IP values a empty strings.

Back to the code review aspect of this I have some reservations about the LenientString solution that I will try and write up tonight or tomorrow morning. Not having as much time to work on this as I typically would so please bear with me and I'll get through the review as soon as I can.

@richardmarshall
Copy link
Collaborator

The LenientString/VString types are handling two use-cases; one for handling the (null) expansion, and the other to store a collection of values for the header object representation (req.http.foo:bar).

Multiple values

Since the only use case for the multiple values is the Header value KV object feature it feels like a detail that should be internal to the Header type rather than part of the interpreter type system.

(null) expansion

Again this feels like functionality that should be internal to the statement (set, add, log, synthetic) / expression (concatenation) handling rather than part of the STRING/IP value types or a wrapper for them in this case with the LenientString type. Since there are contextual details that need to be taken into account for when to perform expansion and when not to there will inevitably need to be logic in the expression parsing stage so it seems to me to make more sense for this logic to live there rather than in a value type.

Took a very quick stab at handling just the (null) expansion without making changes to the String or IP value type handling.

richardmarshall@3212a48

This approach is able to pass all the tests and limits the scope of changes needed to support the (null) behavior considerably compared to using the LenientString / VString model.

rgm@penguin:~/src/github.com/richardmarshall/falco$ make linux_amd64 && ./dist/falco test examples/testing/not_set/main.vcl 
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build \
                 -ldflags "-X main.version=dev" \
                 -o dist/falco-linux-amd64 ./cmd/falco
cd ./dist/ && cp ./falco-linux-amd64 ./falco && tar cfz falco-linux-amd64.tar.gz ./falco
Running tests... Done.
 PASS  /home/rgm/src/github.com/richardmarshall/falco/examples/testing/not_set/not_set.test.vcl
  ✓ [RECV] Set STRING var the result of concatenation of an unset STRING var and unset STRING var
  ✓ [RECV] Set STRING var the result of concatenation of a string literal and unset STRING var
  ✓ [RECV] Set STRING var the result of concatenation of a string literal and unset IP var
  ✓ [RECV] Set STRING var the result of space concatenation of an unset STRING var and unset STRING var
  ✓ [RECV] Set STRING var the result of space concatenation of a string literal and unset STRING var
  ✓ [RECV] Set STRING var the result of space concatenation of a string literal and unset IP var
  ✓ [RECV] Set STRING var the result of concatenation of a string literal and unset header
  ✓ [RECV] Set STRING var the result of space concatenation of a string literal and unset header
  ✓ [RECV] Set header the result of concatenation of a string literal and unset STRING var
  ✓ [RECV] Set header the result of concatenation of a string literal and unset IP var
  ✓ [RECV] Set header the result of concatenation of a string literal and header
  ✓ [RECV] Add header the result of concatenation of a string literal and unset header
  ✓ [RECV] Add header the result of concatenation of a string literal and unset STRING var
  ✓ [RECV] Add header the result of concatenation of a string literal and unset IP var
  ✓ [RECV] Set header an unset STRING var
  ✓ [RECV] Set header an unset IP var
  ✓ [RECV] Set header an unset header
  ✓ [RECV] Add header an unset header
  ✓ [RECV] Add header an unset STRING
  ✓ [RECV] Add header an unset IP
  ✓ [RECV] Set STRING var an unset STRING var
  ✓ [RECV] Set STRING var an unset IP var
  ✓ [RECV] Set STRING var an unset header
  ✓ [RECV] Set STRING var result of json.escape with unset STRING argument
  ✓ [RECV] Set STRING var result of regsub with unset STRING replacement argument
  ✓ [RECV] Set STRING var result of json.escape with string concatenation argument containing an unset STRING
  ✓ [RECV] Use unset STRING variable as switch statement control
  ✓ [RECV] Use unset IP variable as switch statement control
  ✓ [RECV] Use unset header as switch statement control
  ✓ [RECV] Unset STRING variable in if statement condition
  ✓ [RECV] Unset header in if statement condition
  ✓ [RECV] Log statement with unset STRING variable
  ✓ [RECV] Log statement with unset IP variable
  ✓ [RECV] Log statement with unset header
34 passed, 0 failed, 34 total, 43 assertions

I'll start looking through the new transport package part of this PR soon.

Copy link
Collaborator

@richardmarshall richardmarshall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transport package looks good, just a couple small comments.

}

for retry := 0; retry < maxRetry; retry++ {
// Check backens are enough healthy to determine
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo backends.

)
}
if v, ok := prop.Value.(*ast.Ident); !ok {
return nil, exception.Runtime(&prop.GetMeta().Token, ".key value must be integer")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo integer should be ident.

}
}

rand.New(rand.NewSource(time.Now().Unix()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generator created by this isn't stored, but for this use case a local generator isn't really needed so this line can just be removed.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be stored to variable and call r.Intn to be sure, thanks.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah you're right, we should remove this line to correct choosing backend by weight.

"github.com/ysugimoto/falco/interpreter/limitations"
)

// Send proxy request to spefic origin
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo specific

if v, ok := prop.(T); ok {
return v, nil
}
return nil, errors.New("Type assertion failed")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the linter will catch an invalid property value type it would be good to give the user a more helpful error message here so they know what backend property is faulty in the test and simulator output.

Copy link
Owner Author

@ysugimoto ysugimoto Feb 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

T is just generics type info so now I improved the message that the user needs to set the correct type for the property name of key.

func backendHost(backend *value.Backend, override *config.OverrideBackend) (string, error) {
// host may be overrided by config
if override != nil {
return override.Host, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this have a check for override.Host not being an empty string?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I think it should be validated on config initialization. I'll do it on another issue.


func backendScheme(backend *value.Backend, override *config.OverrideBackend) (string, error) {
// scheme may be overrided by config
if override != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this pair of if statements are logically one thing it could be simplified into a single condition. override != nil && override.SSL. There are a couple similar pairings in this file as well, lines 56/57 and 131/132.

Largely a style preference nit though.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we could make it clear the override.SSL would be compared with a bool type.

@ysugimoto
Copy link
Owner Author

@richardmarshall richardmarshall@3212a48 is a great approach to consider the (null) string! could you send me a PR for this branch of feat/re-implementation-for-header-value please?

@richardmarshall
Copy link
Collaborator

@richardmarshall richardmarshall@3212a48 is a great approach to consider the (null) string! could you send me a PR for this branch of feat/re-implementation-for-header-value please?

Sure will do, but probably won't have time for a few days.

@richardmarshall
Copy link
Collaborator

I got started on rebasing my experimental changes onto this branch when I got derailed by discovering what seems to be a bug in Fastly's not_set handling for header fields. Realized I hadn't tested how header fields behave when assigned not_set values or are included in concatenation, etc. Through that I found that the length of the field name has an impact on the behavior.

set req.http.test1 = req.http.unset;    // test1 header is a not_set string
set req.http.test2:a = req.http.unset;  // field a of the test2 header is a not_set string
set req.http.test3:ab = req.http.unset; // field ab of the test3 header is an empty string

Header fields with a key length of 1 behave differently than keys with length > 1 when the header is unset. Which doesn't make any logical sense so seems like a bug.

Fiddle showing this odd behavior:
https://fiddle.fastly.dev/fiddle/5dd31959

Anyway will get back to the rebase work soon.

@richardmarshall
Copy link
Collaborator

Ended up running into another issue that opened up a whole new problem area for our compatibility with Fastly's header field implementation. Currently in our implementation setting the main header value to an encoded set of fields like set req.http.foo = "a=1,b=2"; does not handle accessing the keys correctly. In this example accessing foo:a would return 1,b=2 and trying to access foo:b would return a not_set value.

Fastly parses the full set of fields and makes them accessible. I initially though supporting this seemed like it would be doable with some tweaks to the implementation in this PR until I found another weird case.

set req.http.foo = "a=1,b=2";
log req.http.foo:a; // `1` as expected
log req.http.foo:b; // `2` as expected
set req.http.foo:c = ",d=3";
log req.http.foo:c; // `,d=3` as expected
log req.http.foo:d; // `3"` unexpected

Setting a field value that contains a field separator , and a new key/value pair then allows you to access the new key. This seems to indicate that Fastly isn't storing the header fields in a map of sorts and is instead parsing the main header value to extract the fields each time a field is accessed.

At this point I started putting together a set of validation tests to figure out as much about how fastly is dealing with these header fields. Found that the RFC-8941 specification for structured header values contained a number of test cases and used that as a starting point.

RFC - https://www.rfc-editor.org/rfc/rfc8941.html
Dictionary test cases - https://github.com/httpwg/structured-field-tests/blob/main/dictionary.json
Validation tests: https://github.com/richardmarshall/falco-validation-tests/blob/main/header_fields/main.test.vcl

I don't have a fiddle prepared confirming the Fastly results of all the tests but I can put that together soon.

Results for main and this PRs branch:

main:

5 passed, 36 failed, 41 total, 44 assertions

This PR:

2 passed, 39 failed, 41 total, 41 assertions

Took a quick stab at trying to handle field access in a way that didn't rely on storing the fields independently from the main header value. Started with an approach based on simple string splits on the field separator and was able to significantly improve the compatibility test results.

17 passed, 24 failed, 41 total, 70 assertions

But ran into challenges with properly dealing with field values that contain a ,. Started over with a regex solution and was able to construct an expression that seems to properly match the behavior of Fastly and gets all the compatibility tests to pass.

41 passed, 0 failed, 41 total, 107 assertions

The regex pattern is hard to interpret and could likely be optimized for clarity but it handles the idiosyncrasies of the Fastly behavior.

main...richardmarshall:falco:header_field_experiment

Additionally I did open a PR against this branch for my not_set handling #265.

@ysugimoto
Copy link
Owner Author

Oh really, I also did not know such behavior in dealing with comma-separated header values.
So then it makes us confused about which set or add statements should we use... but your investigation would be correct, thanks

@richardmarshall
Copy link
Collaborator

Oh really, I also did not know such behavior in dealing with comma-separated header values.

Yeah it was quite a surprise when I first saw it. Further experimentation found a number of additional unexpected or currently unaccounted for behaviors. I believe all of the ones I found are handled properly in the proof of concept implementation in #267

So then it makes us confused about which set or add statements should we use...

Don't follow, what would be confusing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] (simulator) set req.http.VALUE = "" should not be [not set]
2 participants