-
Notifications
You must be signed in to change notification settings - Fork 381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add req.filters and bereq.filters #4035
Conversation
6b4bcdd
to
dd2d224
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that instead of breaking so much of the API to enable it for *req.body
we could instead enhance the vdp_ctx
and add a bunch of fields (bo
, hd
and cl
) and let processors check or use what they need from the context (similarly to vrt_ctx
).
For example, the ESI processor could assert(vdc->hd == req->resp)
after grabbing req
and we wouldn't need to change much more.
That should also mean changing this prototype:
void VDP_Init(struct vdp_ctx *, struct *req, struct *bo);
Considering how it currently operates strictly on req
fields it would probably be an improvement anyway. And that would open the pipe case, that I didn't see addressed in this patch series (but this was not a thorough review so don't be mad if pipe is covered).
Now regarding the objcore, we could imagine something like this for VDPs that need it:
struct objcore * VDP_Objcore(struct vdp_ctx *);
It would check whether this is the first member of the stack and grab the right one depending on the calling context.
Overall I'm in favor but I get the feeling that this change unpacks too much of the API instead of keeping as much as relevant contained.
client c1 { | ||
# Start with enough workspace to receive a good result | ||
txreq -hdr "WS: 320" | ||
# Start with enough workspace to receive some good results | ||
client c1 -repeat 5 { | ||
txreq -hdr "Connection: close" | ||
rxresp | ||
expect resp.status == 200 | ||
} -run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe submit that kind of change independently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense
@@ -1,6 +1,9 @@ | |||
varnishtest "sweep through tight backend workspace conditions" | |||
|
|||
# XXX almost the same as r01834.vtc - retire? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not if r1834 didn´t catch #2645 at the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r1834 was not originally using the "sweep" idea. Now I have changed it to also use that, so now they are almost the same.
VDP_Push(VRT_CTX, struct vdp_ctx *vdc, struct ws *ws, const struct vdp *vdp, | ||
void *priv) | ||
VDP_Push(VRT_CTX, struct vdp_ctx *vdc, struct ws *ws, | ||
const struct vdp *vdp, void *priv, | ||
struct objcore *oc, struct req *req, | ||
struct http *hd, intmax_t *cl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit pick, the priv
argument tends to be first in callbacks and last otherwise (though not always).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did ponder this question for a bit: The priv
argument is the priv
member from struct vdp_entry
, so it belongs to the vdp
. Thus I kept it next to it.
These questions are, in my mind, answered in the commit message of 0274ebc: We already have (had, from the perspective of the patch) a I think the API should make it clear that VDPs should not mess with a request, a busy object, headers or the content length after init time, thus I think that we should change the init function signature and specifically not add Also please note the following from the commit message regarding the motivation for this change:
I did actually try this for VDPs and ended up with repetitive code - you basically have to repeat the additions to VCL_StackVDP() in each VDP init function. The changed API makes it easier (and, in my mind, clearer) how to write a VDP which works everywhere: You get the headers and the content length, make modifications to both as needed and be done. See for example the vmod_re change, which mostly consists of boilerplate changes to For VDPs which need objcore access or only work on the client side, I believe, the API change also clarifies the situation: The respective init function arguments are present only if the respective access is allowed.
Please look at the commit again, it also does not change much more: Besides the
I think we should work on one question at a time, and if pipe mode requires another API change, then so be it. But does it? |
There is a leak reported by ASAN:
|
109ad13
to
ba78308
Compare
This is not going through a VDP, but it eventually will. Refs varnishcache#4035
ba78308
to
e851210
Compare
I have rebased, squashed the squashable commits and force-pushed . |
Obviously this is a good idea. I have thought about the general shape of the filter-API, and checked some old notes, and I think I come down on the side of putting more stuff into the But as @nigoroll rightly points out, that does carry the risk that all filter implementations will start with long list of asserts, if nothing else to clearly document the situation. Some of that comes down to But some of it can be solved with alias fields, for instance a |
Thank you for the review, @bsdphk . As Dridi and you gave similar feedback on this aspect, I will obviously respect your verdict. Yet one thing bothers me: If we simply added So, to improve API stability and still make it clear how a well behaved filter should work, I would suggest to introduce a Note that any filter which "knows what it is doing" could still use the Side note that in terms of "api stability": I can follow the argument that changing function signatures is particularly painful downstream, and also struct members can more easily be added in a backwards compatible manner. But to achieve full binary compatibility, we would need to start versioning structures and/or add a bitfield with supported features. It was and is my understanding that, with Varnish-Cache, we do not want to take this route. So moving the problem to structs will only help us so far and comes with its own challenges. |
I deliberately said "API" and not "ABI": Recompilation is cheap, stuffing source-code with #ifdef is not. Neither I nor Occam, share your passion for "multiplying entities" and I really do not see what possible benefit a dedicated |
As I said: avoid code duplication. I think all a generic vdp init should be concerned about is:
A less generic vdp is going to want to know
None of these has any relevance once the init function is done. The req can go, it can be taken from |
I'll prepare a diff to look at... |
FWIW my goal was to take your change and make an attempt at implementing it with limited (or if possible no) changes to the function signatures. I haven't had a chance to spend time on this.
FYI we are making progress on a limited initial support for trailers, and we should have something ready in a matter of weeks from the look of it. As per #3809 (comment) we are moving trailers out of the body filtering code. |
In case you missed it, we already had that in #2477. Based on the experience with this implementation, I think we would need to first agree to two prerequisites to not create another dead end:
|
We can have that discussion in #3809 if you want. The short answer is that we want to take this iteratively and not involve VCL support on day 1 to get something useful sooner. |
This commit is to prepare use of the VDP API also for the backend side to filter bereq.body through bereq.filters by adding a struct vrt_init_ctx argument to the VDP init function. For background, see varnishcache#4035
This commit is to prepare use of the VDP API also for the backend side to filter bereq.body through bereq.filters by putting all pointers _intended_ to be used by a VDP init function into vdp_ctx. For background, see varnishcache#4035
Based on #4035 (comment) I revisited this PR and implemented two alternative approaches on changing the VDP API to accommodate usage on the backend side. I would appreciate if reviewers read all of this comment before or while looking at the alternative suggestions, which are: To facilitate reviews and judgement on the alternatives, I would like to reiterate over some aspects already mentioned. There will be some repetition, but I think it might be helpful to recapitulate in a holistic manner. The essence of a filter is to modify a stream of bytes. What follows from this property alone is that the filter might have knowledge about the amount of data it will provide. For example, a filter might know that it will not add or remove any bytes, it might know that it will only return a fixed amount, or it might know that it does not know how many bytes it will return. So it makes sense to give the filter a length integer, which it can inspect and modify. In the context of HTTP and filters, we also have headers sent *1) with the filtered bytes, so the filter should have access to headers to, for example, set the content-encoding or provide range information. Finally, in the context of varnish-cache, a delivery filter might be operating directly on a cache object, in which case it might use metadata to optimize its function. For example, the gunzip filter can use metadata to return the decompressed length. For now, this is what most of our filters need and use. I agree that we might find out in the future that some filters might need more, and we will cater for that. To summarize this part, a VDP's init function should have access to (and, to be generic, limit itself to) using
In my mind, a VDP init function should be called with these three pieces of information prepared, and it should not itself be tasked with having to find out the context in which it is called (that would be to use So far the question which option to prefer seems to be a matter of taste to some extent. I would argue that having a long(er) lived struct containing members which are only meant to be used briefly during init seems not to be the best option, but I can live with the other option, too. Now let's change perspective and look at filters from the varnishd/vrt side:
To simplify use on the client and backend side, either Again, I do not have that strong an opinion, but, from the varnishd/vrt side, the difference between the two options is to either only change What remains is that for all scenarios, we should change the interface anyway and
I hope this helps to make a good decision. *1) or in the case of a VFP, which we do not handle here, stored with the received object |
I somehow missed the additional SLOW_BEREQ getting in (could be that this was during my vacation). I think we should avoid adding behavioural changes by special cased debug flags whenever we have better options, and in this case the better option are filters. Which, in this case, would come with #4035, a PR which ironically now needs additional attention just because of these changes.
This commit is to prepare use of the VDP API also for the backend side to filter bereq.body through bereq.filters by putting all pointers _intended_ to be used by a VDP init function into vdp_ctx. For background, see varnishcache#4035
This commit is to prepare use of the VDP API also for the backend side to filter bereq.body through bereq.filters by putting all pointers _intended_ to be used by a VDP init function into vdp_ctx. For background, see varnishcache#4035
This commit is to prepare use of the VDP API also for the backend side to filter bereq.body through bereq.filters by putting all pointers _intended_ to be used by a VDP init function into vdp_ctx. For background, see varnishcache#4035
This commit is to prepare use of the VDP API also for the backend side to filter bereq.body through bereq.filters by putting all pointers _intended_ to be used by a VDP init function into vdp_ctx. For background, see varnishcache#4035
This commit is to prepare use of the VDP API also for the backend side to filter bereq.body through bereq.filters by putting all pointers _intended_ to be used by a VDP init function into vdp_ctx. For background, see varnishcache#4035
900738b
to
8ebcaca
Compare
I have updated this patch. With #4112 merged, let's see if we have any other controversial bits left. |
@dridi any objections? otherwise bugwash said this should go in next week. |
fcf0ace
to
fa37222
Compare
This change introduces additional failure points to V1F_SendReq() when there is insufficient workspace to push the V1L VDP. We adjust r01834.vtc to cover the respective 503 errors and simplify it by example of r02645.vtc. The two test cases now resemble each other a lot, but I would think, at least for now, their purpose is so important that the overhead does not matter.
fa37222
to
563c738
Compare
Summary
These changes add support for request body filters on the client and backend side. The motivation for this feature will hopefully be obvious: To be able to transform or process "uploads" in varnish-cache without the need for a backend application.
VDPs and VFPs are used here "in reverse" to the existing use on response bodies: the client side fetches the request body, so it uses a VFPs, while the backend side delivers the request body to be backend side, so it uses VDPs.
Please read the individual commit messages. This description is intended so serve as an overview.
Preparation work
This PR is based on some minor changes which were pushed directly to trunk because they do not represent API changes, preparing VDP for use on the backend side, where we do not have a
struct req
:vdp_ctx
esstruct req
from VDPs to where absolutely necessary (client side only VDPs).(struct req).filter_list
for consistencyOBJ_ITER_END
flag to the request bodyobjiterate_f
callOverview of commits
The first commit generalizes the VDP API to make it suitable for use on the backend side. The main change here is to pass, for generic VDPs (those which are suitable for backend side use), the "outgoing" headers and a pointer to a length field, rather than using
struct req
. As is, this change serves clarity and code compaction only, all information is already present in theVRT_CTX
argument, but thecl
argument becomes important later on.The next two commits add VDPs for
bereq.body
without introducing custom VDPs yet. That is, only the http1 VDP is pushed, keeping functionality otherwise. As a consequence of this change, fetches can now also fail for out-of-workspace for VDP.Prepare VCL_StackVDP for backend use changes the implementation to work either on a
struct req
orstruct busyobj
, but not both.Once this groundwork is laid, adding the actual fields and VRT interface for
req.filters
andbereq.filters
is rather trivial. We stack the filters like elsewhere and add a test case.To be done after merge
changelog