-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: max-changes prefer header to limit mutations #2164
Conversation
I wonder whether
thinking a little bit So what is this feature actually about? I guess the idea is to add a safe-guard to a request, for the case that the query string does not describe the "resource" that we actually want to target, well enough. A resource could be a single entity - or a number of entities. And especially in the latter case, the descriptors for that resource, that we use through filters could be wrong. So basically we want to add more information about the resource we're requesting. Basically some more restrictions that are not regular filters. Of course this leads us to So this is basically a way of saying "please change the 5 entities that have ids between X and Y to the following" instead of "please change the entities that have ids between X and Y to the following". I'm now thinking the answer is somewhere in the query string. We don't need to support any nesting here, because those mutations only apply to the top-level resource. This reminds me of one my ideas in #2066:
This also makes sense somehow, because the So maybe something like: PATCH /people;1?uid=eq.123
PATCH /people;0-3?age=gt.100 The first one would be an exact match, much like Edit:
This would leave it open for us to use path parameters of the form I think we can always require the lower boundary, because 0 can be set explicitly. The "at least X" reads really well as |
Hm, on #2156 (comment) we agreed that using
Yeah, the Expect header has been useless for some time now because it was confirmed Nginx doesn't proxy pass it #748 (comment). |
Historically(
This blog post also mentions a path segment can be used like that. Do note that the RFC says "applications often use", which means it's taking inspiration from previous cases. Have you ever seen an API that uses parameters in a path segment? I have not and I doubt anyone would call that design "elegant", I think it breaks clients expectations(and perhaps some libraries/tools as well).
If this is about consistency, then diverging from not using path parameters is much more inconsistent. I also think |
One such case might be OpenAPI, I don't see a way on their docs to specify a path segment parameter. Just found this comment that indicates this style is legacy. |
Yes, I agreed with you that throwing errors on |
It does not say the style is legacy. It does say the API is legacy (and therefore they can't change it) and it uses those parameters. |
Ah, I've also found the Perhaps we could enforce its presence for |
The link you gave lead me to https://swagger.io/docs/specification/describing-parameters/#path-parameters. This specifically mentions a "simple" style:
|
This is missing the point of my concern. My point is not about strict vs lax handling. I think the response for this kind of request should always be to error out, i.e. strict handling. As mentioned above, this is about "what happens when the server does not understand the prefer header?". This is about a safe-guard. So it must provide safety. |
As mentioned in the other comments, this is not about consistency. But anyway: "diverging from not using path parameters"? I have not seen anything anywhere that says we are avoiding path parameters on purpose. This is just not the case. They have not been used, yet, because there was no good use for them. And they are not the same as path segments, so the example regarding nested routes does not apply here either. |
Hm, wait, I don' see the point here. Say you're specifying a |
Hm. Yes, nice example. I would agree - it is the same problem. Although, hopefully, less likely to occur, because that feature is meant for things like unit testing the api etc. - while the feature we are discussing here is for production use. But yes, in hindsight, this has the same problem. If I knew back then what I know now, I might have looked for a different approach. Not sure whether I would have found one or not. |
If anything, perhaps we should have handled |
No, that's not an option. The RFC specifically says:
|
I understand you might be opposed to using path parameters here, because it's:
But think about how filters define the requested resource by imposing limitations on the set of available entities. In this way, adding those further limitations directly to the request URI makes a ton of sense. Whether that ends up the suggested syntax in the path segment or via
tbh, I think it is elegant. It reads nicely. It uses more of what the URI spec allows us to do. It's consistent and conceptually sound. Yeah - that's elegant. |
It's just more complexity for the users... having more ways to do the same thing...
The tx=rollback is already being used by Supabase users, once the feature it's out you never know how the feature might be exploited.
That's why we have the
It reminds me of my critique against Odata here, those
Yeah, a new query string param would be more consistent. However I don't see any problem with the new Prefer, considering the |
Messing with the Prefer strict semantics is still more complex(users/us) than a query param. I think another option worth considering is just reusing the |
I've been looking at some additions to SQL for DELETE. I've found that for example MySQL has DELETE LIMIT and SQL Server has DELETE TOP(you can even specify a percentage of rows there). Nothing for failing in case a threshold is exceeded though. However, SQL Server also has the OPTION clause, could we use the same idea for our own options? |
So perhaps we can say: DELETE /projects?options=max-changes.10 Some other options could be added later like |
Another option might be borrowing ROWCOUNT( DELETE /projects?rowcount=lte.10 It's a really special case though, because we'd fail if the filter doesn't pass. |
Those seem like the path forward for now(header or new query param). In general, I think a new URI syntax(whether adding nested routes or path segments paremeters) needs much more discussion(like on #2066) and consensus. I don't see myself trying to add new syntax here. |
Well, my point is that there should be only the way I suggested. So not "more ways" to do the same thing.
I wouldn't say exploited. They might have a proper use-case. Although I can't tell from the PR, whether they plan to use it in production or in unit testing. I didn't mean "unit testing in the postgrest repo". I meant "unit testing your api".
Yeah, it's cool to know that my Preference was applied. Or not. And my transaction was not rolled back. And data is lost. :D So really,
Two drawbacks:
Although I can see a limited (pun intended) feature set work with
This would be somehow consistent. But it would be very limited, because:
You can work around those issues by adding I'm very opposed to overload the
Implementing batching for
I don't understand this.
I really don't want to make
And I wouldn't want to introduce any other keyword for the query string, if we don't have to. Those are all breaking changes.
Yeah, that would be a no-go, too - because the syntax is really way too similar to regular filters. That would be verrry confusing.
My conclusion is different.
I fully agree. We need discussion and consensus. But is there any rush in implementing this as quickly as this is happening right now? The issue for this was opened only a few days ago. The only thing a quick implementation is doing is to kill all discussion. Because once implemented via query string or header there is no way back as users start using things and we don't want to force so many breaking changes. This reminds me of #1949 (comment). I kind of gave up on the discussion at one point. And now we have Ok, given: The On a meta-level: We don't really have any project policy for making decisions / reaching consensus. If it's the two of us discussing and we have a different opinion and can't reach consensus.. there's nothing really saying how to move forward. Except of course to say: You're the maintainer, so you'll have to make a decision. No matter which policy it is, I think it would be helpful to state it somewhere. So I'd know what to expect ;) |
Oh, no, I don't want to be taken as that I have the "last word". How about if the policy is: whenever there's no agreement, the decision must be subject to a voting from the PostgREST's org members. Then we just call to voting in a comment(do 👍 or 👎). Might take a while of course, since all the team members have to respond, a time limit could be set as well. Hopefully there's a majority in such cases so we can move forward, if not, we'd have to just stall a feature/fix I guess. |
On this particular case though, I think we'd have to ping begriffs so he can chime in on the REST syntax. For me, I just see elegance/simplicity in how he originally designed it(despite we know now that Now, if there would be no time limit for this discussion. I think the right way to do it would be to propose a new header to IANA, a replacement for |
Thanks for sharing that, I didn't know it, yet. I think PostgREST has deviated from those original ideas a lot by now, so I don't see that as something that should stop evolving this. One thing however I can't fail to notice: The idea about schema based versioning including fall-through is so nice, that I can't believe it's done anymore. Was it never implemented or was it changed somehow?
That would be the answer to the But even with that kind of header, I still think those max-rows settings should not be part of a header. They need to be part of the URI, because they describe which resource this request is targeting. I think the concept of what "resources" are in the http/rest context is still not appreciated enough in postgrest, so far. So, say we have a table of items as we do in our test data: CREATE TABLE items AS
SELECT generate_series(1,10) AS id; If we map that to http, we have our endpoint:
This will return something like This "list of 10 items" is one possible resource that we can return from the api. Another resource would be a list of 5 items, e.g. So basically:
That means, all of the following are different resources:
A URI as the Uniform Resource Identifier describes which of those resources you want to target your request at. Now assume you have a query like the following: DELETE /items?id=gt.5 You assume this represents the resource But you want to safeguard against that, so you need to specify some more restrictions to the URI. So you say: DELETE /5items?id=gt.5 This is not meant to represent the syntax I proposed but just a different URI with a different meaning: that the resource we are requesting is described by length 5. What would be the answer to this query? A This is HTTP/REST. But using a |
Makes a lot of sense, seems that mechanism could be useful for GETs as well. I think this concept now fits exactly with the GET /items?id=gt.5
[6,7,8,9,10,11] GET /items?id=gt.5&row_count=eq.5
404 DELETE /items?id=gt.5&row_count=eq.5
404
I think
Edit: I do realize |
Btw, I gave it second thought and I'm not opposed to changing our REST syntax(adding path parameters). What I think is not right is to change it abruptly(on a new feature). We should at least document our REST syntax v1 and then introduce path parameters and the ideas in #2066, ideally providing back compat(also mentioned in #2066).
Exactly, that's why I think we should document REST syntax v1 first. I've always had that rule in my mind(while reviewing/adding features/fixes) and likely all users and client libraries have come to expect a syntax from us as well. |
I thought about this a little bit more, and even before your latest comments - which are well in line with what I am proposing now - I came to the following conclusion: I actually suggested two separate things, namely:
The second suggestion is not something that we can solve for just this specific case, but we need to discuss it in a bigger context. The basic question is "how many reserved keywords can we remove from the querystring?". If we were to start from scratch, the best answer would be "all" - so that we could use arbitrary names for filtering. But this is for sure better discussed in #2066 - and not here. For this PR my main point is about headers vs URI. I have thought about this a little bit more. Our current usage of URI features and syntax certainly means we should implement this as a query string parameter - this is how we do those things right now. Even though I don't know any specific examples right now, I can see this general concept of "higher level filters", so to say, being extended in the future with more than just I thought a little bit more of what that "higher level filters" concept is about here and I came up with the following. Our SQL queries have a general simplified structure of: SELECT json_agg(<select>)
FROM <endpoint>
WHERE <filters>; I.e. we have an aggregation at the end, and before that we have our regular filters. I think it would help to think about those new filters like SELECT json_agg(<select>)
FROM <endpoint>
WHERE <filters>
HAVING count(*) BETWEEN x AND y; In fact, this very much looks like a query similar to what we need for $subject. Maybe we can find a syntax for this, that would allow extending it to more general aggregation scenarios and TLDR: I think the correct solution to $subject is actually aggregation +
This is a much better argument for I was about to agree to
Well, we do have other non-filter query string parameters, too - e.g. I think we have basically two options of implementing the
I feel we should discuss the details of that in #915. Then we can come back and implement something that looks the same for now in this PR. |
Also, another thing we lose with DELETE /users?status=eq.inactive&row_count=gt.20
DELETE /users?status=eq.inactive&row_count=eq.20
DELETE /users?status=eq.inactive&row_count=lt.20 So with that we can say delete "at most"/"at least"/exactly this amount of rows. With path parameters I guess we could do: GET /projects.row_count=gt.20?status=eq.inactive But again that form is just the same we already have with our query parameters. |
Ok, so let's step back to the GET /projects?when.row_count=lt.5 It seems more clear than doing Then we could have the |
Namespace-wise, this is better than using a plain However I don't see why we should use a renaming construct here? Introducing an arbitrary mapping makes it really hard to possibly extend this to using arbitrary aggregation functions later on. The function used is called
That would be very nice actually. It looks the same as a query parameter, because it's a filter. But there is a difference: Query parameters are optional, in the sense that not matching them will still return a response and not an error. Path parameters are required. This maps very nicely to the fact that query parameters are put after the Something like: GET /projects!count=gt.20?status=eq.inactive
This also matches up very nicely with embedding hints: When you specify a
In a sense a "required filter" for an embedding would turn the embedding into an |
One more thing to note here: Using If we were ok to introduce a breaking change, we could do the following:
In this case using a mapping of If we do it like this, I'd be fine with using However, tbh: I like the
|
The path parameters have the advantage of not colliding with another namespace as well, unlike the However I still feel this choice will give us trouble in the future. Like when we chose making the For example, on curl you can use the curl -X DELETE "localhost:3000/limited_update_items" -G \
-d order=id \
-d limit=1 There's no option for doing this with the new path parameters. Likely other tools/libraries also have facilities for dealing with regular query parameters as well. I have another proposal that would solve the Prefer problem(older PostgREST versions will err on it and not just ignore it), the namespacing ones and wouldn't require a new syntax. Basically use another vendored media type: Accept: application/vnd.pgrst.mutation+json;rowcount=lte.50 The media type parameter already has our namespace, so it doesn't conflict with existing embeds or columns. Also older PostgREST versions will just fail if this media type is sent. The existing Accept: application/vnd.pgrst.mutation+json;rowcount=eq.1 This namespace would also allow us to set a So far I think this is the most consistent way of adding this feature. @wolfgangwalther WDYT? |
That's because I don't think
Could you elaborate on that? I don't see a problem, yet, with the path parameter syntax and what we've discussed around this matter.
Hm. However, tools/library support for mimetype parameters in the accept header is even worse, right?
Hm. It is consistent with what was previously done with
Uh, mh. I mean But what does Even worse: When you do prefer not to return a representation, which is especially likely on a In terms of HTTP semantics, the path parameter is really the right place to put it. |
Yeah, noticed that as well for the path. Then I think the cost only makes sense as a header, seems it'd fit as a Prefer.
On #1417 (comment), we concluded that
How about a different name then: Accept: application/vnd.pgrst.array+json;size=lte.50 Then it does describe the shape of the body. I was thinking of |
Hm. Assume we have no cost limit by default and want to enforce one from the client-side: In this case, we'll just have the same problem with the But if we think about cost the other way around: Let's say we have a rather low server-side default limit for cost ("soft-limit"). And then we send a
Yeah, that would be very consistent with how we use
Wait, what? A generic mimetype that would return either json or csv? How, if not by mimetype, would I then even tell PostgREST to return json or csv? A mimetype should describe the output format. The csv argument is a very good one: We should not do this via mimetype. Otherwise, we'd need a special mimetype for each kind of response. And suddenly it will be very hard to support this for custom mimetypes. Why should this feature not be available, once we support something like |
Isn't that possible by having suffixes in the mime type like the https://datatracker.ietf.org/doc/html/draft-ietf-appsawg-media-type-suffix-regs-02#section-3.1
So with the above we could do json and csv with: Accept: application/vnd.pgrst.rows+json;rowcount=lte.50
Accept: application/vnd.pgrst.rows+csv;rowcount=lte.50 |
The iana registered media types seem pretty liberal. For example, the subtype of this application/conference-info+xml doesn't define a structure on his own but the suffix does. It would be the same case with our |
Though I think we need the above as well. We discussed this somewhere else but the |
But how would you do that with custom mimetypes that we don't handle out of the box but that could be supported by #1582? This will limit the ability to use custom mimetypes heavily.
This is not a requirement for |
Hm, considering all the alternatives here, I think we shouldn't let the the perfect be enemy of the good. The PATCH /tbl
Prefer: rowcount=eq.0, rowcount=lt.50 (We need to make sure the value syntax accepts two row counts, otherwise we'd need to collapse into a single val) Related #2343 (comment) |
This partially reverts PostgREST#1257 / PostgREST#1272 / 5535317 where the 404 was introduced. A 406 error is still returned when requesting a single object via accept header. Returning an error when no rows are changed can be introduced through a different syntax again, see the discussion in PostgREST#2164. Fixes PostgREST#2343 Signed-off-by: Wolfgang Walther <[email protected]>
This partially reverts PostgREST#1257 / PostgREST#1272 / 5535317 where the 404 was introduced. A 406 error is still returned when requesting a single object via accept header. Returning an error when no rows are changed can be introduced through a different syntax again, see the discussion in PostgREST#2164. Fixes PostgREST#2343 Signed-off-by: Wolfgang Walther <[email protected]>
This partially reverts #1257 / #1272 / 5535317 where the 404 was introduced. A 406 error is still returned when requesting a single object via accept header. Returning an error when no rows are changed can be introduced through a different syntax again, see the discussion in #2164. Fixes #2343 Signed-off-by: Wolfgang Walther <[email protected]>
Taking the dollar namespace idea on #2125 (comment)
How about if we do it like this: GET /tbl?$rowcount=eq.3 Since pg identifiers cannot start with a In this case I wouldn't be opposed to a namespace like GET /tbl?$when.rowcount=eq.3
@wolfgangwalther WDYT? No more debate on the |
The above still seems a bit unwieldy to me. Really the lack of the If Kong can introduce such non-standard headers as It could be called Personally I'd vote for (Maybe It's introduction it's pretty much justified, as the standard |
I still think this was the best idea:
|
Closes #2156. Basically:
Tasks