
feat: max-changes prefer header to limit mutations #2164

Closed

Conversation

steve-chavez
Member

@steve-chavez steve-chavez commented Feb 10, 2022

Closes #2156. Basically:

http PATCH localhost:3000/projects "Prefer: max-changes=4" <<JSON
{"name": "PopOS"}
JSON

HTTP/1.1 400 Bad Request

{
    "details": "Results contain 57 rows changed but the maximum number allowed is 4",
    "message": "The maximum number of rows changed per operation was surpassed"
}

Tasks

  • Decide whether a header or query parameter will be used
  • max-changes for PATCH
  • max-changes for DELETE
  • max-changes for POST?
  • Docs

@wolfgangwalther
Member

wolfgangwalther commented Feb 10, 2022

I wonder whether Prefer is really the best choice here. For everything else we do with Prefer, there is not much harm done when the server doesn't understand / use it. But here, we really want the server to not proceed, when it doesn't understand the request. It's missing the "optional" part, for which Prefer was designed.

The Prefer request header field is used to indicate that particular
server behaviors are preferred by the client but are not required for
successful completion of the request. Prefer is similar in nature to
the Expect header field defined by Section 6.1.2 of [RFC7231] with
the exception that servers are allowed to ignore stated preferences.

Unfortunately I don't have a better way to express this right now. Expect headers have the "requiredness" factor - but they require each hop (proxy) to understand those headers. So that won't ever work.


thinking a little bit

So what is this feature actually about?

I guess the idea is to add a safe-guard to a request, for the case that the query string does not describe the resource we actually want to target well enough.

A resource could be a single entity - or a number of entities. And especially in the latter case, the descriptors for that resource that we express through filters could be wrong. So basically we want to add more information about the resource we're requesting: some more restrictions that are not regular filters.

Of course this leads us to ?limit directly, but we already discussed that in the issue: It's too close to both SQL LIMIT and the Range header semantically, which is something that this is not about.

So this is basically a way of saying "please change the 5 entities that have ids between X and Y to the following" instead of "please change the entities that have ids between X and Y to the following".

I'm now thinking the answer is somewhere in the query string.

We don't need to support any nesting here, because those mutations only apply to the top-level resource.

This reminds me of one of my ideas in #2066:

Change the Accept: application/vnd.pgrst.object+json way to request a single resource to a path parameter something like this:

GET /people;single?uid.eq=123
GET /people;one?uid.eq=123

This also makes sense somehow, because the application/vnd.pgrst.object+json currently does something very similar - it rejects every request that would affect fewer or more than 1 row, right?

So maybe something like:

PATCH /people;1?uid=eq.123
PATCH /people;0-3?age=gt.100

The first one would be an exact match, much like application/vnd.pgrst.object+json - but more flexible. The second one would be a range of acceptable number of rows.


Edit:
Or maybe even more naturally readable:

# exactly 1
PATCH /1,people?uid=eq.123
# between 1 and 3
PATCH /1-3,people?age=gt.100
# up to 3
PATCH /0-3,people?age=gt.100
# at least 1
PATCH /1+,people?age=gt.100
# no limit
PATCH /people?age=gt.100

This would leave it open for us to use path parameters of the form ...;name=value?... later on, if we need to.

I think we can always require the lower boundary, because 0 can be set explicitly. The "at least X" reads really well as x+, and the + is not a problem: spaces only need to be encoded as + in the query string, not in the path segment. Afaik, we can safely use + in the path.

@steve-chavez
Member Author

Hm, on #2156 (comment) we agreed that using Prefer for rejecting requests was fine, since the handling=strict in the RFC already mentions it can be used for that.

Expect headers have the "requiredness" factor - but they require each hop (proxy) to understand those headers. So that won't ever work

Yeah, the Expect header has been useless for some time now because it was confirmed Nginx doesn't proxy pass it #748 (comment).

@steve-chavez
Member Author

PATCH /people;1?uid=eq.123
PATCH /1-3,people?age=gt.100

Historically (Why not provide nested routes?..) we have frowned upon using path parameters, so I'm surprised to see the RFC containing a section on using "," and ";" in a path segment:

URI producing applications
often use the reserved characters allowed in a segment to delimit
scheme-specific or dereference-handler-specific subcomponents. For
example, the semicolon (";") and equals ("=") reserved characters are
often used to delimit parameters and parameter values applicable to
that segment. The comma (",") reserved character is often used for
similar purposes. For example, one URI producer might use a segment
such as "name;v=1.1" to indicate a reference to version 1.1 of
"name", whereas another might use a segment such as "name,1.1" to
indicate the same.

This blog post also mentions a path segment can be used like that.

Do note that the RFC says "applications often use", which means it's taking inspiration from previous cases. Have you ever seen an API that uses parameters in a path segment? I have not and I doubt anyone would call that design "elegant". I think it breaks clients' expectations (and perhaps some libraries/tools as well).

I wonder whether Prefer is really the best choice here. For everything else we do with Prefer, there is not much harm done when the server doesn't understand / use it.

If this is about consistency, then diverging from not using path parameters is much more inconsistent.

I also think Prefer is not ideal, but unless we invent a new header, I don't see any other better way around it.

@steve-chavez
Member Author

and perhaps some libraries/tools as well

One such case might be OpenAPI; I don't see a way in their docs to specify a path segment parameter.

Just found this comment that indicates this style is legacy.

@wolfgangwalther
Member

Hm, on #2156 (comment) we agreed that using Prefer for rejecting requests was fine, since the handling=strict in the RFC already mentions it can be used for that.

Yes, I agreed with you that throwing errors on Prefer is fine. It still is. That's not the point here, though. The point here is that when you make a request with this header, you expect the server to error out when it does not support this prefer header. I.e. assume you are making such a request to an older PostgREST version, because you read about the feature in the docs. You don't want to have PostgREST silently ignore this, because then you are at risk of losing data.

@wolfgangwalther
Member

Just found this comment that indicates this style is legacy.

It does not say the style is legacy. It does say the API is legacy (and therefore they can't change it) and it uses those parameters.

@steve-chavez
Member Author

Ah, I've also found the handling=strict suggested in that old thread.

Perhaps we could enforce its presence for max-changes to work. So Prefer: max-changes=20, handling=strict would fail as proposed here, but Prefer: max-changes=20, handling=lenient (or just Prefer: max-changes=20) would always make the request succeed and ignore max-changes.
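A minimal sketch of how a server might implement that rule (the `parse_prefer` helper and `max_changes_status` names are illustrative, not PostgREST's actual implementation):

```python
def parse_prefer(header):
    """Parse a Prefer header value into a dict of token -> value."""
    prefs = {}
    for part in header.split(","):
        token, _, value = part.strip().partition("=")
        prefs[token.lower()] = value or None
    return prefs

def max_changes_status(header, rows_changed):
    """Enforce max-changes only when handling=strict is also present;
    otherwise ignore it, as RFC 7240 requires for unsupported tokens."""
    prefs = parse_prefer(header)
    if "max-changes" in prefs and prefs.get("handling") == "strict":
        if rows_changed > int(prefs["max-changes"]):
            return 400  # reject the mutation
    return 200

print(max_changes_status("max-changes=4, handling=strict", 57))   # 400
print(max_changes_status("max-changes=4, handling=lenient", 57))  # 200
print(max_changes_status("max-changes=4", 57))                    # 200
```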

@wolfgangwalther
Member

One such case might be OpenAPI, I don't see a way on their docs to specify a path segment parameter.

The link you gave led me to https://swagger.io/docs/specification/describing-parameters/#path-parameters.

This specifically mentions a "simple" style:

simple-style – comma-delimited, such as /users/12,34,56

@wolfgangwalther
Member

Ah, I've also found the handling=strict suggested in that old thread.

Perhaps we could enforce its presence for max-changes to work. So Prefer: max-changes=20, handling=strict would fail as proposed here, but Prefer: max-changes=20, handling=lenient (or just Prefer: max-changes=20) would always make the request succeed and ignore max-changes.

This is missing the point of my concern. My point is not about strict vs lax handling. I think the response for this kind of request should always be to error out, i.e. strict handling.

As mentioned above, this is about "what happens when the server does not understand the prefer header?".

This is about a safe-guard. So it must provide safety.

@wolfgangwalther
Member

If this is about consistency, then diverging from not using path parameters is much more inconsistent.

As mentioned in the other comments, this is not about consistency. But anyway: "diverging from not using path parameters"?

I have not seen anything anywhere that says we are avoiding path parameters on purpose. This is just not the case. They have not been used yet, because there was no good use for them. And they are not the same as path segments, so the example regarding nested routes does not apply here either.

@steve-chavez
Member Author

you expect the server to error out when it does not support this prefer header. I.e. assume you are making such a request to an older PostgREST version, because you read about the feature in the docs. You don't want to have PostgREST silently ignore this, because then you are at risk of losing data.

Hm, wait, I don't see the point here. Say you're specifying a Prefer: tx=rollback on an old postgrest, it just ignores it and the client will lose data. Isn't that the same problem?

@wolfgangwalther
Member

Hm, wait, I don't see the point here. Say you're specifying a Prefer: tx=rollback on an old postgrest, it just ignores it and the client will lose data. Isn't that the same problem?

Hm. Yes, nice example. I would agree - it is the same problem. Although, hopefully, less likely to occur, because that feature is meant for things like unit testing the api etc. - while the feature we are discussing here is for production use.

But yes, in hindsight, this has the same problem. If I knew back then what I know now, I might have looked for a different approach. Not sure whether I would have found one or not.

@steve-chavez
Member Author

If anything, perhaps we should have handled handling=strict since the beginning. So Prefer: unknown=val, handling=strict will always fail. That would solve the problem, we could do it from now.

@wolfgangwalther
Member

If anything, perhaps we should have handled handling=strict since the beginning. So Prefer: unknown=val, handling=strict will always fail. That would solve the problem, we could do it from now.

No, that's not an option. The RFC specifically says:

A server that does not recognize or is unable to comply with
particular preference tokens in the Prefer header field of a request
MUST ignore those tokens and continue processing instead of signaling
an error.

@wolfgangwalther
Member

I understand you might be opposed to using path parameters here, because it's:

  • something new that we haven't used before, so more thought needs to go into it to make sure we're not doing anything wrong
  • more effort to implement, because we'd need to extend the parsers
  • looks unfamiliar

But think about how filters define the requested resource by imposing limitations on the set of available entities. In this way, adding those further limitations directly to the request URI makes a ton of sense.

Whether that ends up being the suggested syntax in the path segment, or via ;max-updated=123?... parameters, or directly in the query string, or whatever, is another story. I wouldn't want to put them into the query string, wasting even more keywords there, though - but that would be a similarly consistent approach.


I have not and I doubt anyone would call that design "elegant"

tbh, I think it is elegant.

It reads nicely. It uses more of what the URI spec allows us to do. It's consistent and conceptually sound. Yeah - that's elegant.

@steve-chavez
Member Author

I understand you might be opposed to using path parameters here, because it's:

It's just more complexity for the users... having more ways to do the same thing...

Although, hopefully, less likely to occur, because that feature is meant for things like unit testing the api etc. - while the feature we are discussing here is for production use.

The tx=rollback is already being used by Supabase users; once the feature is out you never know how it might be exploited.

Say you're specifying a Prefer: tx=rollback on an old postgrest, it just ignores it and the client will lose data. Isn't that the same problem?
Hm. Yes, nice example. I would agree - it is the same problem.

That's why we have the Preference-Applied on the response right? I don't see an issue with old versions not supporting new Prefer headers then. We can have a more detailed section on the docs about "Preferences".

tbh, I think it is elegant.
It reads nicely. It uses more of what the URI spec allows us to do. It's consistent and conceptually sound. Yeah - that's elegant.

It reminds me of my critique against OData here; those ;max-updated=123 are more unreadable/unfamiliar in URLs than the $ symbols. But I guess the point here is that this is subjective, so no need to discuss it further.

I wouldn't want to put them into the query string, wasting even more keywords there, though - but that would be a similarly consistent approach.

Yeah, a new query string param would be more consistent. However I don't see any problem with the new Prefer, considering the Preference-Applied and all.

@steve-chavez
Member Author

Yeah, a new query string param would be more consistent. However I don't see any problem with the new Prefer, considering the Preference-Applied and all.

Messing with the Prefer strict semantics is still more complex (for users and for us) than a query param.

I think another option worth considering is just reusing the ?limit query param for this. What would be the main drawback?

@steve-chavez
Member Author

I've been looking at some additions to SQL for DELETE. I've found that, for example, MySQL has DELETE LIMIT and SQL Server has DELETE TOP (you can even specify a percentage of rows there).

Nothing for failing in case a threshold is exceeded though. However, SQL Server also has the OPTION clause; could we use the same idea for our own options?

@steve-chavez
Member Author

So perhaps we can say:

DELETE /projects?options=max-changes.10

Some other options could be added later like ?options=one.option,another.here.

@steve-chavez
Member Author

Another option might be borrowing ROWCOUNT (number of rows affected or read; rows may or may not be sent to the client) from SQL Server. This has less chance of conflicting with an existing column name. So we could say:

DELETE /projects?rowcount=lte.10

It's a really special case though, because we'd fail if the filter doesn't pass.

@steve-chavez
Member Author

Those seem like the paths forward for now (header or new query param). In general, I think a new URI syntax (whether adding nested routes or path segment parameters) needs much more discussion (like on #2066) and consensus. I don't see myself trying to add new syntax here.

@wolfgangwalther
Member

It's just more complexity for the users... having more ways to do the same thing...

Well, my point is that there should be only the way I suggested. So not "more ways" to do the same thing.

The tx=rollback is already being used by Supabase users; once the feature is out you never know how it might be exploited.

I wouldn't say exploited. They might have a proper use-case. Although I can't tell from the PR, whether they plan to use it in production or in unit testing. I didn't mean "unit testing in the postgrest repo". I meant "unit testing your api".

That's why we have the Preference-Applied on the response right? I don't see an issue with old versions not supporting new Prefer headers then. We can have a more detailed section on the docs about "Preferences".

Yeah, it's cool to know that my Preference was applied. Or not. And my transaction was not rolled back. And data is lost. :D

So really, Preference-Applied does not help the fact that Prefer: tx=rollback was a bad choice.

I think another option worth considering is just reusing the ?limit query param for this. What would be the main drawback?

Two drawbacks:

  • Returning an error is very different from the SQL semantics of LIMIT.
  • This would break the "Range headers and ?limit do the same thing" concept we currently have.

?limit is basically a range header on steroids, because it can be used for nested embeddings. This is where range headers are currently not enough. Overloading limit with even more things... I don't like it.

Although I can see a limited (pun intended) feature set work with ?limit. Let's say:

  • limit works the way it does for read requests now (i.e. no errors, but fewer rows)
  • limit works the way you propose Prefer: max-changes to work for mutation requests (i.e. throws an error or succeeds in full)

This would be somehow consistent. But it would be very limited, because:

  • you can't use the max-rows feature for read requests ("give me this number of rows, or return an error")
  • you can't use exact or minimum values

You can work around those issues by adding Prefer headers for "limit handling". You already suggested so. This would be consistent, iff we make the default the "best-case" scenario, so that "not applied preference headers" are no harm. That would not be the case if limit meant max-rows. That would only be possible if limit meant "batching" (as suggested elsewhere) by default.

I'm very opposed to overloading the limit parameter with that many things. This makes things really complicated to deal with for the user.

I've been looking at some additions to SQL for DELETE. I've found that, for example, MySQL has DELETE LIMIT and SQL Server has DELETE TOP (you can even specify a percentage of rows there).

Implementing batching for limit is something that I could see working. We would be giving up some of the "limit is the same thing as Range headers" concept - but since this is only a thing for mutation requests and we can't use Range headers there anyway, that would still be consistent overall. I also don't see many more features that would be required here, so I think the limit syntax is enough to deal with that.

Nothing for failing in case a threshold is exceeded though. However, SQL Server also has the OPTION clause; could we use the same idea for our own options?

I don't understand this. OPTION seems to be about query hints? This is something that PostgreSQL explicitly decided not to implement, afaik. So I don't see a relevance to the discussion here.

Some other options could be added later like ?options=one.option,another.here.

I really don't want to make options a keyword in the query string. It's very likely that people use that for a column name.

So we could say:

DELETE /projects?rowcount=lte.10

And I wouldn't want to introduce any other keyword for the query string, if we don't have to. Those are all breaking changes.

It's a really special case though, because we'd fail if the filter doesn't pass.

Yeah, that would be a no-go, too - because the syntax is really way too similar to regular filters. That would be verrry confusing.

Those seem like the paths forward for now (header or new query param).

My conclusion is different.

In general, I think a new URI syntax(whether adding nested routes or path segments paremeters) needs much more discussion(like on #2066) and consensus. I don't see myself trying to add new syntax here.

I fully agree. We need discussion and consensus. But is there any rush in implementing this as quickly as this is happening right now?

The issue for this was opened only a few days ago.

The only thing a quick implementation does is kill all discussion. Because once implemented via query string or header, there is no way back: users start using things and we don't want to force so many breaking changes.

This reminds me of #1949 (comment). I kind of gave up on the discussion at one point. And now we have !inner - and only a short time later we discovered, that this could be much better expressed as a table=exists filter. The PR was implemented in 2 1/2 weeks, so I don't think we need to rush here in a couple of days.

Ok, granted: The !inner syntax was proposed a few months earlier in one comment. Even though others certainly had the chance to discuss it (and I just didn't take my chance early enough), we still can't say there was consensus. And... I proposed #2066 earlier, too. So the suggestion I'm making now is not completely out of the blue.


On a meta-level: We don't really have any project policy for making decisions / reaching consensus. If it's the two of us discussing and we have a different opinion and can't reach consensus... there's nothing really saying how to move forward. Except of course to say: You're the maintainer, so you'll have to make a decision. No matter which policy it is, I think it would be helpful to state it somewhere. So I'd know what to expect ;)

@steve-chavez
Member Author

On a meta-level: We don't really have any project policy for making decisions / reaching consensus. Except of course to say: You're the maintainer, so you'll have to make a decision. No matter which policy it is, I think it would be helpful to state it somewhere.

Oh, no, I don't want to be taken as having the "last word". How about this policy: whenever there's no agreement, the decision is put to a vote among the PostgREST org members.

Then we just call a vote in a comment (👍 or 👎). It might take a while of course, since all the team members have to respond; a time limit could be set as well. Hopefully there's a majority in such cases so we can move forward; if not, we'd have to just stall a feature/fix I guess.

@steve-chavez
Member Author

steve-chavez commented Feb 10, 2022

On this particular case though, I think we'd have to ping begriffs so he can chime in on the REST syntax. For me, I just see elegance/simplicity in how he originally designed it (despite knowing now that id=eq.val should be changed), and adding path parameters just adds complexity, for both users and us. That's why I'm opposed to path parameters.

Now, if there were no time limit for this discussion, I think the right way to do it would be to propose a new header to IANA: a replacement for Expect whose values, unlike the Prefer header's, cannot be ignored <- which is the main opposition to Prefer: max-changes.

@wolfgangwalther
Member

wolfgangwalther commented Feb 10, 2022

For me, I just see elegance/simplicity in how he originally designed it (despite knowing now that id=eq.val should be changed), and adding path parameters just adds complexity, for both users and us.

Thanks for sharing that, I didn't know it, yet. I think PostgREST has deviated from those original ideas a lot by now, so I don't see that as something that should stop evolving this.

One thing however I can't fail to notice: The idea about schema based versioning including fall-through is so nice that I can't believe it isn't done anymore. Was it never implemented or was it changed somehow?


Now, if there were no time limit for this discussion, I think the right way to do it would be to propose a new header to IANA: a replacement for Expect whose values, unlike the Prefer header's, cannot be ignored <- which is the main opposition to Prefer: max-changes.

That would be the answer to the tx=rollback thing for sure.

But even with that kind of header, I still think those max-rows settings should not be part of a header. They need to be part of the URI, because they describe which resource this request is targeting. I think the concept of what "resources" are in the http/rest context is still not appreciated enough in postgrest, so far.

So, say we have a table of items as we do in our test data:

CREATE TABLE items AS
SELECT generate_series(1,10) AS id;

If we map that to http, we have our endpoint:

GET /items

This will return something like [1,2,3,4,5,6,7,8,9,10] (slightly different format for simplicity).

This "list of 10 items" is one possible resource that we can return from the api. Another resource would be a list of 5 items, e.g. [1,2,3,4,5]. Or a single item, as 1. All of those are different resources.

So basically:

  • each row is an entity
  • possible resources are all possible combinations of entities

That means, all of the following are different resources:

  • [1,2,3,4,5]
  • [5,4,3,2,1]
  • [1]
  • 1

A URI as the Uniform Resource Identifier describes which of those resources you want to target your request at.

Now assume you have a query like the following:

DELETE /items?id=gt.5

You assume this represents the resource [6,7,8,9,10]. But you didn't know that somebody else added some more items, so the same query now represents a different resource instead: [6,7,8,9,10,11]. Yeah URIs change, that's fine.

But you want to safeguard against that, so you need to specify some more restrictions to the URI. So you say:

DELETE /5items?id=gt.5

This is not meant to represent the syntax I proposed but just a different URI with a different meaning: that the resource we are requesting is described by length 5.

What would be the answer to this query? A 404 Not Found of course, because this resource does not exist anymore. There are resources /6items?id=gt.6 and /5items?id=gt.5&id=lte.10 - but not the one we requested.

This is HTTP/REST.

But using a Require: max-changed=5 header is not. This is not HTTP/REST, but basically a remote procedure call, since you're telling the server what to do instead. This is just wrong.

@steve-chavez
Member Author

steve-chavez commented Feb 14, 2022

You assume this represents the resource [6,7,8,9,10]. But you didn't know that somebody else added some more items, so the same query now represents a different resource instead: [6,7,8,9,10,11]. Yeah URIs change, that's fine.
DELETE /5items?id=gt.5
This is not meant to represent the syntax I proposed but just a different URI with a different meaning: that the resource we are requesting is described by length 5.
What would be the answer to this query? A 404 Not Found of course, because this resource does not exist anymore. There are resources /6items?id=gt.6 and /5items?id=gt.5&id=lte.10 - but not the one we requested.

Makes a lot of sense, seems that mechanism could be useful for GETs as well. I think this concept now fits exactly with the row_count I mentioned above.

GET /items?id=gt.5
[6,7,8,9,10,11]
GET /items?id=gt.5&row_count=eq.5
404
DELETE /items?id=gt.5&row_count=eq.5
404

And I wouldn't want to introduce any other keyword for the query string, if we don't have to. Those are all breaking changes.

I think row_count is unlikely to cause a breaking change, note that it's already used here.

Yeah, that would be a no-go, too - because the syntax is really way too similar to regular filters. That would be verrry confusing.

on_conflict comes to mind; it's not a filter either and we haven't had any complaints about that, so maybe not a problem.

Edit: I do realize on_conflict was meant to be left out in #2066 but there we're really talking about a PostgREST syntax v2(another issue).

@steve-chavez
Member Author

steve-chavez commented Feb 14, 2022

DELETE /5items?id=gt.5
But anyway: "diverging from not using path parameters"?

Btw, I gave it a second thought and I'm not opposed to changing our REST syntax (adding path parameters). What I think is not right is to change it abruptly (on a new feature). We should at least document our REST syntax v1 and then introduce path parameters and the ideas in #2066, ideally providing back compat (also mentioned in #2066).

I have not seen anything anywhere that says we are avoiding path parameters on purpose.

Exactly, that's why I think we should document REST syntax v1 first. I've always had that rule in my mind (while reviewing/adding features/fixes) and likely all users and client libraries have come to expect a syntax from us as well.

@wolfgangwalther
Member

I thought about this a little bit more, and even before your latest comments - which are well in line with what I am proposing now - I came to the following conclusion: I actually suggested two separate things, namely:

  • Making $subject a part of the URI due to "what is a resource" considerations
  • Using different parts of the URI (path parameters) for things in the query string that are "not a filter".

The second suggestion is not something that we can solve for just this specific case, but we need to discuss it in a bigger context. The basic question is "how many reserved keywords can we remove from the querystring?". If we were to start from scratch, the best answer would be "all" - so that we could use arbitrary names for filtering. But this is for sure better discussed in #2066 - and not here.

For this PR my main point is about headers vs URI. I have thought about this a little bit more. Our current usage of URI features and syntax certainly means we should implement this as a query string parameter - this is how we do those things right now. Even though I don't know any specific examples right now, I can see this general concept of "higher level filters", so to say, being extended in the future with more than just row_count. For this reason, I would like to find a way to do this in the query string that avoids reserving even more keywords in the future - maybe we can create some kind of namespace now.

I thought a little bit more of what that "higher level filters" concept is about here and I came up with the following. Our SQL queries have a general simplified structure of:

SELECT json_agg(<select>)
FROM <endpoint>
WHERE <filters>;

I.e. we have an aggregation at the end, and before that we have our regular filters.

I think it would help to think about those new filters like row_count as something like a HAVING clause, i.e. filters that are applied after the aggregation phase:

SELECT json_agg(<select>)
FROM <endpoint>
WHERE <filters>
HAVING count(*) BETWEEN x AND y;

In fact, this very much looks like a query similar to what we need for $subject.

Maybe we can find a syntax for this, that would allow extending it to more general aggregation scenarios and HAVING filters later on? I guess we should think about how the syntax for #915 (comment) should end up looking. We don't need to implement that right away, but if we can implement $subject in a way that can easily be extended to support other aggregation cases, too... that would be great.

TLDR: I think the correct solution to $subject is actually aggregation + HAVING filters.


I think row_count is unlikely to cause a breaking change, note that it's already used here.

This is a much better argument for row_count than before. I didn't buy the "SQL Server does this" argument at all. But if that's a postgres term...

I was about to agree to row_count - but while writing up the above, it became obvious that just count with some kind of HAVING notion is much better.

on_conflict comes to mind; it's not a filter either and we haven't had any complaints about that, so maybe not a problem.

Well, we do have other non-filter query string parameters, too - e.g. select. But I guess that doesn't matter anymore, because if we're talking HAVING, then those things are filters, so it's good when they look the same.


I think we have basically two options of implementing the HAVING:

  • with prefix, e.g. GET /items?id=gt.5&having.count=eq.5
  • without prefix, but some kind of smartness about "where vs having": GET /items?id=gt.5&count=eq.5

I feel we should discuss the details of that in #915. Then we can come back and implement something that looks the same for now in this PR.

@steve-chavez
Copy link
Member Author

Also, another thing we lose with row_count as a path parameter is that we can no longer use our filters on it. Like:

DELETE /users?status=eq.inactive&row_count=gt.20
DELETE /users?status=eq.inactive&row_count=eq.20
DELETE /users?status=eq.inactive&row_count=lt.20

So with that we can say delete "at most"/"at least"/exactly this number of rows.

With path parameters I guess we could do:

GET /projects.row_count=gt.20?status=eq.inactive

But again that form is just the same we already have with our query parameters.

@steve-chavez
Copy link
Member Author

Ok, so let's step back to the when.count() option. How about this, since we already have a namespace there, we do:

GET /projects?when.row_count=lt.5

It seems more clear than doing count() for me. WDYT?

Then we could have the when.cost as well.

@wolfgangwalther
Copy link
Member

Ok, so let's step back to the when.count() option. How about this, since we already have a namespace there, we do:

GET /projects?when.row_count=lt.5

It seems more clear than doing count() for me. WDYT?

Namespace-wise, this is better than using a plain row_count.

However, I don't see why we should use a renaming construct here. Introducing an arbitrary mapping makes it really hard to possibly extend this to using arbitrary aggregation functions later on. The function used is called count - we should use that name.


With path parameters I guess we could do:

GET /projects.row_count=gt.20?status=eq.inactive

But again that form is just the same we already have with our query parameters.

That would be very nice actually. It looks the same as a query parameter, because it's a filter. But there is a difference: Query parameters are optional, in the sense that not matching them will still return a response and not an error. Path parameters are required. This maps very nicely to the fact that query parameters are put after the ?, which makes them optional. Very similar to what ? means in a regex :). Taking this one step further it would be great if we could mark filters as required by using !.

Something like:

GET /projects!count=gt.20?status=eq.inactive

! is allowed in that part of the URI, because it's a sub-delim, just like ; and ,.

This also matches up very nicely with embedding hints: When you specify a !hint for an embedding, it is a **requirement** that needs to be matched - otherwise the query will fail.
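
The proposed "required filter" path syntax could be parsed roughly like this (a hypothetical sketch - this syntax was only discussed, never implemented, and parse_path is a made-up name):

```python
# Illustrative parser for the proposed required-filter path syntax,
# e.g. "projects!count=gt.20" -> ("projects", [("count", "gt", "20")]).

def parse_path(path):
    resource, _, rest = path.partition("!")
    required = []
    if rest:
        for part in rest.split("!"):  # allow several required filters
            name, _, predicate = part.partition("=")
            op, _, value = predicate.partition(".")
            required.append((name, op, value))
    return resource, required

# A plain path has no required filters:
parse_path("projects")  # -> ("projects", [])
```

Since ! is a sub-delim, this parse never conflicts with percent-encoded path segments.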


Still a bit weird for me but it doesn't look that bad. It seems it could be extended to embedded resources as well.

GET /projects.5?select=clients.3(*)

In a sense a "required filter" for an embedding would turn the embedding into an INNER JOIN. Throwing an error is not appropriate in that case. But from a query perspective the regular optional filters would be inside the subquery, while the required filters would be in the WHERE part of the outer query. However that kind of problem would already be better solved with #1414 (comment), so I don't think we need to implement those required filters on embeddings.

@wolfgangwalther
Copy link
Member

Ok, so let's step back to the when.count() option. How about this, since we already have a namespace there, we do:

GET /projects?when.row_count=lt.5

It seems more clear than doing count() for me. WDYT?

Namespace-wise, this is better than using a plain row_count.

However, I don't see why we should use a renaming construct here. Introducing an arbitrary mapping makes it really hard to possibly extend this to using arbitrary aggregation functions later on. The function used is called count - we should use that name.

One more thing to note here: when.count() avoids a breaking change only because of the (). Without those, it would break current embeddings which are called when and have a column called count or row_count.

If we were ok to introduce a breaking change, we could do the following:

  • Block the entire when. namespace for embedding now (breaking change)
  • Then use row_count=eq.whatever, cost=lt.whatever etc for postgrest built-in calls
  • And use count()=eq.whatever, other_aggregate()=lt.whatever for custom aggregates later on

In this case using a mapping of row_count -> count() would not prevent us from implementing custom aggregates later on. And it would also solve the "cost is not implemented as an aggregate anyway" problem.

If we do it like this, I'd be fine with using when.row_count=op.val here. If custom aggregates were implemented, you could achieve the same with when.count()=op.val, but there would be no collision.

However, tbh: I like the !count()=op.val path parameter approach more. We could take the same approach there:

  • !row_count=op.val and later !cost=op.val (without ()) for pgrst built-ins
  • have the freedom to implement !any_agg()=op.val later on

@steve-chavez
Copy link
Member Author

However, tbh: I like the !count()=op.val path parameter approach more. We could take the same approach there:

The path parameters have the advantage of not colliding with another namespace as well, unlike the when.

However I still feel this choice will give us trouble in the future. Like when we chose to make the + symbol significant (it was used instead of ! at one time) and then we found out some proxies automatically changed it to whitespace. I think the same thing would happen with a GET/DELETE with a body, as we discussed in another issue.

For example, on curl you can use the -G and -d options to pass query parameters more cleanly:

curl -X DELETE "localhost:3000/limited_update_items" -G \
  -d order=id \
  -d limit=1

There's no option for doing this with the new path parameters. Likely other tools/libraries also have facilities for dealing with regular query parameters as well.


I have another proposal that would solve the Prefer problem (older PostgREST versions will err on it instead of just ignoring it), the namespacing ones, and wouldn't require new syntax.

Basically use another vendored media type:

Accept: application/vnd.pgrst.mutation+json;rowcount=lte.50

The media type parameter already has our namespace, so it doesn't conflict with existing embeds or columns. Also older PostgREST versions will just fail if this media type is sent.

The existing vnd.pgrst.object would be equivalent to:

Accept: application/vnd.pgrst.mutation+json;rowcount=eq.1

This namespace would also allow us to set a cost in the future.

So far I think this is the most consistent way of adding this feature. @wolfgangwalther WDYT?
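
Splitting such a media type into its parameters is straightforward. A sketch (illustrative only - vnd.pgrst.mutation and its rowcount parameter are proposals from this thread, not shipped media types, and parse_media_type is a made-up name):

```python
# Hypothetical sketch: split a media type value into the mime type
# proper and its ;key=value parameters.

def parse_media_type(value):
    mime, *params = [p.strip() for p in value.split(";")]
    parsed = {}
    for p in params:
        key, _, val = p.partition("=")
        parsed[key] = val
    return mime, parsed

mime, params = parse_media_type(
    "application/vnd.pgrst.mutation+json;rowcount=lte.50"
)
# mime   == "application/vnd.pgrst.mutation+json"
# params == {"rowcount": "lte.50"}
```

Real Accept headers can carry quoted-string parameters and q-values, so a production parser would need more than this.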

@wolfgangwalther
Copy link
Member

However I still feel this choice will give us trouble in the future. Like when we chose to make the + symbol significant (it was used instead of ! at one time) and then we found out some proxies automatically changed it to whitespace.

That's because + is part of application/x-www-form-urlencoded: https://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

I don't think ! is treated in a special way anywhere. And if it was, we would have found out already, because we're using it in hints.

I think the same thing would happen with a GET/DELETE with a body as we discussed on other issue.

Could you elaborate on that? I don't see a problem, yet, with the path parameter syntax and what we've discussed around this matter.

For example, on curl you can use the -G and -d options to pass query parameters more cleanly:

curl -X DELETE "localhost:3000/limited_update_items" -G \
  -d order=id \
  -d limit=1

There's no option for doing this with the new path parameters. Likely other tools/libraries also have facilities for dealing with regular query parameters as well.

Hm. However, tool/library support for mimetype parameters in the Accept header is even worse, right?

Accept: application/vnd.pgrst.mutation+json;rowcount=lte.50

The media type parameter already has our namespace, so it doesn't conflict with existing embeds or columns. Also older PostgREST versions will just fail if this media type is sent.

The existing vnd.pgrst.object would be equivalent to:

Accept: application/vnd.pgrst.mutation+json;rowcount=eq.1

[...]
So far I think this is the most consistent way of adding this feature.

Hm. It is consistent with what was previously done with vnd.pgrst.object. However, I remember you saying elsewhere that this wasn't the best choice, because some tools don't even allow setting a mimetype at all. A path, however, can be set everywhere. It might not have the best tooling support, but certainly better than mimetypes.


This namespace would also allow us to set a cost in the future.

Uh, mh. I mean cost is already out of place in the path. But in the mimetype? It's even more out of place. The existing vnd.pgrst.object is first of all about the shape of the json response: object vs array. It's only a secondary effect that responses with fewer or more rows throw errors, because they can't be represented as an object.

But what does vnd.pgrst.mutation mean? How does this describe the shape of the response body? It does not at all.

Even worse: When you do prefer not to return a representation, which is especially likely on a DELETE request - how would the accept header even make sense? We can freely ignore the accept header if we choose not to return a body at all.

In terms of HTTP semantics, the path parameter is really the right place to put it.

@steve-chavez
Copy link
Member Author

Uh, mh. I mean cost is already out of place in the path. But in the mimetype? It's even more out of place

Yeah, noticed that as well for the path. Then I think the cost only makes sense as a header, seems it'd fit as a Prefer.

Even worse: When you do prefer not to return a representation, which is especially likely on a DELETE request - how would the accept header even make sense? We can freely ignore the accept header if we choose not to return a body at all.

On #1417 (comment), we concluded that Accept can be enforced whether the response returns a body or not.

But what does vnd.pgrst.mutation mean? How does this describe the shape of the response body? It does not at all.

How about a different name then:

Accept: application/vnd.pgrst.array+json;size=lte.50

Then it does describe the shape of the body. I was thinking of mutation as a generic term to cover csv as well, but a name for that can be thought of later (maybe rows instead of array).

@wolfgangwalther
Copy link
Member

Uh, mh. I mean cost is already out of place in the path. But in the mimetype? It's even more out of place

Yeah, noticed that as well for the path. Then I think the cost only makes sense as a header, seems it'd fit as a Prefer.

Hm. Assume we have no cost limit by default and want to enforce one from the client side: in this case, we'll just have the same problem with the Prefer header: it can be ignored if not understood. So it's not a safe-guard at all.

But if we think about cost the other way around: Let's say we have a rather low server-side default limit for cost ("soft-limit"). And then we send a Prefer header to set the cost limit higher (up to a max - "hard limit"). In that case a Prefer header would be a great fit. If the header was ignored - fine, we'd still work with the lower limit. That's safe to do.

Even worse: When you do prefer not to return a representation, which is especially likely on a DELETE request - how would the accept header even make sense? We can freely ignore the accept header if we choose not to return a body at all.

On #1417 (comment), we concluded that Accept can be enforced whether the response returns a body or not.

But what does vnd.pgrst.mutation mean? How does this describe the shape of the response body? It does not at all.

How about a different name then:

Accept: application/vnd.pgrst.array+json;size=lte.50

Yeah, that would be very consistent with how we use vnd.pgrst.object now. But as mentioned in #2066 (comment), I don't think this was the best implementation in the first place.

Then it does describe the shape of the body. I was thinking of mutation as a generic term to do csv as well, but a name for that can be thought later(maybe rows instead of array).

Wait, what? A generic mimetype that would return either json or csv? How, if not by mimetype, would I then even tell PostgREST to return json or csv? A mimetype should describe the output format.

The csv argument is a very good one: We should not do this via mimetype. Otherwise, we'd need a special mimetype for each kind of response. And suddenly it will be very hard to support this for custom mimetypes. Why should this feature not be available, once we support something like Accept: text/yaml with custom functions?

@steve-chavez
Copy link
Member Author

Wait, what? A generic mimetype that would return either json or csv? How, if not by mimetype, would I then even tell PostgREST to return json or csv?

Isn't that possible by having suffixes in the mime type like the +json suffix we've been using in vnd.pgrst.object+json?

https://datatracker.ietf.org/doc/html/draft-ietf-appsawg-media-type-suffix-regs-02#section-3.1

The csv argument is a very good one: We should not do this via mimetype. Otherwise, we'd need a special mimetype for each kind of response

So with the above we could do json and csv with:

Accept: application/vnd.pgrst.rows+json;rowcount=lte.50

Accept: application/vnd.pgrst.rows+csv;rowcount=lte.50

@steve-chavez
Copy link
Member Author

The IANA registered media types seem pretty liberal. For example, the subtype of application/conference-info+xml doesn't define a structure on its own but the suffix does. It would be the same case with our vnd.pgrst.rows subtype.

@steve-chavez
Copy link
Member Author

Accept: application/vnd.pgrst.array+json;

Though I think we need the above as well.

We discussed this somewhere else but the json_typeof we run on mutations is expensive on the db side. Having both application/vnd.pgrst.array+json and application/vnd.pgrst.object+json would let us run json_populate_recordset and json_populate_record precisely. These options could then be abstracted on a client library to gain performance.

@wolfgangwalther
Copy link
Member

So with the above we could do json and csv with:

Accept: application/vnd.pgrst.rows+json;rowcount=lte.50

Accept: application/vnd.pgrst.rows+csv;rowcount=lte.50

But how would you do that with custom mimetypes that we don't handle out of the box but that could be supported by #1582?

This will limit the ability to use custom mimetypes heavily.


Accept: application/vnd.pgrst.array+json;

Though I think we need the above as well.

We discussed this somewhere else but the json_typeof we run on mutations is expensive on the db side. Having both application/vnd.pgrst.array+json and application/vnd.pgrst.object+json would let us run json_populate_recordset and json_populate_record precisely. These options could then be abstracted on a client library to gain performance.

This is not a requirement for Accept, though - but for Content-Type.

@steve-chavez
Copy link
Member Author

Hm, considering all the alternatives here, I think we shouldn't let the perfect be the enemy of the good.

The Prefer approach doesn't conflict with the columns namespace and doesn't require new syntax in the URL. It does have the problem of not being strict in its application - a problem when someone uses an outdated PostgREST - but there are workarounds for that (explicit docs, warnings, etc). In light of the unavailability of the Expect header, Prefer is the best HTTP affordance we can adopt.

PATCH /tbl

Prefer: rowcount=eq.0, rowcount=lt.50

(We need to make sure the value syntax accepts two row counts, otherwise we'd need to collapse them into a single value)
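
Such a Prefer value could be parsed into a list of constraints roughly like this (a sketch - this preference was never shipped, and parse_rowcount_prefs is a made-up name):

```python
# Hypothetical sketch: parse a Prefer value like
# "rowcount=eq.0, rowcount=lt.50" into [("eq", 0), ("lt", 50)],
# ignoring unrelated preferences in the same header.

def parse_rowcount_prefs(prefer):
    prefs = []
    for item in prefer.split(","):
        key, _, predicate = item.strip().partition("=")
        if key != "rowcount":
            continue  # e.g. return=representation is left alone
        op, _, value = predicate.partition(".")
        prefs.append((op, int(value)))
    return prefs
```

Each (op, value) pair would then be checked against the mutation's affected row count before committing.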


Related #2343 (comment)

wolfgangwalther added a commit to wolfgangwalther/postgrest that referenced this pull request Oct 27, 2022
This partially reverts PostgREST#1257 / PostgREST#1272 / 5535317 where the 404 was introduced.

A 406 error is still returned when requesting a single object via accept header.

Returning an error when no rows are changed can be introduced through a different syntax again, see the discussion in PostgREST#2164.

Fixes PostgREST#2343

Signed-off-by: Wolfgang Walther <[email protected]>
wolfgangwalther added a commit that referenced this pull request Oct 27, 2022
This partially reverts #1257 / #1272 / 5535317 where the 404 was introduced.

A 406 error is still returned when requesting a single object via accept header.

Returning an error when no rows are changed can be introduced through a different syntax again, see the discussion in #2164.

Fixes #2343

Signed-off-by: Wolfgang Walther <[email protected]>
@steve-chavez
Copy link
Member Author

Taking the dollar namespace idea on #2125 (comment)

when.count()=op.val

How about if we do it like this:

GET /tbl?$rowcount=eq.3

Since pg identifiers cannot start with a $, this should be safe - well, maybe not completely safe, because we quote the columns.

In this case I wouldn't be opposed to a namespace like when, since we could have other dollar namespaces and it is clear that this is a PostgREST-specific thing.

GET /tbl?$when.rowcount=eq.3

$when is not that clear though, but I don't have a better idea for now.

@wolfgangwalther WDYT? No more debate on the Prefer header 😸
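
The dollar-namespace idea boils down to partitioning query parameters before filter parsing. A hypothetical sketch (split_params is a made-up name; nothing like this exists in PostgREST):

```python
# Illustrative: separate "$"-prefixed reserved parameters from column
# filters. Since PostgreSQL identifiers cannot start with "$",
# "$rowcount" cannot collide with an unquoted column name.

def split_params(params):
    reserved, filters = {}, {}
    for key, value in params.items():
        (reserved if key.startswith("$") else filters)[key] = value
    return reserved, filters

reserved, filters = split_params({"$rowcount": "eq.3", "status": "eq.inactive"})
# reserved == {"$rowcount": "eq.3"}
# filters  == {"status": "eq.inactive"}
```

As noted above, quoted column names could still start with $, so this partition is a convention rather than a hard guarantee.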

@steve-chavez
Copy link
Member Author

steve-chavez commented Mar 20, 2023

GET /tbl?$when.rowcount=eq.3

The above still seems a bit unwieldy to me. Really, the lack of the Expect header is what originated this whole discussion.

If Kong can introduce non-standard headers such as apikey (ref), why can't we introduce our own header?

It could be called X-Expect or PG-Expect(Cloudflare has its CF- headers).

Personally I'd vote for X-Expect. We have needed this capability for quite some time.

(Maybe X-Pect 😄? Could work better as some proxies might search for Expect to reject it)

Its introduction is pretty much justified, as the standard Expect doesn't work.

@wolfgangwalther
Copy link
Member

I still think this was the best idea:

That would be very nice actually. It looks the same as a query parameter, because it's a filter. But there is a difference: Query parameters are optional, in the sense that not matching them will still return a response and not an error. Path parameters are required. This maps very nicely to the fact that query parameters are put after the ?, which makes them optional. Very similar to what ? means in a regex :). Taking this one step further it would be great if we could mark filters as required by using !.

@steve-chavez
Copy link
Member Author

steve-chavez commented Aug 2, 2023

I think #2887 is now a better semantic fit than all the alternatives here.

It will require less parsing too (which will help for #2816)

Successfully merging this pull request may close these issues.

Conditional delete/update based on rows affected