
Json to recordset #2542
Merged · 9 commits · Jan 8, 2023

Conversation

@aljungberg (Contributor)

Switches from json_populate_recordset to json_to_recordset, which we will need for data representations. See #2523 for details.

The impossible panic we discussed in #2523 has been eliminated by creating a TypedField type which doesn't allow that situation.
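The idea of making the invalid state unrepresentable can be sketched as follows. This is a hypothetical illustration, not PostgREST's actual definitions: the names `TypedField`, `resolveField`, and the `Map`-based schema cache are all stand-ins.

```haskell
-- A field paired with its resolved type at construction time. Because the
-- only way to obtain a TypedField is via a successful schema cache lookup,
-- downstream code never has to handle a "type missing" case, which is the
-- panic this design rules out.
import qualified Data.Map.Strict as M

data TypedField = TypedField
  { tfName   :: String  -- column name
  , tfIRType :: String  -- resolved PostgreSQL type, e.g. "text"
  } deriving (Eq, Show)

-- Lookup against the (hypothetical) schema cache: unknown columns are
-- rejected here, before any SQL is generated.
resolveField :: M.Map String String -> String -> Maybe TypedField
resolveField columns name = TypedField name <$> M.lookup name columns

main :: IO ()
main = do
  let cols = M.fromList [("id", "integer"), ("title", "text")]
  print (resolveField cols "title")
  print (resolveField cols "helicopter")
```

The design choice here is the standard smart-constructor pattern: move the failure case to the boundary so the rest of the pipeline works with values that are correct by construction.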

Resolved review comments on: src/PostgREST/ApiRequest/Types.hs, src/PostgREST/Error.hs, src/PostgREST/Plan.hs
@wolfgangwalther (Member)

@steve-chavez we should certainly wait for the next minor release before merging this or #2523, to reduce the risk of breaking things and to first get out a release which can be upgraded to from previous major versions.

@aljungberg (Contributor, Author)

aljungberg commented Oct 28, 2022

Regarding the code coverage on Ord, what's the best approach there? I could stick a few tests in that do some compare exercises, but I didn't see anywhere else we do that. We could just drop Ord from Table, which seems to be unused, which would mean we don't need it on ColumnMap either. We only need Hashable for the maps. Thoughts?

@wolfgangwalther (Member)

Regarding the code coverage on Ord, what's the best approach there? I could stick a few tests in that do some compare exercises, but I didn't see anywhere else we do that. We could just drop Ord from Table, which seems to be unused, which would mean we don't need it on ColumnMap either. We only need Hashable for the maps. Thoughts?

I'd say: Drop everything we don't need. Let's keep it at the minimum, to have better coverage reports. Then add those instances later, once you need them.

@aljungberg (Contributor, Author)

Not entirely sure what this means:

[Screenshot of a codecov warning, 2022-10-28]

Clearly insCol is tested since it's used by the insertion tests?

@wolfgangwalther (Member)

Clearly insCol is tested since it's used by the insertion tests?

Actually, it's not. The record field accessor insCol is not used, because we only pattern match the MutatePlan, but do not use any of the record syntax.

You'll get reports for lines that were uncovered before your changes, too. So some of those codecov warnings can't really be avoided.

@steve-chavez (Member)

Switches from json_populate_recordset to json_to_recordset

Would it be possible to fallback to json_populate_recordset in case the columns of a relation are not present on the schema cache?

For example, this could happen if a new table is added and the schema cache is not refreshed. We remained operational before for these cases, what would happen now?

@aljungberg (Contributor, Author)

Switches from json_populate_recordset to json_to_recordset

Would it be possible to fallback to json_populate_recordset in case the columns of a relation are not present on the schema cache?

For example, this could happen if a new table is added and the schema cache is not refreshed. We remained operational before for these cases, what would happen now?

The new behaviour is that you get an error when trying to insert data into a table or a column we believe not to exist.

Personally I think that's actually an improvement over the old behaviour. Clear and immediate messaging for a situation that is more likely to be an error than not.

Like how often do you expect to make API calls to tables so fresh off the presses they're not even in the cache? The only time I can imagine is if you're live developing your schema. In that case can't you do NOTIFY pgrst, 'reload schema' at the end of your edit transaction as well?

What's the Venn diagram overlap between people live-editing their schema yet unwilling to NOTIFY? I imagine it's pretty small.

  • If we make it a stated goal to always allow interaction with unknown tables and columns, data representations will have confusing failure modes (e.g. they work some of the time).
  • Even if you don't use data representations, you can have confusing behaviour with regards to relations, embedding and so forth if you are in the habit of changing your schema without reloading the cache.
  • There's more complexity in our codebase having fallbacks.
  • To support json_populate_recordset as a fallback, the constraints on TypedField would have to be relaxed. The record started out relaxed and I made it more strict due to the feedback on the data rep PR. To be clear, I'm happy to do the work, but we might be pulling in two directions at once here.
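The reason the column definitions must come from the schema cache can be seen in the shape of the SQL each function needs: `json_populate_recordset` infers columns from the table's row type on the database side, while `json_to_recordset` requires an explicit column definition list. A minimal sketch of building that list (the function name `recordsetCall` and the string-based representation are illustrative, not PostgREST's actual query builder):

```haskell
-- Build the json_to_recordset fragment from schema-cache-typed columns.
-- Every column name *and* its type must be known up front, which is why
-- an unknown column has to be rejected before the query is generated.
import Data.List (intercalate)

recordsetCall :: [(String, String)] -> String
recordsetCall typedCols =
  "json_to_recordset($1) AS _ (" <> defs <> ")"
  where
    defs    = intercalate ", " [quote n <> " " <> t | (n, t) <- typedCols]
    quote n = "\"" <> n <> "\""

main :: IO ()
main = putStrLn (recordsetCall [("id", "integer"), ("title", "text")])
-- json_to_recordset($1) AS _ ("id" integer, "title" text)
```

By contrast, a `json_populate_recordset(null::articles, $1)` call needs no definition list at all, which is what made the old behaviour tolerant of a stale schema cache.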

All that said, happy to change it. I'm really excited about this feature and I think it'll be a great addition to PostgREST, so keen to get it merged. We're going to be using this probably in every single API on our end.

@wolfgangwalther (Member)

Personally I think that's actually an improvement over the old behaviour.

I absolutely agree. I think we should strive to become more strict. Hopefully this should lead people to learn about reloading the schema cache much earlier, too - and not only when they try to make sense of a rather complicated topic with embeddings. Making a basic request, seeing it fail, reloading the schema cache and then seeing it pass should give users (and us) much more confidence that they at least can successfully reload that cache.

@wolfgangwalther (Member)

I'm really excited about this feature and I think it'll be a great addition to PostgREST, so keen to get it merged. We're going to be using this probably in every single API on our end.

Yeah, same here - just too much on my plate to keep reviewing immediately. But I see use cases for it in all my projects, too :)

Resolved review comment on src/PostgREST/Error.hs
@steve-chavez (Member)

Cool, I agree that becoming more strict is good overall.

Not sure if stuffing schema cache info on the details key of the error message is the right approach. The schema cache is becoming more relevant with this change and maybe we should make it more visible.

How about we add a special header on the endpoints to get their schema cache entry (discussed before in #1421)?

```
GET /projects
Accept: application/vnd.pgrst.schema-cache
```

```json
{
  "columns": [".."],
  "relationships": [".."]
}
```

The above can be done on another PR but I'd consider it a must for a stable release after this feature is merged.

@aljungberg (Contributor, Author)

Not sure if stuffing schema cache info on the details key of the error message is the right approach. The schema cache is becoming more relevant with this change and maybe we should make it more visible.

How about we add a special header on the endpoints to get their schema cache entry (discussed before in #1421)?

```
GET /projects
Accept: application/vnd.pgrst.schema-cache
```

```json
{
  "columns": [".."],
  "relationships": [".."]
}
```

The above can be done on another PR but I'd consider it a must for a stable release after this feature is merged.

Having a per relation info API would be a nice addition.

As you rightfully point out in the referenced PR, the schema cache is already visible through the root endpoint so perhaps such an API would be more of a nice to have?

Maybe there's something we can do there to present more information about the available data representations for each column. Haven't fully thought that through yet, but if we add support for CSV and binary data representations down the line it might be nice to see which column custom formatters and/or parsers are available, and which ones will rely on the base type native PostgreSQL casting.

@wolfgangwalther (Member)

How about we add a special header on the endpoints to get their schema cache entry(discussed before on #1421).

How about making a request to the root endpoint with a different accept header, i.e. the one you mentioned, and then returning the schema cache instead of OpenAPI output? Filtering the root endpoint could then be done the same way for both.

@steve-chavez (Member)

How about making a request to the root endpoint with a different accept header, i.e. the one you mentioned.. and then returning schema cache instead of OpenAPI output?

Yeah, that could be another option. I've branched off the discussion to #1421 (comment) to not make this thread longer.

@wolfgangwalther (Member)

Could you please do the following to make it easier for me:

  • Squash all commits in this PR into one commit, with the proper semantic versioning prefix and a good commit message. I'd say this is a `feat:`, because we're changing the way we're validating the `?columns` for INSERT and UPDATE. Were we throwing the same kind of error messages before? In that case the feature is probably "throwing those errors without hitting the database"?
  • Add a changelog entry for that change.
  • Remove all the changes that are only required in the next PR as mentioned above. Make this self-contained.

I will then have another look. Thanks!

@aljungberg (Contributor, Author)

Are we throwing the same kind of error messages before? In that case the feature is probably "throwing those errors without hitting the database"?

The essence of the error is the same but the error message now has a PostgREST error code and indeed happens without hitting the database.

```
# Before
{"code":"42703","details":null,"hint":null,"message":"column \"helicopter\" of relation \"articles\" does not exist"}

# After
{"code":"PGRST118","details":null,"hint":"If a new column was created in the database with this name, try reloading the schema cache.","message":"Could not find 'helicopter' in the target table"}
```
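The pre-flight check that produces this error can be sketched like so. Names here (`checkColumns`, `ColumnNotFound`) are hypothetical, chosen only to illustrate validating the requested columns against the schema cache before any SQL is generated:

```haskell
-- Validate requested columns against the known (schema-cache) columns.
-- An unknown column yields an application-level error immediately,
-- instead of a PostgreSQL 42703 error after a round-trip to the database.
import Data.List (find)

newtype ApiError = ColumnNotFound String deriving (Eq, Show)

checkColumns :: [String] -> [String] -> Either ApiError ()
checkColumns known requested =
  case find (`notElem` known) requested of
    Just missing -> Left (ColumnNotFound missing)  -- maps to PGRST118-style error
    Nothing      -> Right ()

main :: IO ()
main = do
  print (checkColumns ["id", "title"] ["title"])
  print (checkColumns ["id", "title"] ["helicopter"])
```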

@aljungberg force-pushed the json_to_recordset branch 2 times, most recently from c3c3981 to 39e5120 on November 8, 2022
Resolved review comments on src/PostgREST/Plan.hs
@wolfgangwalther (Member) left a comment
I have most of my suggested changes locally already. If you want to save some effort, you can check the "allow edits of maintainers" box (not sure what the name is exactly). Then I can push those to your branch.

Resolved review comments on: src/PostgREST/Plan.hs, src/PostgREST/Plan/Types.hs, src/PostgREST/Query/QueryBuilder.hs
@aljungberg (Contributor, Author)

I believe all feedback up to this point has been addressed.

This returns an error when trying to update or insert into invalid columns, without hitting the database. This change also switches these operations from `json_populate_recordset` to `json_to_recordset`, which should make no functional difference except allowing future flexibility.

- New: store columns in a map, grab true column types.
- New: `json_populate_recordset` -> `json_to_recordset`. This lays the groundwork for data representations, but should make no functional difference at this stage except that we now have an explicit error for trying to mutate tables or columns that don't exist (according to the schema cache).
- New: test missing column errors.
- Drop unused `Ord` to increase test coverage. The `ColumnMap` `Ord` was only there because `Table` derived `Ord`. It doesn't seem like that's used anywhere for `Table` to begin with, so both were dropped. This had also led to unnecessary type conversions.
- It was only used to put `TypedField` into sets, which we no longer do after the previous commit.
@aljungberg (Contributor, Author)

This PR is now rebased on the latest main.

@steve-chavez (Member) left a comment
LGTM! The error message is now as clear as the one from PostgreSQL!

@steve-chavez (Member)

@wolfgangwalther Do you have more input on this PR? I think it's looking good to merge.

@wolfgangwalther (Member)

Don't have the time right now to dive into it, sorry. It's on my list - as well as a lot of other things ;). Feel free to merge, if you think it's good.

@steve-chavez (Member)

It's on my list - as well as a lot of other things ;)

Oh, no problem!

Feel free to merge, if you think it's good.

I hesitate because I'm a bit lost on #2523 (which IIUC is the main goal of this PR), so I'll let you review and merge this one.

@aljungberg (Contributor, Author)

I hesitate because I'm a bit lost on #2523 (which IIUC is the main goal of this PR), so I'll let you review and merge this one.

Anything I can do to help explain or detail it for you? If you'd like I could discuss it over a video call.

@steve-chavez merged commit 43ad6d6 into PostgREST:main on Jan 8, 2023
@steve-chavez (Member)

@aljungberg Sorry for the delay! Let's move forward with this one - we'll also need it for #2594, so I've just merged it.

(I'll start reviewing #2523 and I'll let you know any doubts I have there)
