Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a way to customize the URL structure #1909

Closed
steve-chavez opened this issue Aug 13, 2021 · 17 comments
Closed

Provide a way to customize the URL structure #1909

steve-chavez opened this issue Aug 13, 2021 · 17 comments
Labels
idea Needs of discussion to become an enhancement, not ready for implementation

Comments

@steve-chavez
Copy link
Member

Problem

We recommend customizing the URL structure through Nginx, however doing so is complicated and requires additional tools. This can be seen on ppKrauss/PostgREST-writeAPI, which uses mustache templating plus a nodejs script to generate the Nginx rewrite config.

Additionally, changing the URL in the proxy will not be reflected in our OpenAPI spec - which would need rewriting on the proxy as well, which is also more work.

Relying solely on the proxy for this hurts our extensibility, ideally PostgREST should be able to adapt to other standards, like more classical REST with nested routes or maybe more complex ones like Odata; and do so outside of PostgREST core.

Note that we do have a way forward to customize the response body(see this discussion regarding JSON-LD), but not the URL structure.

Proposal

Provide a way to customize the URL through the config file or through postgrest-contrib. It could be like so:

select pgrst.url_rewrites(
  -- more classic REST
  -- /persons/1
  'persons/${value}', 'people?id=eq.${value}'
  -- /persons/1/companies
, 'persons/${value}/companies', '/people?select=*,companies(*)&id=eq.${value}'
  -- bring an RPC to root endpoint
, 'articles/${year}', 'rpc/get_articles?year=${year}'
  -- special route
, 'christmas_2021', 'rpc/get_articles?year=2021&month=december'
);

Related


@wolfgangwalther I think you had an idea about a "routing refactor", this may be related.

@steve-chavez steve-chavez added the idea Needs of discussion to become an enhancement, not ready for implementation label Aug 13, 2021
@wolfgangwalther
Copy link
Member

@wolfgangwalther I think you had an idea about a "routing refactor", this may be related.

Absolutely, I had customizing of the URLs for endpoints in mind with that, too.

@gautam1168
Copy link
Contributor

@steve-chavez so this would be a separate module that somehow plugs in before the userApiRequest function in ApiRequest.hs file depending on a configuration? Is there an example of such a module that I can look at to get an idea of how to write such a thing?

@steve-chavez
Copy link
Member Author

Is there an example of such a module that I can look at to get an idea of how to write such a thing?

Hm no, there isn't. This is mostly a high-level idea at this stage, extensibility is pretty new.

so this would be a separate module that somehow plugs in before the userApiRequest function in ApiRequest.hs file

Sounds about right, though TBH I haven't looked yet at the implementation details.

@wolfgangwalther
Copy link
Member

Sounds about right, though TBH I haven't looked yet at the implementation details.

At a higher level, my idea was roughly along the following lines:

  • Currently, the request is parsed in ApiRequest by pulling together lots of data from various sources (request itself, config, schema cache) - all this happens for each request, where the implicit routing table is basically built dynamically per request.
  • I would like to change that to create a Router type that holds all the information for available routes, that can be constructed from the config and schema cache - independent of the actual request.
  • When the request comes in, the request can then be matched against the Router to find the route it should be handled from.
  • This allows to customize the Router easily. After creation of the base router from config and schema cache, additional options can be parsed to customize all kinds of things, e.g. the endpoints, accepted media-types, etc... - the suggested pgrst.url_rewrites would hook in right there.

Does that make sense?

@gautam1168
Copy link
Contributor

gautam1168 commented Aug 22, 2021

Hmm, let me try to echo back what you are saying. There would be a new data type called Router. A router and a request can be combined using applyRouting function to reform the paths in the incoming Request to create a new Request with a modified path and/or query. Maybe something like the following:

data Router = Router {
-- some kind of representation of appconfig and dbstructure  and routing rules
}

makeRouter :: AppConfig -> DbStructure -> Router
applyRouting :: Request -> Router -> Request
userApiRequest :: Router -> Request -> RequestBody -> Either ApiRequestError ApiRequest

The applyRouting function has to be able to "run" some rules that will be specified in some configuration file, like the example in the issue description which looks like taking values from the input request and templating them out into the output string.

@wolfgangwalther
Copy link
Member

Hm. Using the names you suggested, I think userApiRequest would be replaced by makeRouter and applyRouting. Currently userApiRequest does both. So something like this maybe:

data Router = Router {
-- some kind of representation of appconfig and dbstructure  and routing rules
}

makeRouter :: AppConfig -> DbStructure -> Router
applyRouting :: Router -> Request -> RequestBody -> Either ApiRequestError ApiRequest

@gautam1168
Copy link
Contributor

gautam1168 commented Sep 4, 2021

@wolfgangwalther yes this makes sense. However, if I try to write a function in haskell whose job is to run some rules provided in the configuration file as mentioned in the description of this issue, then I think there are two ways it can be bad:

  1. The users specify some rule that is not satisfiable by the function and they do not know that they have made that mistake until runtime. Because there is no compiler checking this dsl we are inventing inside the configuration file to specify routing.
  2. We end up writing a very extensive module to check that the router configuration is sound in every way before we start postgrest (which feels like a big project to me, depending on how complex we make this dsl). And then there could be feature requests on this module making it even more complex and hard to maintain.

So going with postgrest's motto of doing one thing well, I suggest we somehow provide this configuration of the router as a program.

So, something like, if you provide an executable in the configuration file, postgrest will start that executable on a separate process. And we provide a default executable (router) with postgrest which may implement what we have discussed above. For example:

# config file
router = "default" # it can be a path to some executable
router-configs = "./routerconfigs"
# routerconfigs
port = 8080
host = localhost
target-port = 3000 # postgrest is running here
target-host = localhost

Now when you start postgrest, and it finds a router configuration it will start that program on a separate process.

This way, we can still allow people to configure routes on postgrest out of the box. And we give them a path to either go with some kind of cloud native architecture like AWS Apigateway or their own kubernetes setup with nginx gradually. We also give them the ability to configure much more than just the routes. Because this program can just rewrite the entire request, swapping out the request method, body and headers in arbitrary ways that the users can imagine. (This is possible inside the postgrest function we discussed above as well, however, the configuration dsl will become evil at that point).

@steve-chavez
Copy link
Member Author

steve-chavez commented Sep 4, 2021

Because there is no compiler checking this dsl we are inventing inside the configuration file to specify routing.
We end up writing a very extensive module to check that the router configuration is sound in every way before we start postgrest (which feels like a big project to me, depending on how complex we make this dsl).

True, it's looking complex. And the DSL I put above only considers the URL part, ideally we'd also be able to extend PostgREST to be compatible with IDL-based standards such as graphql — for this we'd need to parse the request body + map it to PostgREST query params.

That is one of the reasons I mentioned the embedded scripting language idea in #1698 (comment).

There might be another option. I've just read about List-based parser combinators, which is a way to do parser combinators dinamically. With this we could(in theory) allow to modify PostgREST parser combinators through the configuration file(or with a nicer interface through postgrest-contrib).

We'd need a library for the List-based parser combinators, because there's only an example for Haskell. Then we could use that lib for parsing the body + the URL part(maybe).

@wolfgangwalther
Copy link
Member

  1. The users specify some rule that is not satisfiable by the function and they do not know that they have made that mistake until runtime. Because there is no compiler checking this dsl we are inventing inside the configuration file to specify routing.

Proper strong haskell typing should avoid that.

  1. We end up writing a very extensive module to check that the router configuration is sound in every way before we start postgrest (which feels like a big project to me, depending on how complex we make this dsl). And then there could be feature requests on this module making it even more complex and hard to maintain.

I don't think this is going to be a very complex thing. I'm pretty sure it should be straight-forward to add a parser that reads in a JSON config for the router, maps that to a haskell data type and throws decent parsing errors.

So going with postgrest's motto of doing one thing well, I suggest we somehow provide this configuration of the router as a program.

I think this is just going to complicate things a lot more. For everything that goes beyond the simple set of features that we want to support directly at the PostgREST level, nginx as a reverse-proxy is still the best solution. We don't need to reinvent the wheel here.

True, it's looking complex. And the DSL I put above only considers the URL part, ideally we'd also be able to extend PostgREST to be compatible with IDL-based standards such as graphql — for this we'd need to parse the request body + map it to PostgREST query params.

That is one of the reasons I mentioned the embedded scripting language idea in #1698 (comment).

The whole scripting language idea sounds promising - as a second step after the routing refactor. In that second step, we would basically rewrite the request parser in that scripting language, to make it possible to swap it out by a replacement. Does that make sense?

@gautam1168
Copy link
Contributor

Hmm yes. The embedded scripting language part gets me more excited than this refactor 😛. And the idea of extending postgrest to be compatible with graphql is very exciting. I would love to see some specs about how to do these things. But for a starter I can refactor the router out as @wolfgangwalther suggested above.

Can @steve-chavez or @wolfgangwalther throw some more light on how it can be evolved to support graphql or start another issue/rfc for that?

@steve-chavez
Copy link
Member Author

throw some more light on how it can be evolved to support graphql or start another issue/rfc for that?

@gautam1168 I don't have anything specific but really that part should be left for another repo. My main motivation here is to provide a generic interface for extensibility.

Also, seems that dhall might have issues with graphql because it doesn't have recursiveness?
https://discourse.dhall-lang.org/t/dhall-as-an-alternative-to-openapi-graphql/344/4. Hm, that might be a blocker for us to adopt Dhall.

@gautam1168
Copy link
Contributor

hmm ok. Let me open a PR for this refactor first.

@steve-chavez
Copy link
Member Author

steve-chavez commented Sep 7, 2021

Regarding embedded scripting languages, there's also Embedded SQL - PostgreSQL has ECPG for example.

For Haskell, I found hesql(does a subset of Postgres' SQL), it doesn't seem maintained but the idea of embedded SQL in PostgREST seems really cool 😮 🤯 💥

@steve-chavez
Copy link
Member Author

Also, seems that dhall might have issues with graphql because it doesn't have recursiveness?
https://discourse.dhall-lang.org/t/dhall-as-an-alternative-to-openapi-graphql/344/4. Hm, that might be a blocker for us to adopt Dhall.

I've seen that Dhall supports toml, that doesn't seem that different from supporting graphql.

So maybe not all things can be done in pure dhall, but dhall can be extended with Haskell. So we'd need to upgrade our Dhall dependency in some cases, but that's not too bad, we'd still support extensibility off the core repo.

For Haskell, I found hesql(does a subset of Postgres' SQL), it doesn't seem maintained but the idea of embedded SQL in PostgREST seems really cool

That might be shooting for the stars for now, Dhall is looking like the most feasible alternative.

@steve-chavez
Copy link
Member Author

steve-chavez commented Oct 26, 2021

One advantage of Lua as embedded scripting language is that many backend services already support it: Redis, HAProxy, Vector and of course Nginx(OpenResty) and Pandoc. I'm now thinking Lua is the right choice here.

Also, hslua seems well supported.

@steve-chavez
Copy link
Member Author

steve-chavez commented Mar 2, 2022

There was a request in gitter about making PostgREST read-only. Of course a read replica or having a read-only role would work but maybe we can allow to restrict our http methods so only GETs are possible and POSTs or other methods don't even touch the db.

Or for a more "db as source of truth" approach, we could reject http methods without hitting the db based on the role's privileges. This requires more smartness from the schema cache though.

Edit: we shouldn't do this because those are different things: "not allowed to" vs "not capable to" .

@steve-chavez
Copy link
Member Author

This discussion derailed a bit of its original purpose. There's a more plausible alternative now on #1086 (comment)

Basically by defining the custom URL as a view or function name:

CREATE VIEW "./projects/:id" AS
SELECT * 
FROM projects
WHERE id = current_setting('request.path.id`, false);

Let's follow up there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
idea Needs of discussion to become an enhancement, not ready for implementation
Development

No branches or pull requests

3 participants