-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for PGroonga operators #2028
Comments
Why is this only in the old v5.2 docs? Any specific reason for removing it from v6 onwards? |
I was wondering, whether we could somehow add metadata to custom operators in the database, to make PostgREST be able to identify and map them automatically. Kind of similar to what we're doing with We could use a yaml front matter block to add metadata, while keeping the current openapi comments support backwards compatible. This would also prevent issues with inlining, when using COMMENT ON OPERATOR "&@~" $$
---
pgrst:
filter:
name: groonga_query
requires_index: pgroonga
---
This operator will provide fulltext magic!
$$;
-- similar to the original idea in #1582
COMMENT ON FUNCTION ret_image(name text) $$
---
pgrst:
accept:
- image/png
---
This function will respond to Accept: image/png requests!
$$;
-- related to improved routing
COMMENT ON TABLE really_long_and_inaccessible_name $$
---
pgrst:
route: short_name
---
This table can now be queried at /short_name instead.
$$;
-- moving more of the config closer to database objects
COMMENT ON SCHEMA my_extensions $$
---
pgrst:
search_path: true
---
This replaces db-extra-search-path in the config.
$$; |
It was not thoroughly discussed IIRC. I thought it would not be a good idea because it "pollutes" the comments with postgrest-specific metadata(output of slash commands like Creating a DSL on top of comments, like shown on #1179 (comment), was also described as "icky" on a hacker news comment. I somewhat agree, as it feels "invasive" and prone to errors(also no syntax highlight). Another problem is that if other tools also (ab)use the comments then we'd "clash" with their DSL.
yaml certainly looks better. I guess we'd have to reach consensus with other tools though. I still think the correct way to do this would be the way mentioned in #1710 (comment), whether is our in-db config or postgrest-contrib functions, that way our metadata would be namespaced.
Do you see a problem with this interface? I like the idea of being strict and requiring indexes for enabling certain filters. I also think we'll need a more generic interface eventually, but for now we don't have too many operators for this to be a problem.
It was an agreement between me and Timescale(former backer) sometime ago, where we'd show their logo + integration doc for the duration of their sponsorship on Patreon, which only lasted for a portion of v5.2 lifetime. I think we can reincorporate the timescale doc though, it's good content and an extension section(with pgroonga and postgis later on) would look good. |
One thought, now that we have many client libraries, we can expose the original operator and leave the urlencoding to them, so we don't need to make an alias. This was discussed before on #938 (comment):
Or, we could also offer both, the original operator and the url-safe alias. |
Hm. I don't buy the argument "it's hard to remember aliases" very much. I doubt people really know all those PG operators out of their head, because the regularly use them. For me, it's more a "I know there is something that does this... let's look up the operator in the docs.". Also the client-side code is probably read by a lot more people that don't know PostgreSQL well, and the aliased operators are much more meaningful to them. If you really want to support the original operator (where I don't really see much value in), we absolutely have to keep supporting the url-safe aliases. |
I guess that depends on what you're expecting to see in that case. Assuming we are reading comments of objects in the public (or other postgrest specific) schema only, it could also be helpful. The objects in the api schema should only be used for postgrest anyway, so having the postgrest specific information readily in one place would be much better than having them stored away in some big json config setting. Of course we can try to see what kind of format we can put the metadata comments in to have them show in a nice way in
I would assume all other tools to be somewhat resistant against stuff that doesn't match their DSL - and the same for us. As long as we are prefixing this on our side, we should be fine.
I think it's a great idea to be strict about requiring indexes. I like it so much, that I added
Fully agree. |
Looks like an extensible interface could already be useful, as new operators have now been asked for repeatedly:
|
@wolfgangwalther (Ab)using function names, how about this for exposing new operators CREATE FUNCTION "pgrst.projects.name.groonga_query"(test.projects, test.projects.name%TYPE) RETURNS bool
AS $_$
SELECT $1.name &@~ $2
$_$ LANGUAGE SQL IMMUTABLE; This would allow calling GET /projects?name=groonga_query.stuff I've checked and the resulting queries are inlinable too select * from test.projects x where "pgrst.projects.name.groonga_query"(x, 2); Related to #2442 (comment) Might be possible to make this more generic(for all columns) with some conventions on the function naming |
I was mixing the concern about restricting the operator columns on #2442, there's a more plausible idea there now. Adding new operators could be simply: CREATE FUNCTION "pgrst.groonga_query"(text, text) RETURNS bool
AS $_$
SELECT $1 &@~ $2
$_$ LANGUAGE SQL IMMUTABLE; Meaning any function name with our prefix |
In the end this is just a function wrapper around an operator to attach metadata to it via the name prefix. You could achieve the same by creating a custom operator with the same name prefix. Why the function wrapper? But both of those approaches are very limited in terms of how much metadata we can attach. The example in #2028 (comment) already mentions something like I think we should sort out the "metadata" problem once and for good - and then we can build a lot of things on top of that. |
You can't create an operator with letters, no "eq" or "lt" https://www.postgresql.org/docs/current/sql-createoperator.html. The interface is much more complex too.
Yeah, that I think we'll not need anymore. As discussed on #2249, it's not certain that an index will be applied. What we should do instead:
|
To put it the other way around: We do need to solve the "metadata for database objects" problems. Once we do that, adding the information on which operators to expose through this metadata is much nicer than through function names as proposed here. So no matter whether we need |
Now that the allowlist on #2442 (comment) seems to be the way to go to restrict operators on columns, I think this can be kept simpler. We could just have a config that maps keywords to operators: db-extra-operators = "groonga_query:&@~, similarity:%, geo_within:st_within" ALTER ROLE authenticator SET pgrst.extra_operators = 'groonga_query:&@~, similarity:%, geo_within:st_within' Later one we might have a COMMENT ON or SECURITY LABEL for metadata but for now this looks like the simplest way to add operators. |
A problem with the above proposal is that it cannot express the FTS operators, unless we support something like: ALTER ROLE postgrest_test_authenticator
SET pgrst.extra_operators = 'my_fts:@@ to_tsquery, my_plain_fts: @@ plainto_tsquery' Which is not that clean or expressive IMO. The above function interface is much more expressive. To make it cleaner, how about removing the naming convention: CREATE FUNCTION groonga_query(text, text) RETURNS bool
AS $_$
SELECT $1 &@~ $2
$_$ LANGUAGE SQL IMMUTABLE; So we expose a function as an operator if:
|
The function interface is complex, I'm not convinced by it. I'm in need of the pgroonga operators though. Until we settle on an interface, how about just adding the |
Regarding aliasing to avoid urlencoding on operator syntax, maybe that's something that should be done in core. It doesn't feel right to let customers break their APIs by changing an alias on the config. I feel that design is brittle. Could we be smart and transform symbols to letters? Like Another option could be having a special
That does look more consistent with our syntax. Plus aliases doesn't seem necessary now. To allowlist the exposed operators:
I think this can also keep the |
The syntax for the raw operators could be: SEARCH /table?column=$op.@@.$body.val
{"..."} Related to the dollar operators proposal #2125 (comment) (Kinda looks like the Could also be SEARCH /table?column=$op(@@).$body.val
{"..."} |
I've realized this is somewhat the same problem as #2578. We also need a way to call filters with multiple inputs. |
I like this the most of the suggestions above. |
How about this for custom operators. Since IMMUTABLE is really a pure function, we could expose these as operators. These can be enabled per table/view and column like this: -- name is a type for a pg identifier
CREATE FUNCTION groonga_query(test.projects, text, text, name default 'id,name,client_id') RETURNS bool
AS $_$
SELECT $2 &@~ $3
$_$ LANGUAGE SQL IMMUTABLE; They can be enabled per table/view and type like this: CREATE FUNCTION groonga_query(test.projects, text, text) RETURNS bool
AS $_$
SELECT $2 &@~ $3
$_$ LANGUAGE SQL IMMUTABLE; They can be enabled for all table/view and type like this: CREATE FUNCTION groonga_query(anyelement, text, text) RETURNS bool
AS $_$
SELECT $2 &@~ $3
$_$ LANGUAGE SQL IMMUTABLE; Inlining advantages apply as mentioned above. We would need to scan the filters and search them in the schema cache for this, which would incur in perf loss. Or we could somehow mark them as custom, like: GET /projects?name=$groonga_query.stuff |
Come to think of it, the above might solve #2442 as well. We could create: CREATE FUNCTION eq(anyelement, anyelement) RETURNS bool
AS $_$
SELECT $1 = $2
$_$ LANGUAGE SQL IMMUTABLE; Which would represent our default To disallow it globally: REVOKE EXECUTE ON eq FROM anon;
REVOKE EXECUTE ON eq FROM webuser; Then the user can enable it per column or per table as mentioned above. It might leave a big footprint in the db but I think it could work. |
Using unnamed parameters for custom operator functions is key to differentiate them from #2578 |
An improvement on the above would be: CREATE FUNCTION groonga_query(test.projects, text, text, id name) RETURNS bool
AS $_$
SELECT $2 &@~ $3
$_$ LANGUAGE SQL IMMUTABLE; Where the named parameter at the end indicates the column for the operator. |
So far looks like #2805 (comment) is the simplest implementation for custom operators. @wolfgangwalther I was wondering if you have any concerns about that? I guess one disadvantage is that we lose type checking when adding new operators through a custom config. It will also be harder to infer correct typescript types later. Hm, perhaps we leave the configurable query grammar just for restricting operators and not adding them? One alternative now that we have the CREATE FUNCTION postgrest.eq(anyelement, anyelement) RETURNS bool
AS $_$
SELECT $1 = $2
$_$ LANGUAGE SQL IMMUTABLE;
CREATE FUNCTION postgrest.fts(anyelement, anyelement) RETURNS bool
AS $_$
SELECT $1 @@ to_tsquery($2)
$_$ LANGUAGE SQL IMMUTABLE;
-- later on my thinking is that we could support
CREATE FUNCTION postgrest.fts(text, anyelement) RETURNS bool
AS $_$
SELECT $1 @@ to_tsquery($2)
$_$ LANGUAGE SQL IMMUTABLE;
-- that's basically saying that clients can only use the `fts` operators on text columns And then we enable them as operators for the query grammar. That doesn't sound bad actually. |
I don't like it. We should not do stuff that should be done via database objects, via config. PostgREST's goal is to "expose a, potentially already existing, database schema". So the first thing to do should be do interpret the existing schema and expose it in a way that makes sense. You showed multiple examples of how we could use create function eq(text, text) returns bool
immutable
return $1 = $2; We can't use it as an RPC, because we don't support multiple unnamed arguments for those. It's returning a bool, which is a strong indication, that it's supposed to be used in a Our query/request syntax has the following "identifiers" it can use:
None of those are applicable to this here. Short of some constants and keywords for syntax, we only have operators left in our query syntax. So it's pretty natural to use this.
I don't see how "the Also: We want to expose those operators.. so they should be in the exposed schema, right?
I like the suggestion you made above more:
I don't like that either. That's a lot of magic. I didn't have the use-case to disable some operators (or other things) for specific routes (or enable them only for one etc.). If we do anything remotely like that, we should make it some kind of general solution that would work with much more than just this specific case. Operators should not depend on context. They should only depend on their two arguments. |
Agree overall. If anything adding new operators via functions looks like the way forward. However this part:
Still doesn't solve enabling operators per column, say only enabling But we can leave that for later. I'm not sure if #2805 is the best way. |
Seems implementing this is not trivial. Right now we validate operators on Implementation is kinda similar to #2825 in that we have builtins and we can override/enhance them. When implementing this we might also consider supporting using the operator on the left side of the query param #2066 |
Looks like the above holds up pretty good. I've been testing: WITH
x AS (
SELECT n.nspname as "Schema",
p.proname as "Name",
pg_catalog.pg_get_function_result(p.oid) as "Result data type",
pg_catalog.pg_get_function_arguments(p.oid) as "Argument data types",
CASE p.prokind
WHEN 'a' THEN 'agg'
WHEN 'w' THEN 'window'
WHEN 'p' THEN 'proc'
ELSE 'func'
END as "Type",
CASE
WHEN p.provolatile = 'i' THEN 'immutable'
WHEN p.provolatile = 's' THEN 'stable'
WHEN p.provolatile = 'v' THEN 'volatile'
END as "Volatility",
CASE
WHEN p.proparallel = 'r' THEN 'restricted'
WHEN p.proparallel = 's' THEN 'safe'
WHEN p.proparallel = 'u' THEN 'unsafe'
END as "Parallel",
pg_catalog.pg_get_userbyid(p.proowner) as "Owner",
CASE WHEN prosecdef THEN 'definer' ELSE 'invoker' END AS "Security",
pg_catalog.array_to_string(p.proacl, E'\n') AS "Access privileges",
l.lanname as "Language",
COALESCE(pg_catalog.pg_get_function_sqlbody(p.oid), p.prosrc) as "Source code",
pg_catalog.obj_description(p.oid, 'pg_proc') as "Description"
FROM pg_catalog.pg_proc p
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
LEFT JOIN pg_catalog.pg_language l ON l.oid = p.prolang
WHERE pg_catalog.pg_function_is_visible(p.oid)
ORDER BY 1, 2, 4)
SELECT * FROM x
WHERE "Description" ILIKE '%operator%' AND "Result data type" = 'boolean' AND "Volatility" = 'immutable';
-[ RECORD 1 ]-------+---------------------------------------------------------
Schema | pg_catalog
Name | aclitemeq
Result data type | boolean
Argument data types | aclitem, aclitem
Type | func
Volatility | immutable
Parallel | safe
Owner | postgres
Security | invoker
Access privileges |
Language | internal
Source code | aclitem_eq
Description | implementation of = operator
-[ RECORD 2 ]-------+---------------------------------------------------------
Schema | pg_catalog
Name | array_eq
Result data type | boolean
Argument data types | anyarray, anyarray
Type | func
Volatility | immutable
Parallel | safe
Owner | postgres
Security | invoker
Access privileges |
Language | internal
Source code | array_eq
Description | implementation of = operator
-[ RECORD 3 ]-------+---------------------------------------------------------
Schema | pg_catalog
Name | array_ge
Result data type | boolean
Argument data types | anyarray, anyarray
Type | func
Volatility | immutable
Parallel | safe
Owner | postgres
Security | invoker
Access privileges |
Language | internal
Source code | array_ge
Description | implementation of >= operator
... And every operator we expose is inside that list. Some like The main problem is the naming, most functions contain a good name but it's usually concatenated with the types. If we could get a good name from the internal functions then we wouldn't need to hardcode the operators in the code nor we would need to instruct the user to create new functions. |
Considering the that it's not easy to add support for custom operators (ref). I'm tempted to just hardcode the pgroonga operators for now. They recently released a how-to for postgREST that uses dynamic SQL which doesn't look good at all: https://pgroonga.github.io/ja/how-to/postgrest.html pgroonga is on nixpkgs too, which should make things simpler. |
PGroonga offers performance and usability advantages over pg native textsearch and pg_trgm.
Tutorial: https://pgroonga.github.io/tutorial/
Reference : https://pgroonga.github.io/reference/
Proposal
Since pgroonga operators work on regular data types like
text
andjsonb
, we can conditionally enable them on a column if it's indexed. This is to prevent expensive searches on unindexed columns.Then to do
SELECT * FROM table WHERE column &@~ 'PostgreSQL';
:(&@~ is
groonga_query
for now).Then we can document this in an integrations(or extensions) page like we did for timescale.
The text was updated successfully, but these errors were encountered: