feat: add "totsv" modifier to the "fts" operator #3845

laurenceisla · 2025-01-14T01:49:48Z

When an fts operator is used, then it applies to_tsvector to the filtered column. For example:

/table?column=fts(simple).test

is equivalent to this in SQL:

SELECT *
FROM table
WHERE to_tsvector('simple', column) = to_tsquery('simple', 'test')

If the `totsv` modifier (short for `to_tsvector`) is used for `fts` operators then it applies `to_tsvector` to the filtered column. For example:
/table?column=totsv.fts(simple).test

Is equivalent to this in SQL:

SELECT * FROM table WHERE to_tsvector('simple', column) = to_tsquery('simple', 'test')

docs/references/api/tables_views.rst

laurenceisla · 2025-01-17T22:02:32Z

CHANGELOG.md

@@ -26,6 +26,7 @@ This project adheres to [Semantic Versioning](http://semver.org/).
 - #3795, Clarify `Accept: vnd.pgrst.object` error message - @steve-chavez
 - #3779, Always log the schema cache load time - @steve-chavez
 - #3706, Fix insert with `missing=default` uses default value of domain instead of column - @taimoorzaeem
+ - #2255, Fix `to_tsvector()` not applying explicitly to the full-text search filtered column (on non `tsvector` types) - @laurenceisla


I'm not entirely sure if this should be considered a new feature or a fix. I think the user expects the to_tsvector to be applied under the hood, so I'm adding it here.

So the issue says "enhancement", the PR title says "feat:" - and yet you conclude to add it to the "fixed" section :D

I'd look at this from the angle of backporting. I wouldn't backport it, at least in its current form. I understand you're applying this transformation to every column except those with type tsvector right now. This does have a considerably risk of breaking indexes, we only need to go back as far as when we did something similar with to_jsonb(...) around everything to allow accessing fields or composite types or so. It did break indexes.

Now.. what if we only apply this transformation for json/jsonb types? That would be a lot less risky, in which case we could think about backporting.

Although... even in this case, I'd still argue that this is more of a feature than a fix.

the PR title says "feat:"

Oh woops, I left that from the previous iteration (adding the totsv modifier looked more like a feat in that case for me). Yeah, I get it now.

I understand you're applying this transformation to every column except those with type tsvector right now. This does have a considerably risk of breaking indexes, we only need to go back as far as when we did something similar with to_jsonb(...) around everything to allow accessing fields or composite types or so

Yes, although there's a difference in that the to_tsvector() is already applied implicitly to the text types with the @@ operator.

text @@ tsquery → boolean Does text string, after implicit invocation of to_tsvector(), match tsquery? 'fat cats ate rats' @@ to_tsquery('cat & rat') → t

The other types will still fail but with a different error message, e.g. function to_tsquery(integer) does not exist instead of operator does not exist: integer @@ tsquery.

The other types will still fail but with a different error message, e.g. function to_tsquery(integer) does not exist instead of operator does not exist: integer @@ tsquery.

Which kind of brings me back to my original question: #2255 (comment)

Did anyone try creating the missing @@ operator for jsonb with a function that does this to_tsvector call explicitly?

Like.. do we actually need this ~~fix~~ feature, or could we just advise people to create the respective ~~cast~~ operator?

This does have a considerably risk of breaking indexes,

In fact, I think it's the contrary, actually. I've been checking more about FTS indexes and the docs say:

Because the two-argument version of to_tsvector was used in the index above, only a query reference that uses the 2-argument version of to_tsvector with the same configuration name will use that index. That is, WHERE to_tsvector('english', body) @@ 'a & b' can use the index, but WHERE to_tsvector(body) @@ 'a & b cannot. This ensures that an index will be used only with the same configuration used to create the index entries.

As it is right now if we a GIN index to a column of text type, PostgREST won't use it. This change (appying to all types, not only JSON) would fix that.

I fail to read your conclusion out of this.

However, I read something else:

Another approach is to create a separate tsvector column to hold the output of to_tsvector. To keep this column automatically up to date with its source data, use a stored generated column.

It seems to me that suggesting to our users to create generated columns with the tsvector output of the json column would solve the problem nicely. Then this generated column can be searched - without any additional operators or to_tsvector wrappers. It also scales much better, because this solution will easily allow users to add those config arguments ("english" etc.) or create the tsvector column over multiple columns.

If instead, we implement as proposed here, I see the next feature request coming very soon: "I need to pass 'english' to the two-argument-version of tsvector, please let PostgREST do that for me."

Like.. do we actually need this ~~fix~~ feature, or could we just advise people to create the respective ~~cast~~ operator?

It seems to me that suggesting to our users to create generated columns with the tsvector output of the json column would solve the problem nicely.

Yes, that'd be another solution. We'd need to mention that it won't work as expected for text or json columns and to use tsvector generated columns and give the suggestions you list. Nonetheless, I think having an out of the box option to use would be a good experience.

If instead, we implement as proposed here, I see the next feature request coming very soon: "I need to pass 'english' to the two-argument-version of tsvector, please let PostgREST do that for me."

Do you mean to_tsvector(config_language, text)? Cause... that's what it's implemented in this PR: col=fts(english).val in SQL would be to_tsvector('english', col) @@ to_tsquery('english', 'val') (I updated the original post of the PR to reflect this). I figured it made sense having the same config languages in the tsquery and the tsvector, so using the one in fts(language) for both shouldn't be a probem.

All in all, I'm on the fence here. I see the value in both having the implementation out of the box and recommending generated columns as an alternative and also for complex cases (computed fields are not included in the PR, for instance).

laurenceisla · 2025-01-17T22:29:55Z

src/PostgREST/Plan.hs

    res = fromMaybe (unknownField fn jp) $ HM.lookup qi tables >>=
-          Just . flip resolveTableFieldName fn
+          Just . (\t -> resolveTableFieldName t fn toTsV)


There was a performance concern here: #2255 (comment)

I don't think there will be a difference in performance since it's using this lookup that is already applied for all filters before this PR.

… column, only if it's not of tsvector type

laurenceisla force-pushed the feat-totsv branch 2 times, most recently from b9cee18 to b8f8db2 Compare January 14, 2025 16:14

laurenceisla marked this pull request as ready for review January 14, 2025 16:16

laurenceisla requested a review from steve-chavez January 14, 2025 16:16

steve-chavez reviewed Jan 14, 2025

View reviewed changes

docs/references/api/tables_views.rst Outdated Show resolved Hide resolved

laurenceisla force-pushed the feat-totsv branch from b8f8db2 to 5e61a05 Compare January 17, 2025 21:54

laurenceisla commented Jan 17, 2025

View reviewed changes

laurenceisla force-pushed the feat-totsv branch from 5e61a05 to 42b02b6 Compare January 17, 2025 22:13

laurenceisla commented Jan 17, 2025

View reviewed changes

laurenceisla requested a review from steve-chavez January 17, 2025 22:31

feat: apply to_tsvector() explicitly to the full-text search filtered…

846a45c

… column, only if it's not of tsvector type

laurenceisla force-pushed the feat-totsv branch from 42b02b6 to 846a45c Compare January 21, 2025 00:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add "totsv" modifier to the "fts" operator #3845

feat: add "totsv" modifier to the "fts" operator #3845

laurenceisla commented Jan 14, 2025 •

edited

Loading

laurenceisla Jan 17, 2025

wolfgangwalther Jan 19, 2025

laurenceisla Jan 20, 2025

wolfgangwalther Jan 20, 2025 •

edited

Loading

laurenceisla Jan 20, 2025 •

edited

Loading

wolfgangwalther Jan 20, 2025

laurenceisla Jan 20, 2025 •

edited

Loading

laurenceisla Jan 17, 2025

feat: add "totsv" modifier to the "fts" operator #3845

Are you sure you want to change the base?

feat: add "totsv" modifier to the "fts" operator #3845

Conversation

laurenceisla commented Jan 14, 2025 • edited Loading

laurenceisla Jan 17, 2025

Choose a reason for hiding this comment

wolfgangwalther Jan 19, 2025

Choose a reason for hiding this comment

laurenceisla Jan 20, 2025

Choose a reason for hiding this comment

wolfgangwalther Jan 20, 2025 • edited Loading

Choose a reason for hiding this comment

laurenceisla Jan 20, 2025 • edited Loading

Choose a reason for hiding this comment

wolfgangwalther Jan 20, 2025

Choose a reason for hiding this comment

laurenceisla Jan 20, 2025 • edited Loading

Choose a reason for hiding this comment

laurenceisla Jan 17, 2025

Choose a reason for hiding this comment

laurenceisla commented Jan 14, 2025 •

edited

Loading

wolfgangwalther Jan 20, 2025 •

edited

Loading

laurenceisla Jan 20, 2025 •

edited

Loading

laurenceisla Jan 20, 2025 •

edited

Loading