Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow fuzzy-search on credential identifiers #3526

Merged
merged 12 commits into from
Oct 2, 2023

Conversation

zepatrik
Copy link
Member

@zepatrik zepatrik commented Sep 21, 2023

This PR adds the ability to search for sub-strings and similar strings in credential identifiers.

Note that the postgres and CRDB migrations create special indexes useful for this feature. To use online schema changes with cockroach, we recommend to manually copy the index definition and run it before applying migrations. The migration will then be a no-op.
If you run on mysql (or sqlite), no special index is created. If desired, you can create such an index manually, and it would be highly appreciated if you could contribute its definition 🙏

This feature is a preview and might change in behavior! Similarity search is not expected to return deterministic results anyway, but are only really useful for humans.

//
// required: false
// in: query
CredentialsIdentifierSimilar string `json:"credentials_identifier_similar"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT of prefixing this as preview or something? We will probably need a separate search endpoint to deal with different use cases like:

  • identifier search
  • verifiable /recovery addresses search
  • traits search

Alternatively this could also be added to a new endpoint marked as experimental. For me it's primarily about finding the way we will be solving search in Ory to get a consistent API experience. That's why I think that this should be somehow marked as: this api will be changed once we have an appropriate search design doc

Copy link
Member

@aeneasr aeneasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

// Follow up: add page token support here, will be easy.
paginator := pop.NewPaginator(params.Page, params.PerPage)

if err := con.RawQuery(fmt.Sprintf(`SELECT DISTINCT identities.*
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query is now build here because pop does not allow DISTINCT queries. We need this, because in many cases the query would return an identity twice, once for the webauthn and once for the password credential type.

@@ -0,0 +1 @@
CREATE INDEX IF NOT EXISTS identity_credential_identifiers_nid_identifier_gin ON identity_credential_identifiers USING GIN (nid, identifier gin_trgm_ops);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To use online schema changes with cockroach, we recommend to manually copy this index definition and run it before applying migrations.

Comment on lines +680 to +682
default:
identifier = "%" + identifier + "%"
identifierOperator = "LIKE"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the mysql and sqlite case. Mysql does support full-text and fuzzy search, and special index types to allow them. However, it was not easy to understand at all, and I found no easy way to have the same behavior as postgres/CRDB. I think this is already an improvement on the current situation, and this feature is open for contributions!

@@ -28,7 +28,7 @@ jobs:
- sdk-generate
services:
postgres:
image: postgres:9.6
image: postgres:11.8
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres 9 is EoL by now, and 11 is currently the oldest supported version.

@codecov
Copy link

codecov bot commented Sep 21, 2023

Codecov Report

Merging #3526 (7ae1271) into master (7ae1271) will not change coverage.
The diff coverage is n/a.

❗ Current head 7ae1271 differs from pull request most recent head cd2cc35. Consider uploading reports for the commit cd2cc35 to get more accurate results

@@           Coverage Diff           @@
##           master    #3526   +/-   ##
=======================================
  Coverage   78.68%   78.68%           
=======================================
  Files         342      342           
  Lines       23470    23470           
=======================================
  Hits        18467    18467           
  Misses       3639     3639           
  Partials     1364     1364           

@zepatrik zepatrik force-pushed the feat/identity-search-similarity branch from 3319f35 to 5483892 Compare September 22, 2023 08:58
@aeneasr aeneasr merged commit 2cb3ea2 into master Oct 2, 2023
26 checks passed
@aeneasr aeneasr deleted the feat/identity-search-similarity branch October 2, 2023 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants