feat!: implement improved fulltext search #671
base: main
Status update: I let some custom file/document factories run overnight, created over 400k files, and then used the full-text search (FTS) in this branch to search them. Using the new command-line utility, I ran some tests: general query duration is between 0.05 and 0.2 seconds.
This makes the search endpoint resemble a different data type (not "files" anymore!). Also, the search now behaves rather differently:

* Search results are ordered by search rank
* Search results include the search context (a fragment of the text) as part of the response
* File and document are referenced as related fields (and includes are available)

This should provide a highly performant, useful search that does what it should[tm].

BREAKING CHANGE: This changes the structure and type of the search endpoint's data.
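To illustrate the new result shape (rank-ordered results, each carrying a context fragment), here is a toy sketch in plain Python. It is not how the endpoint works: the real code uses PostgreSQL full-text search through Django's `SearchRank` and `SearchHeadline`, and the scoring below (matched tokens divided by total tokens) is a deliberately simplified stand-in for `ts_rank`. `SearchResult`, `tokens`, `rank`, `headline`, and `search` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class SearchResult:
    document_id: str
    search_rank: float
    search_context: str


def tokens(text):
    # Lower-cased word tokens: a crude stand-in for PostgreSQL's to_tsvector
    # (no stemming, no stop words).
    return re.findall(r"\w+", text.lower())


def rank(text, terms):
    # Share of tokens that match a query term; NOT PostgreSQL's ts_rank formula.
    words = tokens(text)
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in terms)
    return hits / len(words)


def headline(text, terms, radius=3):
    # A few words around the first match, with the match wrapped in <b>...</b>
    # (the same default markers SearchHeadline uses). Empty if nothing matches.
    words = text.split()
    for i, word in enumerate(words):
        if re.sub(r"\W", "", word.lower()) in terms:
            start = max(0, i - radius)
            window = words[start:i + radius + 1]
            window[i - start] = f"<b>{word}</b>"
            return " ".join(window)
    return ""


def search(documents, query):
    # documents: mapping of document id -> full text.
    terms = set(tokens(query))
    results = [
        SearchResult(doc_id, rank(text, terms), headline(text, terms))
        for doc_id, text in documents.items()
    ]
    # Drop non-matching documents and order by rank, best first.
    return sorted(
        (r for r in results if r.search_rank > 0),
        key=lambda r: r.search_rank,
        reverse=True,
    )
```

Each result carries its own rank and context, which is why the endpoint no longer returns plain "files".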
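```python
from django.contrib.postgres.search import SearchHeadline, SearchRank
from django.db.models import F

# `queryset` and `search_query` are defined earlier (not part of this excerpt).
queryset = queryset.annotate(
    search_rank=SearchRank(F("content_vector"), search_query),
    search_context=SearchHeadline(F("content_text"), search_query),
)
```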
If the search matches the file name but not the content, there will be no context. But that should be OK.
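A minimal sketch of the case from this comment, assuming a file can match on its name or on its content: the context can only be cut out of the content, so a name-only match comes back with an empty context. `match` and its return dict are hypothetical, and it uses plain case-insensitive substring matching rather than PostgreSQL FTS.

```python
def match(file_name, content, term):
    # Hypothetical helper using case-insensitive substring matching
    # (not PostgreSQL FTS). Returns None if neither name nor content matches.
    # Assumes str.lower() keeps the string length (true for ASCII text).
    needle = term.lower()
    name_hit = needle in file_name.lower()
    position = content.lower().find(needle)
    content_hit = position != -1
    if not (name_hit or content_hit):
        return None
    # Context can only be cut out of the content, so a name-only match has none.
    context = ""
    if content_hit:
        start = max(0, position - 20)
        context = content[start:position + len(needle) + 20]
    return {"name_match": name_hit, "content_match": content_hit, "context": context}
```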
Drive-By: chore: do not restart minio config container
When MC failed, this would restart the container forever, even when the Alexandria dev env was stopped
Drive-By: feat(filters): add "only_newest" filter for files
This helps select files when we're searching (or otherwise looking
for files) and only want the newest version.
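The selection rule behind `only_newest` could look roughly like the following, assuming "newest" means the file with the latest creation timestamp within each document. This is a plain-Python sketch; the actual filter presumably operates on the Django queryset, and `FileVersion` with its fields is a hypothetical stand-in for the file model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileVersion:
    # Hypothetical record: several files (versions) can belong to one document.
    file_id: str
    document_id: str
    created_at: datetime


def only_newest(files):
    """Keep only the most recently created file per document."""
    newest = {}
    for f in files:
        current = newest.get(f.document_id)
        if current is None or f.created_at > current.created_at:
            newest[f.document_id] = f
    return list(newest.values())
```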
Drive-By: feat(cmdline): search utility
This runs the search as if it were run through the search endpoint.
Note that this is mainly intended for performance testing. There is
currently no support for auth or visibility. If you enable
visibility / auth, it will likely break or return nothing.
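For performance testing of the kind described in the status update, a small timing harness like this one could wrap whatever the utility executes. It is a sketch, not the utility's code: `time_query` and the `run_query` callable are hypothetical, and it measures wall-clock time with `time.perf_counter`.

```python
import statistics
import time


def time_query(run_query, repetitions=10):
    """Call run_query() repeatedly; return min/median/max wall-clock time in seconds."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

Reporting the min and median alongside the max helps separate one-off slow runs (e.g. a cold cache) from typical query duration.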