feat!: implement improved fulltext search #671

winged · 2024-10-17T16:58:00Z

This makes the search endpoint represent a different data type (not "files"
anymore!). Also, the search now behaves rather differently:

* Search results are ordered by search rank
* Search results get the search context (a fragment of the text)
  as part of the response
* File and document are referenced as related fields (and includes are available)

This should provide a highly performant, useful search that does what it should[tm]

BREAKING CHANGE: This changes the structure and type of the search
endpoint's data.
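
To illustrate the two behavioral changes described above (rank ordering and a context fragment per result), here is a toy, pure-Python sketch. It is not the implementation in this PR, which, judging by the review diff, relies on `SearchRank` and `SearchHeadline`; the `search` function, its occurrence-count ranking, and the sample documents are all hypothetical.

```python
def search(documents, term, width=20):
    """Return (rank, name, context) tuples, best match first."""
    term = term.lower()
    results = []
    for name, text in documents.items():
        lowered = text.lower()
        rank = lowered.count(term)  # toy rank: number of occurrences
        if rank == 0:
            continue
        # Context: a fragment of the text around the first match.
        pos = lowered.find(term)
        start = max(0, pos - width)
        end = min(len(text), pos + len(term) + width)
        results.append((rank, name, text[start:end]))
    # Highest rank first; ties broken by name for a stable order.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results


# Hypothetical sample data, for illustration only.
documents = {
    "a.txt": "Invoices are archived yearly.",
    "b.txt": "Invoice 42: the invoice was paid. Keep this invoice.",
    "c.txt": "Meeting notes without the keyword.",
}
results = search(documents, "invoice")
for rank, name, context in results:
    print(rank, name, repr(context))
```

A real full-text engine ranks on normalized lexemes rather than raw substring counts, so treat this only as a picture of the result shape: ordered by rank, each result carrying a text fragment.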

Drive-By: chore: do not restart minio config container

When MC failed, this would restart the container forever, even when the Alexandria dev env was stopped

Drive-By: feat(filters): add "only_newest" filter for files

This helps with selecting files when we're searching (or otherwise looking
for files) and only want the newest version.
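
As a rough illustration of what "only newest" could mean, the following minimal sketch keeps one file per document, picking the latest by creation time. The `FileRecord` type, its fields, and the `only_newest` helper are hypothetical stand-ins; the actual filter works on a queryset and may group differently (for example, also by name or variant).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileRecord:
    # Hypothetical stand-in for a stored file; not the real model.
    name: str
    document_id: int
    created_at: datetime


def only_newest(files):
    """Keep only the most recently created file per document (assumed grouping)."""
    newest = {}
    for f in files:
        current = newest.get(f.document_id)
        if current is None or f.created_at > current.created_at:
            newest[f.document_id] = f
    return list(newest.values())


files = [
    FileRecord("report-v1.pdf", 1, datetime(2024, 10, 1)),
    FileRecord("report-v2.pdf", 1, datetime(2024, 10, 5)),
    FileRecord("notes.txt", 2, datetime(2024, 10, 3)),
]
newest_names = sorted(f.name for f in only_newest(files))
print(newest_names)
```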

Drive-By: feat(cmdline): search utility

This runs the search as if it were run through the search endpoint.
Note that this is mainly meant for performance testing. No auth
support and no visibility support exists currently. If you enable
visibility / auth, it will likely break or not return anything.

winged · 2024-10-23T09:01:24Z

Status update: I let some custom file/document factories run overnight, created over 400k files and then used the FTS in this branch to search.

Using the new command-line utility, I was able to run some tests: query duration is generally between 0.05 and 0.2 seconds.
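
For anyone wanting to reproduce this kind of measurement, here is a minimal timing sketch. `run_search` is a hypothetical placeholder for whatever executes one query (it is not the utility added in this PR), and `time_queries` is likewise an illustrative helper that only reports minimum, median, and maximum wall-clock durations.

```python
import statistics
import time


def time_queries(run_search, queries, repeat=3):
    """Return (min, median, max) wall-clock seconds over all runs."""
    durations = []
    for query in queries:
        for _ in range(repeat):
            start = time.perf_counter()
            run_search(query)
            durations.append(time.perf_counter() - start)
    return min(durations), statistics.median(durations), max(durations)


def run_search(query):
    # Hypothetical placeholder: replace with a real search call.
    return [w for w in ("alpha", "beta", "gamma") if query in w]


fastest, median, slowest = time_queries(run_search, ["a", "eta"])
print(f"min={fastest:.6f}s median={median:.6f}s max={slowest:.6f}s")
```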

Yelinz · 2024-10-24T18:45:48Z

alexandria/core/filters.py

-        )
+        queryset = queryset.annotate(
+            search_rank=SearchRank(F("content_vector"), search_query),
+            search_context=SearchHeadline(F("content_text"), search_query),


If the search matches the file name but not the content, there will be no context. But that should be OK.

chore: do not restart minio config container

c5b722c

winged force-pushed the cleanup_search branch from f6d10e6 to c819a7d Compare October 17, 2024 16:58

winged requested review from czosel, open-dynaMIX and Yelinz October 17, 2024 16:58

winged force-pushed the cleanup_search branch from c819a7d to bdf7198 Compare October 21, 2024 14:06

winged commented Oct 22, 2024

Yelinz reviewed Oct 22, 2024

winged force-pushed the cleanup_search branch from 637db77 to e36f534 Compare October 22, 2024 15:48

winged added 3 commits October 24, 2024 09:17

feat(filters): add "only_newest" filter for files

2b42cb9

winged force-pushed the cleanup_search branch from e36f534 to e96aff8 Compare October 24, 2024 07:17

Yelinz reviewed Oct 24, 2024

Yelinz approved these changes Oct 24, 2024
