-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add SearchAfterMixin for ES search_after capability #4536
base: master
Are you sure you want to change the base?
Conversation
2262eb3
to
a70ebb7
Compare
2b92b48
to
d93f032
Compare
mock_document = MagicMock() | ||
mock_search = MagicMock() | ||
mock_document.search.return_value = mock_search | ||
|
||
mock_search.query.return_value = mock_search | ||
mock_search.sort.return_value = mock_search | ||
mock_search.extra.return_value = mock_search | ||
|
||
mock_result1 = MagicMock() | ||
mock_result1.pk = 1 | ||
mock_result1.meta.sort = ["sort1"] | ||
|
||
mock_result2 = MagicMock() | ||
mock_result2.pk = 2 | ||
mock_result2.meta.sort = ["sort2"] | ||
|
||
mock_search.execute.side_effect = [ | ||
[mock_result1, mock_result2], | ||
[], | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the right way to test. This is mocking everything, including ES search after behavior. All the ES related tests in course-discovery hit test ES instance. The search after mixin tests should aim for that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the tests to use ElasticsearchTestMixin
instead.
break | ||
|
||
all_ids.update(ids) | ||
search_after = results[-1].meta.sort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add safety check here, like
search_after = results[-1].meta.sort if results[-1] else None
Also, have you tried it locally? By default, results[-1].meta.sort
is a Elastic.AttrList
, which is not json serializable. Using results[-1].meta.sort.copy()
will convert it into a simple list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch for the search_after check.
I've tested this locally, I'm using this search_after
value just to be sent back to the ES query execution.
d93f032
to
7a8f3b7
Compare
identified_course_ids.update( | ||
i.key | ||
for i in CourseRun.search( | ||
query, | ||
partner=ESDSLQ('term', partner=partner.short_code), | ||
identifiers=ESDSLQ('terms', **{'key.raw': course_run_ids}) | ||
).source(['key']) | ||
) | ||
if course_uuids: | ||
course_uuids = [UUID(course_uuid) for course_uuid in course_uuids.split(',')] | ||
specified_course_ids += course_uuids | ||
|
||
log.info(f"Specified course ids: {specified_course_ids}") | ||
identified_course_ids.update( | ||
Course.search( | ||
query, | ||
partner=ESDSLQ('term', partner=partner.short_code), | ||
identifiers=ESDSLQ('terms', **{'uuid': course_uuids}) | ||
).values_list('uuid', flat=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from this code that updates the querying logic for ES (so rather than fetching all the products AND THEN filter it, we're fetching only the necessary products), this is just a copy of https://github.com/openedx/course-discovery/blob/a948a741b1839e845e07c762bab6ee51622daea5/course_discovery/apps/api/v1/views/catalog_queries.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change not compatible with v1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requires changing the pkSearchableMixin
a bit as well if we were to keep a single CatalogQueryContainsViewSet
.
7a8f3b7
to
de8f6a3
Compare
PROD-4233
Adds a new
SearchAfterMixin
to be added in place ofPkSearchableMixin
that allows using search_after. Once this mixin is used, it will bypass the default search limit of 10k by making multiple calls to ES in case we have more than 10k records in an index.Previously, we faced an issue regarding the search limit, resulting in less records to be returned. We increased the
MAX_RESULT_WINDOW
before but a better way is to use search_after capability for an optimal and flexible result.Testing Instructions:
update_index
locally in Discovery shell.PkSearchableMixin
withSearchAfterMixin
in Course model./api/v1/catalog/query_contains/
endpoint and add a sample query like this: http://localhost:18381/api/v1/catalog/query_contains/?course_uuids=2de67490-f748-4efd-8532-b445f7ecc6f9,f9f1e100-668a-4fd5-a966-a127de1f69de&query=org:edXYou can set
ELASTICSEARCH_DSL_QUERYSET_PAGINATION
to your specific value in order to test the behavior. While the end result will be the same but this can affect the number of times the search_after mechanism is called.