Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: update limit for es response #4303

Merged
merged 1 commit into from
Apr 9, 2024
Merged

fix: update limit for es response #4303

merged 1 commit into from
Apr 9, 2024

Conversation

Ali-D-Akbar
Copy link
Contributor

@Ali-D-Akbar Ali-D-Akbar commented Mar 26, 2024

PROD-3986
Adds a setting to update_index. This will increase the max_result_window for all existing indexes and subsequently update any newly created index as well.

Also updates the unit test that previously didn't support a custom value of max_result_window in ES related logic.
Keeping the same limit count for unit tests as we won't reach this limit while testing.

@Ali-D-Akbar Ali-D-Akbar force-pushed the aakbar/PROD-3986 branch 4 times, most recently from 315212d to 8f47923 Compare March 28, 2024 10:49
Comment on lines 130 to 131
conn.indices.put_settings(body={"index": {"max_result_window": settings.MAX_RESULT_WINDOW}})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is global setting, it should be set only once. it should not be set in a periodically running management command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This setting is added to each index separately. Having it in this place specifically works as all the new/updated indexes will have this setting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the operation idempotent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is.

@Ali-D-Akbar Ali-D-Akbar force-pushed the aakbar/PROD-3986 branch 4 times, most recently from c58dc9f to b24cdba Compare March 29, 2024 12:09
@Ali-D-Akbar Ali-D-Akbar marked this pull request as ready for review March 29, 2024 12:28
Copy link
Contributor

@zawan-ila zawan-ila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need it in 2 management commands? Wouldn't the settings update when only one runs? Would it be nicer to extract this into a separate management command?

@Ali-D-Akbar
Copy link
Contributor Author

Ali-D-Akbar commented Mar 29, 2024

Why do we need it in 2 management commands?

To update this setting for both cases where we're either updating (through update_index) or creating new/rebuilding indexes (through search_index).

Wouldn't the settings update when only one runs?

In case when we create new index, it won't have the settings. These settings work for pre-existing indexes.

Would it be nicer to extract this into a separate management command?

That's debatable although the same command will be needed to run in both of the commands above.

Comment on lines 21 to 24
try:
self.es.indices.put_settings(body={"index": {"max_result_window": settings.MAX_RESULT_WINDOW}})
except NotFoundError:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally am not a fan of this, specially in unit tests. Its a recipe for weirdness and unexpected behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, no setting is added when there is no index.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

under what Testcases is this valid?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see any tests for this. Not sure why this should be required in the first place?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some testing, can you try this in setUpclass? I still dont understand why this is needed for tests as there are not any items in unit tests thaa can cross the threshold.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The settings needs to be updated bcz I changed ELASTICSEARCH_DSL_QUERYSET_PAGINATION, ELASTICSEARCH_DSL_LOAD_PER_QUERY to 15000 in base.py.
Should I set this to 10k for tests only?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, yeah I think we can keep the settings as it is for tests and not complicate this unnecessarily.

@@ -8,6 +9,7 @@
from course_discovery.apps.course_metadata.tests.factories import CourseFactory, CourseRunFactory


@pytest.mark.django_db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? I am assuming APITestCase is child of django or DRF's TestCase which takes care of Django DB ops.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, it can be removed. 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, go ahead then.

Comment on lines 63 to 67
try:
es_connection = get_connection()
es_connection.indices.put_settings(body={"index": {"max_result_window": settings.MAX_RESULT_WINDOW}})
except NotFoundError:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what cases will NotFoundError raise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case if no index is created, we won't be adding this setting to any index.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so? This is being done inside _create. Under what circumstances will we not create an index despite calling create?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to put this logic inside the ElasticSearchUtils class. Then you might be able to get by by calling it within ElasticsearchUtils.create_index.

Copy link
Contributor Author

@Ali-D-Akbar Ali-D-Akbar Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zawan-ila can you elaborate on where I should put these settings in the utils?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what @zawan-ila is referring to https://github.com/openedx/course-discovery/blob/master/course_discovery/apps/core/utils.py#L28. When calling create_index, this result_window update can be done there (possibly without try/catch?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried adding this settings isn't being added for all indexes. Seems like it needs to be added after all the indexes and aliases are defined.

Comment on lines 21 to 24
try:
self.es.indices.put_settings(body={"index": {"max_result_window": settings.MAX_RESULT_WINDOW}})
except NotFoundError:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see any tests for this. Not sure why this should be required in the first place?

Comment on lines 63 to 67
try:
es_connection = get_connection()
es_connection.indices.put_settings(body={"index": {"max_result_window": settings.MAX_RESULT_WINDOW}})
except NotFoundError:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to put this logic inside the ElasticSearchUtils class. Then you might be able to get by by calling it within ElasticsearchUtils.create_index.

@@ -127,6 +127,8 @@ def _update(self, models, options):
for index_alias_mapper in alias_mappings:
index_alias_mapper.registered_index._name = index_alias_mapper.alias # pylint: disable=protected-access

conn.indices.put_settings(body={"index": {"max_result_window": settings.MAX_RESULT_WINDOW}})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my comment above, can we place this inside ElasticsearchUtils?

@Ali-D-Akbar Ali-D-Akbar force-pushed the aakbar/PROD-3986 branch 2 times, most recently from 3cc82b8 to bee2823 Compare April 4, 2024 09:22
@@ -2624,8 +2624,7 @@ def get_expected_data(cls, person, request):


@pytest.mark.django_db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might not need django_db mark if you are already using Django testcase.

@Ali-D-Akbar Ali-D-Akbar force-pushed the aakbar/PROD-3986 branch 2 times, most recently from 69709a0 to 141c3c9 Compare April 9, 2024 11:00
@@ -117,11 +117,13 @@ def _update(self, models, options):
)
if record_count_is_sane:
ElasticsearchUtils.set_alias(conn, alias, new_index_name)
ElasticsearchUtils.update_max_result_window(conn, settings.MAX_RESULT_WINDOW, new_index_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be really nice if we could do this just once at the start of the loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be done only after set_alias.

@Ali-D-Akbar Ali-D-Akbar merged commit a700f20 into master Apr 9, 2024
12 checks passed
@Ali-D-Akbar Ali-D-Akbar deleted the aakbar/PROD-3986 branch April 9, 2024 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants