Opinions index changes to improve MLT queries #4316

Open
wants to merge 3 commits into
base: main
@albertisfu commented Aug 15, 2024

Following comment #4305, this PR adds a new field, combined_fields, to the Opinion index mapping. It combines the multiple fields used to look for similar documents into a single field. The field uses the english_exact analyzer to avoid removing duplicates and to avoid synonyms that could hurt the quality of the query. Additionally, the field sets term_vector so that term vectors are stored at index time, which improves the performance of MLT queries, as recommended in the documentation.
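As a rough illustration of the idea above, the sketch below shows what the mapping fragment and an MLT query against the new field could look like, written as plain Python dicts. The field name combined_fields and the english_exact analyzer come from the description; the term_vector value "yes", the helper name build_mlt_query, and the max_query_terms default are assumptions for illustration, not the PR's actual code.

```python
import json

# Assumed mapping fragment for the new field. The analyzer name comes from
# the PR description; the term_vector value "yes" is an assumption.
COMBINED_FIELDS_MAPPING = {
    "properties": {
        "combined_fields": {
            "type": "text",
            "analyzer": "english_exact",
            "term_vector": "yes",
        }
    }
}


def build_mlt_query(index_name, doc_id, max_query_terms=25):
    """Build a more_like_this search body that targets only combined_fields.

    Hypothetical helper: the document to compare against is referenced by
    its index and ID, so Elasticsearch can use its stored term vectors.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": ["combined_fields"],
                "like": [{"_index": index_name, "_id": str(doc_id)}],
                "max_query_terms": max_query_terms,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mlt_query("opinion_index_combined", 1), indent=2))
```

Querying a single field keeps the MLT request simple: only one set of term vectors has to be read for the reference document.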

To test this approach, we can:

  1. Merge the PR.
  2. Within a maintenance pod, rename opinion_index = Index("opinion_index") to a different name, for instance "opinion_index_combined", in es_indices.py. This ensures the original production index won't be affected.
  3. Create the new index by running manage.py search_index --create --models search.OpinionCluster.
  4. Finally, perform a reindex:
POST /_reindex?wait_for_completion=false
{
   "max_docs": 1000000, # Remove this line to reindex all the documents.
   "source":{
      "index":"opinion_index"
   },
   "dest":{
      "index":"opinion_index_combined"
   }
}

This will return a task_id that we can use to monitor the task progress.
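The reindex and monitoring steps could be scripted along these lines. This is a minimal sketch, not part of the PR: build_reindex_body and reindex_progress are hypothetical helpers, the subset is capped with the top-level max_docs parameter of the reindex API, and the progress calculation assumes the response shape of GET /_tasks/<task_id> for a reindex task (a "completed" flag plus a "task.status" object with "total", "created", "updated", and similar counters).

```python
def build_reindex_body(source_index, dest_index, max_docs=None):
    """Build the JSON body for POST /_reindex.

    max_docs caps how many documents are copied; leave it as None to
    reindex everything.
    """
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    if max_docs is not None:
        body["max_docs"] = max_docs
    return body


def reindex_progress(task_response):
    """Return the processed fraction (0.0 to 1.0) of a reindex task.

    task_response is the parsed JSON returned by GET /_tasks/<task_id>.
    """
    if task_response.get("completed"):
        return 1.0
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    if total == 0:
        # The task has not counted its documents yet.
        return 0.0
    processed = sum(
        status.get(key, 0)
        for key in ("created", "updated", "deleted", "noops", "version_conflicts")
    )
    return min(processed / total, 1.0)


if __name__ == "__main__":
    print(build_reindex_body("opinion_index", "opinion_index_combined", 1000000))
    sample = {"completed": False, "task": {"status": {"total": 1000, "created": 250}}}
    print(f"{reindex_progress(sample):.0%} done")
```

Polling the task endpoint periodically and passing each response to reindex_progress gives a simple progress indicator for long-running reindexes.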

If we only want to assess the quality of the results, reindexing a subset of the index may be sufficient. However, if we also want to assess the performance of this approach, reindexing the entire index would be better.

@albertisfu changed the title from "fix(elasticsearch): Tweaked Opinions index to improve MLT queries" to "Opinions index changes to improve MLT queries" on Aug 15, 2024