Opinions index changes to improve MLT queries #4316
Following comment #4305, this PR adds a new field, `combined_fields`, to the Opinion Index mapping. It combines the multiple fields used to look for similar documents into a single one. The field uses the `english_exact` analyzer to avoid removing duplicates and to avoid synonyms that could impact the quality of the query. Additionally, the field sets `term_vector` so that term vectors are stored at index time, which improves the performance of MLT queries, as recommended in the documentation.

To test this approach, we can:
1. Change `opinion_index = Index("opinion_index")` to a different name, for instance "opinion_index_combined", in `es_indices.py`. This ensures the original production index won't be affected.
2. Run `manage.py search_index --create --models search.OpinionCluster`. This will return a `task_id` that we can use to monitor the task progress.

If we only want to assess the quality of results, reindexing a subset of the index may be sufficient. However, if we also want to assess the performance of this approach, reindexing the entire index would be better.
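
For reviewers who want to see the shape of the change, here is a rough sketch of the field mapping and of an MLT query that targets only the combined field, written as plain Python dicts that mirror the raw Elasticsearch JSON bodies. The helper names, the `term_vector` value, and the MLT tuning parameters are assumptions for illustration. They are not taken from this PR, and the sketch does not show how the source fields are copied into `combined_fields`.

```python
import json


def build_combined_field_mapping(analyzer="english_exact", term_vector="yes"):
    """Return a mapping fragment for the combined field.

    Assumption: the term_vector value used here ("yes") is illustrative;
    the PR may use a different setting.
    """
    return {
        "combined_fields": {
            "type": "text",
            "analyzer": analyzer,
            # Storing term vectors at index time lets MLT skip re-analysis.
            "term_vector": term_vector,
        }
    }


def build_mlt_query(doc_id, index_name, field="combined_fields"):
    """Return a more_like_this query body restricted to a single field.

    Assumption: min_term_freq and max_query_terms are placeholder values,
    not the ones used in production.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": [field],
                # Look for documents similar to an already indexed one.
                "like": [{"_index": index_name, "_id": doc_id}],
                "min_term_freq": 1,
                "max_query_terms": 25,
            }
        }
    }


if __name__ == "__main__":
    # "opinion_index_combined" is the example test index name from the steps above;
    # the document ID is hypothetical.
    print(json.dumps(build_combined_field_mapping(), indent=2))
    print(json.dumps(build_mlt_query("123", "opinion_index_combined"), indent=2))
```

Querying a single field keeps the MLT request small. Comparing its results against the current multi-field query on the same documents is one way to judge result quality before committing to a full reindex.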