'Related Case Law' section is not being shown on the Opinion page when ES is enabled. #4305
Comments
Clearing the cache didn't help. The query you requested returned:
Hm... :/
It's strange that the query above didn't return any results. I cloned the cluster Just to confirm, was the query executed in the I also simplified the query by removing everything that's not required. Can we test it, please?
And this other version is an optimized version that also incorporates additional parameters for the MLT query, which are used in the Solr version.
Hm, both of those queries gave the minimal non-response like before?
Yeah, now I was able to reproduce the issue locally. After indexing a couple of thousand documents, these queries stopped retrieving results. It seems related to the document frequency and the terms queried. Another problem I can see is that related queries can be quite restrictive now, since they include all the fields that Solr indexed into a single field containing all the content. So this can also be an issue. I'm testing tuning the query, and I'll also test with a field that concentrates all the document fields using the
Ok, I did many tests around this issue. Initially, I thought the problem was directly related to field fragmentation in the MLT query, since the Solr version only looked up a single field. Now, the MLT fields are:
I added a field that combines all of these fields into a single one using the
Then I realized that the issue was directly related to the analyzer used to index the content in this field and to perform the MLT query. Using Also, using the default analyzer for search (
After using the exact analyzers, the MLT used more relevant terms for the query, as can be inspected using a profiling query, and the related results seem more relevant with better quality.
Then I realized that the problem using the MLT query with many fields could also be related to the analyzers and not due to field fragmentation. So I tweaked the fields to use their In this query version, I got very similar results to the
For now, we can test this fixed query that uses the right fields and analyzer and also includes stop words. We can analyze the quality of the results, and if it seems okay and isn't too slow, maybe we can go with this approach without requiring the implementation of the #4316 approach.
Here's the response: Instant, but I haven't looked at the quality.
:/ Well, actually the response didn't return any results. The content in the response is the profile of the query, and it shows that nothing was queried.
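To tell a profile-only response apart from one with real matches, a small helper like the following could be used. This is a minimal sketch, assuming the standard Elasticsearch search response layout (`hits.total`, `hits.hits`, and a top-level `profile` key when profiling is enabled); the function name `summarize_search_response` and the sample payload are hypothetical and not taken from the project code.

```python
def summarize_search_response(response):
    """Return hit counts and whether profiling output is present.

    `response` is the parsed JSON body of an Elasticsearch _search call.
    """
    hits = response.get("hits", {})
    total = hits.get("total", 0)
    # Newer Elasticsearch versions report the total as an object
    # {"value": N, "relation": "eq"}; older ones use a plain integer.
    if isinstance(total, dict):
        total = total.get("value", 0)
    return {
        "total": total,
        "returned": len(hits.get("hits", [])),
        "has_profile": "profile" in response,
    }


# Hypothetical payload: profiling data is present, but nothing matched.
sample = {
    "took": 3,
    "hits": {"total": {"value": 0, "relation": "eq"}, "hits": []},
    "profile": {"shards": []},
}
print(summarize_search_response(sample))
```

A response that is long only because of its `profile` section will still report `total` and `returned` as 0 here.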
I think the query requires more tuning according to the index size. We can try increasing "max_doc_freq": 1000 to 10000 or 100000; maybe the terms match many documents. And perhaps more debugging will be required to continue tuning the query. It'd be easier to debug this issue with access to run queries directly against the cluster, hopefully soon!
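The tuning idea above can be sketched as a request body with an adjustable `max_doc_freq`. This is only an illustration, not the query used in the project: the helper name `build_mlt_query`, the field name `text`, and the document id are placeholders, while `opinion_index` is the index named in this issue. `fields`, `like`, and `max_doc_freq` are standard `more_like_this` parameters, and `profile` is the standard search option for returning query profiling data.

```python
import json


def build_mlt_query(doc_id, fields, max_doc_freq=1000, index="opinion_index"):
    """Build a more_like_this search body for one already-indexed document."""
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                # Use an indexed document as the "like" source.
                "like": [{"_index": index, "_id": str(doc_id)}],
                # Terms found in more documents than this are ignored,
                # so a low value on a large index can leave no terms to query.
                "max_doc_freq": max_doc_freq,
            }
        },
        # Ask Elasticsearch to include profiling output in the response.
        "profile": True,
    }


# Try the same query with progressively larger limits.
for limit in (1000, 10000, 100000):
    body = build_mlt_query("123", ["text"], max_doc_freq=limit)  # placeholder id and field
    print(json.dumps(body))
```

Each printed body can be pasted into Kibana after `GET /opinion_index/_search` to compare how many results come back at each limit.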
Ah yes, my bad. I just increased
This one needs to be completed before we shut down Solr. However, we need to debug queries using the case law production index in the ES cluster. Currently, access to Kibana and the ES endpoint for developers is not working. I asked Sergei, and he can’t access either, so something might be wrong with the Kibana/ES interface.
Didn't @flooie sort of fix this in his new opinion page that he's landing shortly? I sent a message to Ramiro to see if he can get Kibana going again.
I put this one onto the next sprint for you, @albertisfu, so we can think about it then.
While working on #4211, I noticed that the related opinions in the 'Related Case Law' section on the Opinion detail page are not being shown in the ES version.
Example:
https://www.courtlistener.com/opinion/1472349/roe-v-wade/?type=o&q=Roe%20v.%20Wade&type=o&order_by=score%20desc
If you look at it while logged off (Solr version), you can see the 'Related Case Law' sidebar with opinions. However, if you view the same opinion while logged in as staff (ES version), the related opinions are not shown.
I believe this could be an issue related to caching, or it might be that the more_like_this query isn't matching the right results.
@mlissner We could start by checking if deleting the mlt-cluster-es cache entry for this cluster helps:
Additionally, we can run this query in Kibana to ensure it’s returning the correct results.
GET /opinion_index/_search?pretty