What?
Solr 8 (and every version since Solr 5) is affected by a documented Lucene bug: when a quoted phrase contains a special character, the parser internally builds a SpanQuery that loses track of the real position gaps between words, so a phrase whose tokens exist in the index in the right order still fails to match. In practice, complex ADO labels and whole phrases sent to Solr via the Lucene parser (no slop) will not match.
The solution is to move to a Lucene version that includes the patch, which right now means the newest one, Lucene 9, and therefore Solr 9. See apache/lucene@98dafe2.
Migrating to Solr 9 implies changes to solrconfig, the schema, field types and the OCR plugin. Those will be dealt with in the new release of archipelago-deployment and deployment-live (tested, and it works very well), but for now we need to make the code compatible with both 8 and 9.
Solr 9 uses the Unified Highlighter by default. Because Drupal treats (and exposes via the UI) all Full Text Search API fields as a "group of things that are all equal", the unified highlighter will fail if any of them lacks the field properties needed to store offsets and term vector positions. And it does not just fail: it basically throws a Java error and dies. So the idea here is to force the original highlighter, which is the default in 8, everywhere we are in charge of highlights.
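As a rough illustration of forcing the original highlighter, here is a minimal sketch that works on a plain array of Solr request parameters. The helper name `force_original_highlighter()` and the field name `ocr_text` are hypothetical; the only thing taken from Solr itself is the `hl.method` request parameter, which accepts `unified`, `original` or `fastVector`. How the parameter is actually set inside the module (Solarium, Search API Solr events, etc.) is not shown here.

```php
<?php

/**
 * Forces the original highlighter on an array of Solr request parameters.
 *
 * Hypothetical helper for illustration only; not the module's actual code.
 */
function force_original_highlighter(array $params): array {
  // Only touch requests that actually ask for highlighting.
  if (($params['hl'] ?? 'false') !== 'true') {
    return $params;
  }
  // Sending hl.method explicitly makes Solr 8 and Solr 9 behave the same,
  // regardless of which highlighter each version uses by default.
  $params['hl.method'] = 'original';
  return $params;
}

// Usage: 'ocr_text' is an assumed field name.
$params = force_original_highlighter([
  'q' => '"cute cats"',
  'hl' => 'true',
  'hl.fl' => 'ocr_text',
]);
echo http_build_query($params, '', '&', PHP_QUERY_RFC3986), PHP_EOL;
```

Because the parameter is set per request, this kind of change would not depend on the solrconfig defaults of whichever Solr version is running.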
So:
First, make all this play well with Solr 9. I will force the original highlighter to avoid unexpected issues like null pointer exceptions and classes that cannot be cast into others inside Solr (the new version already surfaced those).
Second, parse and treat keys coming from a phrase differently from individual terms. I already built this, which can dissect direct queries into keywords:
strawberryfield/src/Plugin/search_api/processor/StrawberryFieldHighlight.php
Line 392 in 573ffa4
But it calls an inherited method, $this->flattenKeysArray($keys);, that kills phrases, so I need to override it.
Third, on a highlight return, remove all HTML (so don't use the original highlight) if at least one of the keys was a phrase (smart: less over-processing for the normal cute cats queries people will do), then apply links and highlights over those manually.
Fourth: no fourth.
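The second and third steps could be sketched roughly as below. This is only an illustration under stated assumptions, not the code in StrawberryFieldHighlight.php: both function names are hypothetical, the sketch works on a raw query string rather than on the Search API keys array, and it wraps matches in plain `<em>` tags instead of building links.

```php
<?php

/**
 * Splits a raw query string into quoted phrases and single terms.
 *
 * Hypothetical helper for illustration only.
 */
function split_query_keys(string $query): array {
  $keys = ['phrases' => [], 'terms' => []];
  // Match either a double-quoted run (captured without the quotes) or a run
  // of characters that are neither whitespace nor quotes.
  preg_match_all('/"([^"]+)"|([^\s"]+)/u', $query, $matches, PREG_SET_ORDER);
  foreach ($matches as $match) {
    if (isset($match[2]) && $match[2] !== '') {
      $keys['terms'][] = $match[2];
      continue;
    }
    $phrase = trim($match[1]);
    if ($phrase !== '') {
      $keys['phrases'][] = $phrase;
    }
  }
  return $keys;
}

/**
 * Re-highlights a snippet manually, but only when a phrase was searched.
 *
 * Hypothetical helper for illustration only.
 */
function rehighlight_snippet(string $snippet, array $phrases, array $terms): string {
  // Plain term queries keep whatever highlighting Solr returned.
  if ($phrases === []) {
    return $snippet;
  }
  // Drop the markup returned by the highlighter before re-applying our own.
  $plain = strip_tags($snippet);
  $keys = array_merge($phrases, $terms);
  // Longest keys first, so a phrase wins over a term contained in it.
  usort($keys, function (string $a, string $b): int {
    return strlen($b) <=> strlen($a);
  });
  $alternatives = array_map(function (string $key): string {
    return preg_quote($key, '/');
  }, $keys);
  $pattern = '/(' . implode('|', $alternatives) . ')/iu';
  return preg_replace($pattern, '<em>$1</em>', $plain) ?? $plain;
}

$keys = split_query_keys('"cute cats" dogs');
echo rehighlight_snippet('Two <em>cute</em> <em>cats</em> and more cats', $keys['phrases'], ['cats']), PHP_EOL;
```

Matching everything in a single pass with one alternation avoids nesting a term highlight inside an already wrapped phrase, which separate per-key replacements would do.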
@alliomeria this is what I promised. Hope this makes sense