Preparing for Solr 9.x and Join/Highlight limitations #263

DiegoPino · 2023-04-05T14:54:01Z

What?

Solr 8 (and every release since Solr 5) carries a documented Lucene bug: when a quoted phrase contains a special character, Lucene internally switches to a SpanQuery that loses track of the real offsets (gaps) between words. As a result, a phrase whose tokens exist in the index in the right order fails to match. Basically, complex ADO labels and whole phrases sent to Solr via the Lucene parser (no slop) will not match.

The solution is to move to a Lucene version that includes the patch, which right now means the newest one: Lucene 9, so Solr 9.
apache/lucene@98dafe2

Migrating to Solr 9 implies solrconfig, schema, field type and OCR plugin changes. Those will be dealt with in the new release of archipelago-deployment and deployment-live (tested and works very well), but for now we need to make the code compatible with both 8 and 9.

Solr 9 uses the Unified Highlighter by default. Because Drupal treats (and exposes via the UI) all Full Text Search API fields as a "group of things that are all equal", unified will fail if any of these fields lacks the properties needed to store offsets and vector positions. And not just fail: it basically throws a Java error and dies. So the idea here is to force the original highlighter, which is the default in 8, everywhere we are in charge of highlights.
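As a rough illustration of forcing the original highlighter, Solr's `hl.method` request parameter can be set explicitly on every highlight request. The sketch below is standalone PHP and not the module's actual code: `forceOriginalHighlighter()` and the `ocr_text` field name are hypothetical, and a real integration would set the same parameter through whatever query object the Search API Solr backend hands over.

```php
<?php
/**
 * Hypothetical helper: returns Solr request parameters with the
 * highlighter pinned to "original" regardless of the server default.
 */
function forceOriginalHighlighter(array $params): array {
  // Turn highlighting on and name the implementation explicitly, so the
  // same request behaves the same way on Solr 8 and Solr 9.
  $params['hl'] = 'true';
  $params['hl.method'] = 'original';
  return $params;
}

$params = forceOriginalHighlighter([
  'q' => '"cute cats"',
  // Assumption: an example full text field, not a real schema name.
  'hl.fl' => 'ocr_text',
]);

// Build the query string that would be sent to the select handler.
$query_string = http_build_query($params, '', '&', PHP_QUERY_RFC3986);
```

Setting the parameter per request keeps the change inside the code we control, instead of depending on each deployment's solrconfig defaults.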

So:

First: make all this play well with Solr 9. I will force the original highlighter to avoid unexpected issues like null pointers and classes that cannot be cast into others on the Solr side (the new version already surfaced those).
Second: parse and treat keys coming from a phrase vs. individual terms differently. I already built this method, which can dissect direct queries into keywords:

strawberryfield/src/Plugin/search_api/processor/StrawberryFieldHighlight.php

Line 392 in 573ffa4

    
           protected function getKeywordsParseModeAware(QueryInterface $query, string $parse_mode_id) {

But it calls an inherited method, $this->flattenKeysArray($keys), that kills phrases, so I need to override it.
Third: on a highlight return, remove all HTML (so don't use the original highlight) IF at least one of the keys was a phrase (smart: less over-processing for the normal "cute cats" queries people will do), then apply links and highlights over those manually.
Fourth: no fourth.
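The second and third steps can be sketched in standalone PHP as follows. This is only an illustration under assumptions: `splitPhrasesAndTerms()` and `rehighlightIfPhrase()` are hypothetical names, the input is treated as a raw query string rather than Search API's parsed keys array, and the linking part is left out.

```php
<?php
/**
 * Hypothetical helper: splits a raw query into quoted phrases and single
 * terms, keeping each phrase whole instead of flattening it into words.
 */
function splitPhrasesAndTerms(string $input): array {
  $phrases = [];
  $terms = [];
  // Group 1 captures a double-quoted segment, group 2 a bare word.
  preg_match_all('/"([^"]+)"|(\S+)/u', $input, $matches, PREG_SET_ORDER);
  foreach ($matches as $match) {
    if (isset($match[1]) && $match[1] !== '') {
      $phrases[] = $match[1];
    }
    elseif (isset($match[2]) && $match[2] !== '') {
      $terms[] = $match[2];
    }
  }
  return ['phrases' => $phrases, 'terms' => $terms];
}

/**
 * Hypothetical helper: re-applies highlights by hand, but only when at
 * least one key is a phrase. Otherwise Solr's snippet is returned as is.
 */
function rehighlightIfPhrase(string $snippet, array $keys): string {
  if ($keys['phrases'] === []) {
    return $snippet;
  }
  // Drop the highlighter's markup and decode entities to plain text.
  $plain = html_entity_decode(strip_tags($snippet), ENT_QUOTES, 'UTF-8');
  $all = array_merge($keys['phrases'], $keys['terms']);
  // Longest keys first, so a phrase wins over a term it contains.
  usort($all, fn($a, $b) => strlen($b) <=> strlen($a));
  $quoted = array_map(fn($key) => preg_quote($key, '/'), $all);
  $pattern = '/(' . implode('|', $quoted) . ')/iu';
  // With one capture group, odd indexes hold the matched keys.
  $parts = preg_split($pattern, $plain, -1, PREG_SPLIT_DELIM_CAPTURE);
  $out = '';
  foreach ($parts as $i => $part) {
    $escaped = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
    $out .= ($i % 2 === 1) ? '<em>' . $escaped . '</em>' : $escaped;
  }
  return $out;
}

$keys = splitPhrasesAndTerms('"cute cats" dogs');
$html = rehighlightIfPhrase('Two <em>cute</em> <em>cats</em> and dogs', $keys);
```

Here the per-word `<em>` tags returned for the phrase are replaced by a single highlight around the whole phrase, while a query made only of individual terms skips the extra processing entirely.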

@alliomeria this is what I promised. Hope this makes sense

DiegoPino self-assigned this Apr 5, 2023

DiegoPino added the enhancement, Typed Data and Search, Search API, and F around and find out labels Apr 5, 2023

DiegoPino added this to the 1.1.0 milestone Apr 5, 2023

This was referenced Apr 5, 2023

Upgrade to Solr 9.1.x esmero/archipelago-deployment#242

Open

ISSUE-263: Improve HL for JOINS + Phrase Linking. #264

Merged
