Preparing for Solr 9.x and Join/Highlight limitations #263

Open
DiegoPino opened this issue Apr 5, 2023 · 0 comments
Labels
enhancement (New feature or request) · Search API F around and find out · Typed Data and Search
DiegoPino (Member) commented Apr 5, 2023

What?

Solr 8 (and every release since Solr 5) is affected by a documented Lucene bug: when a quoted phrase contains a special character, Lucene internally builds a SpanQuery that is unaware of the real offsets (position gaps) of the words. As a result, a phrase whose tokens do exist in the index in the right order fails to match. In practice, complex ADO labels and whole phrases sent to Solr through the Lucene parser (with no slop) will not match.
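To make the affected queries concrete, here is a minimal PHP sketch that flags the quoted phrases in a raw query string that could hit this bug. The function name is hypothetical, and treating any character other than a letter, digit, or whitespace as "special" is an assumption; the real trigger depends on the analyzer chain of the field being searched.

```php
<?php

/**
 * Hypothetical helper: returns the quoted phrases of a raw query string that
 * contain a "special" character, i.e. the phrases this Solr 8 bug may break.
 * Assumption: "special" means anything that is not a letter, digit or space.
 */
function phrasesAffectedBySpanQueryBug(string $query): array {
  $affected = [];
  // Capture the contents of every double-quoted phrase.
  if (preg_match_all('/"([^"]+)"/u', $query, $matches)) {
    foreach ($matches[1] as $phrase) {
      // \p{L} = any letter, \p{N} = any number, \s = whitespace.
      if (preg_match('/[^\p{L}\p{N}\s]/u', $phrase)) {
        $affected[] = $phrase;
      }
    }
  }
  return $affected;
}

print_r(phrasesAffectedBySpanQueryBug('"Box 3: Letters" AND "cute cats"'));
```

Here only "Box 3: Letters" is flagged, because of the colon; "cute cats" contains nothing but letters and a space.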

The solution is to move to a Lucene version that includes the patch, which right now means the newest one, Lucene 9, and therefore Solr 9:
apache/lucene@98dafe2

Migrating to Solr 9 implies changes to solrconfig, the schema, field types, and the OCR plugin. Those will be handled in the new release of archipelago-deployment and deployment-live (tested, and it works very well). For now, though, we need to make the code compatible with both Solr 8 and Solr 9.

Solr 9 uses the Unified Highlighter by default. Because Drupal treats all Search API full-text fields as a "group of things that are all equal" (and exposes them that way in the UI), the unified highlighter will fail if any of these fields is not configured to store offsets and term vector positions. And it does not just fail: Solr basically throws a Java error and the request dies. So the idea is to force the original highlighter, which is the default in Solr 8, everywhere we are in charge of highlighting.
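A minimal PHP sketch of forcing the original highlighter at the level of raw Solr request parameters. `hl`, `hl.method`, and `hl.fl` are standard Solr highlighting parameters; the helper name and the `tm_label` field name are hypothetical, and how these parameters get attached to the query that Search API Solr actually sends is not shown here.

```php
<?php

/**
 * Hypothetical helper: adds highlighting parameters to a Solr request and
 * pins the original highlighter, so Solr 8 and Solr 9 behave the same.
 */
function forceOriginalHighlighter(array $params, array $fields): array {
  $params['hl'] = 'true';
  // Solr 9 defaults to hl.method=unified, which (as described above) can
  // fail on fields without stored offsets; "original" is the Solr 8 default.
  $params['hl.method'] = 'original';
  // Fields to highlight, as a comma-separated list.
  $params['hl.fl'] = implode(',', $fields);
  return $params;
}

$params = forceOriginalHighlighter(['q' => 'cute cats'], ['tm_label']);
echo http_build_query($params), PHP_EOL;
```

Setting `hl.method` explicitly, instead of relying on each Solr version's default, is what makes the same request behave identically on 8 and 9.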

So:

First, make all of this play well with Solr 9. I will force the original highlighter to avoid unexpected issues like null pointer exceptions and classes that cannot be cast into others, which the new Solr version has already produced.
Second, parse and treat keys coming from a phrase differently from individual terms. I already built this method, which can dissect direct queries into keywords:

protected function getKeywordsParseModeAware(QueryInterface $query, string $parse_mode_id) {

But it calls an inherited method, $this->flattenKeysArray($keys), that kills phrases, so I need to override it.
Third, on a highlight return, if at least one of the keys was a phrase, remove all HTML (i.e. don't use the original highlighting) and then apply links and highlights over the text manually. Doing this only when a phrase is present is the smart part: less over-processing for the normal "cute cats" queries people will do.
Fourth: there is no fourth.
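The second and third steps can be sketched in PHP as two standalone functions. Both names are hypothetical and this is not the module's actual code. Assumptions: the keys follow Search API's array shape (a '#conjunction' entry, an optional '#negation', and numeric children that are strings or nested groups), the parse mode has already stripped the quotes, and a key containing whitespace came from a quoted phrase. Applying links is not covered.

```php
<?php

/**
 * Hypothetical override of an inherited flattenKeysArray(): flattens a
 * Search API-style keys array but keeps each phrase as a single keyword and
 * records whether it was a phrase.
 */
function flattenKeysKeepingPhrases(array $keys): array {
  // Negated groups are never highlighted.
  if (!empty($keys['#negation'])) {
    return [];
  }
  $flat = [];
  foreach ($keys as $index => $key) {
    // Skip metadata entries such as '#conjunction'.
    if (is_string($index) && substr($index, 0, 1) === '#') {
      continue;
    }
    if (is_array($key)) {
      // Nested group: recurse and append its keywords.
      $flat = array_merge($flat, flattenKeysKeepingPhrases($key));
      continue;
    }
    $value = trim((string) $key);
    if ($value !== '') {
      // Assumption: whitespace inside a key means it came from a phrase.
      $flat[] = ['value' => $value, 'phrase' => (bool) preg_match('/\s/u', $value)];
    }
  }
  return $flat;
}

/**
 * Hypothetical post-processing of a highlighted snippet: only when at least
 * one keyword is a phrase, drop Solr's HTML and re-highlight manually.
 */
function rehighlightSnippet(string $snippet, array $keywords): string {
  $hasPhrase = array_filter($keywords, fn(array $k): bool => $k['phrase']) !== [];
  if (!$hasPhrase) {
    // Plain term queries: keep Solr's own markup untouched.
    return $snippet;
  }
  // Strip every tag (including Solr's highlight tags) and decode entities.
  $plain = html_entity_decode(strip_tags($snippet), ENT_QUOTES | ENT_HTML5, 'UTF-8');

  // Longest keywords first, so a phrase wins over its individual words.
  usort($keywords, fn(array $a, array $b): int => strlen($b['value']) <=> strlen($a['value']));
  $alternatives = [];
  foreach ($keywords as $keyword) {
    $words = preg_split('/\s+/u', $keyword['value'], -1, PREG_SPLIT_NO_EMPTY);
    $quoted = array_map(fn(string $w): string => preg_quote($w, '/'), $words);
    // Any run of whitespace may separate the words of a phrase.
    $alternatives[] = implode('\s+', $quoted);
  }
  $pattern = '/(' . implode('|', $alternatives) . ')/iu';

  // With one capture group, odd indexes of the split result are the matches.
  $parts = preg_split($pattern, $plain, -1, PREG_SPLIT_DELIM_CAPTURE);
  $html = '';
  foreach ($parts as $i => $part) {
    $escaped = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
    $html .= ($i % 2 === 1) ? '<strong>' . $escaped . '</strong>' : $escaped;
  }
  return $html;
}

$keywords = flattenKeysKeepingPhrases(['#conjunction' => 'AND', 'cute cats', 'box']);
echo rehighlightSnippet('Two <em>cute</em> <em>cats</em> in a box', $keywords), PHP_EOL;
```

Splitting the plain text once on a single alternation of all keywords, longest first, lets a phrase win over its own words and avoids matching inside tags or entities that an earlier replacement would otherwise have inserted.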

@alliomeria, this is what I promised. Hope this makes sense.

@DiegoPino DiegoPino self-assigned this Apr 5, 2023
@DiegoPino DiegoPino added enhancement New feature or request Typed Data and Search Search API F around and find out labels Apr 5, 2023
@DiegoPino DiegoPino added this to the 1.1.0 milestone Apr 5, 2023