Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace TermsQuery with TermInSetQuery #169

Closed
wants to merge 2 commits into from

Conversation

dcollins53
Copy link
Contributor

TermsQuery is removed in Lucene 7, so this replaces with TermInSetQuery (and Boolean wrapper where needed).
Tests pass on their own but seem to conflicy with the legacy numeric changes when both are merged in...

@mjustice3
Copy link
Contributor

This results in a functionally different presearcher query than before. The existing tests don't show a problem but it is fairly easy to construct a TestMultipassPresearcher test where the query has multiple fields which should pass but doesn't. E.g.

    @Test
    public void testMultipleFields() throws IOException, UpdateException {

        monitor.update(new MonitorQuery("1", "field1:(foo OR bar) AND field2:cormorant"));

        assertThat(monitor.match(buildDoc("doc1", "field1", "a badger walked into a bar", "field2", "cormorant"),
                SimpleMatcher.FACTORY))
                .hasQueriesRunCount(1)
                .hasMatchCount("doc1", 1);
}

Two correct ways I have found:

  1. Replace TermsQuery with its brute force BooleanQuery equivalent
  2. Use a Multimap (brings in a guava dependency) to sort the terms into fields as they are being added then use TermInSetQuery :
        List<Multimap<String, BytesRef>> terms = new ArrayList<Multimap<String,BytesRef>>(passes);

        @Override
        public void addTerm(String field, BytesRef term) throws IOException {
            for (int i = 0; i < passes; i++) {
                terms.get(i).put(field(field,i), term);
            }
        }

        @Override
        public Query build() {
            BooleanQuery.Builder parent = new BooleanQuery.Builder();
            for (int i = 0; i < passes; i++) {
                BooleanQuery.Builder bq = new BooleanQuery.Builder();
                for (String field : terms.get(i).keySet()) {
                    bq.add(new TermInSetQuery(field, terms.get(i).get(field)), BooleanClause.Occur.SHOULD);
                }
                parent.add(bq.build(), BooleanClause.Occur.MUST);
            }
            return parent.build();
        }

My testing shows a slight performance advantage for option 2 (although I'm seeing an overall matching performance degradation of ~20% in going from 6.3 to 7.3 --- mostly in the search phase and not the presearcher phase.

@dcollins53
Copy link
Contributor Author

Can you add that test case at least? I wasn't convinced about this to be frank, but it passed the tests that were present. Can you paste up some alternative PRs for the 2 approaches?

Though not ideal that 7.3 is generally slower than 6.3 :(

@mjustice3
Copy link
Contributor

PR #170 is my stab at a 7.3 port and uses option 1 above for replacing TermsQuery. mjustice3@d17bfd6 is a small patch on top of this PR for option 2.

@dcollins53 dcollins53 closed this Jun 17, 2019
@dcollins53 dcollins53 deleted the lucene7/termsquery branch June 17, 2019 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants