draft: Nested document query #13

neilyio · 2024-11-26T00:30:02Z

No description provided.

Use Levenshtein distance to score documents in fuzzy term queries

Fix managed paths (#5)

add RegexPhraseQuery (quickwit-oss#2516)

* add RegexPhraseQuery

RegexPhraseQuery supports phrase queries with regex. It supports regex and wildcards.
E.g. a query with wildcards: "b* b* wolf" matches "big bad wolf"
Slop is supported as well: "b* wolf"~2 matches "big bad wolf"

Regex queries may match a lot of terms where we still need to keep track which term hit to load the positions. The phrase query algorithm groups terms by their frequency together in the union to prefilter groups early.

This PR comes with some new datastructures:

SimpleUnion - A union docset for a list of docsets. It doesn't do any caching and is therefore well suited for datasets with lots of skipping. (phrase search, but intersections in general)

LoadedPostings - Like SegmentPostings, but all docs and positions are loaded in memory. SegmentPostings uses 1840 bytes per instance with its caches, which is equivalent to 460 docids. LoadedPostings is used for terms which have less than 100 docs. LoadedPostings is only used to reduce memory consumption.

BitSetPostingUnion - Creates a `Posting` that uses the bitset for docid hits and the docsets for positions. The BitSet is the precalculated union of the docsets

In the RegexPhraseQuery there is a size limit of 512 docsets per PreAggregatedUnion, before creating a new one.

Renamed Union to BufferedUnionScorer
Added proptests to test different union types.

* cleanup
* use Box instead of Vec
* use RefCell instead of term_freq(&mut)
* remove wildcard mode
* move RefCell to outer
* clippy

clippy (quickwit-oss#2527)

* clippy
* clippy
* clippy
* clippy
* convert allow to expect and remove unused
* cargo fmt
* cleanup
* export sample
* clippy

chore: Fix merge conflict (#11)
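To make the SimpleUnion description above concrete, here is a minimal sketch of an unbuffered union over sorted doc-id lists. It is not the code from quickwit-oss#2516: `VecDocSet`, `SimpleUnionSketch`, `collect_union`, and the `TERMINATED` sentinel are hypothetical names chosen for this illustration, and plain `Vec<u32>` lists stand in for real postings.

```rust
/// Sentinel returned once a doc set is exhausted (an assumption of this sketch).
const TERMINATED: u32 = u32::MAX;

/// Cursor over one sorted list of doc ids (hypothetical stand-in for postings).
struct VecDocSet {
    docs: Vec<u32>,
    pos: usize,
}

impl VecDocSet {
    fn new(docs: Vec<u32>) -> Self {
        VecDocSet { docs, pos: 0 }
    }

    /// Current doc id, or TERMINATED when the list is exhausted.
    fn doc(&self) -> u32 {
        self.docs.get(self.pos).copied().unwrap_or(TERMINATED)
    }

    /// Move forward to the first doc id >= target.
    fn seek(&mut self, target: u32) -> u32 {
        while self.doc() < target {
            self.pos += 1;
        }
        self.doc()
    }
}

/// Union without any buffering: the current doc is the minimum over all members.
struct SimpleUnionSketch {
    members: Vec<VecDocSet>,
    current: u32,
}

impl SimpleUnionSketch {
    fn new(members: Vec<VecDocSet>) -> Self {
        let current = members.iter().map(|m| m.doc()).min().unwrap_or(TERMINATED);
        SimpleUnionSketch { members, current }
    }

    fn doc(&self) -> u32 {
        self.current
    }

    /// Seek every member to `target` and take the smallest resulting doc id.
    fn seek(&mut self, target: u32) -> u32 {
        let mut min = TERMINATED;
        for member in self.members.iter_mut() {
            min = min.min(member.seek(target));
        }
        self.current = min;
        min
    }

    /// Step to the next doc id in the union.
    fn advance(&mut self) -> u32 {
        if self.current == TERMINATED {
            return TERMINATED;
        }
        self.seek(self.current + 1)
    }
}

/// Drain the union into a vector (helper for illustration only).
fn collect_union(lists: Vec<Vec<u32>>) -> Vec<u32> {
    let mut union = SimpleUnionSketch::new(lists.into_iter().map(VecDocSet::new).collect());
    let mut out = Vec::new();
    while union.doc() != TERMINATED {
        out.push(union.doc());
        union.advance();
    }
    out
}
```

With this sketch, `collect_union(vec![vec![1, 5, 9], vec![2, 5], vec![]])` yields `[1, 2, 5, 9]`. Because nothing is buffered, a `seek` costs one forward scan per member and leaves no cached window to discard, which is the trade-off the commit message points at for workloads with lots of skipping.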


…hese modifications: ``` fix: Improve BlockJoinQuery scoring and matching logic ``` This commit message captures the essence of the changes: - We fixed the scoring logic in the BlockJoinQuery - We improved the document matching mechanism - We addressed issues with scoring modes and document collection Would you like me to run the tests to confirm the changes?


…nd scoring

The changes address several key issues in the BlockJoinScorer implementation:

1. Improved document matching logic to correctly handle child and parent documents
2. Fixed scoring calculation for different score modes
3. Corrected document seeking in explain method
4. Added proper handling of edge cases like empty child sets

These modifications should resolve the test failures by ensuring more accurate document matching and scoring in block join queries.
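The matching and scoring behaviour listed above can be sketched as follows. This is not the PR's BlockJoinScorer: it assumes a Lucene-style block layout in which each parent's children are indexed immediately before it, and `ScoreMode` (with only `Avg`, `Max`, `Total`) and `join_children_to_parents` are hypothetical names for this illustration. The `Option<u32>` for the previous parent mirrors the later commits that replaced a `u32::MAX` sentinel, which would plausibly overflow in a `+ 1` like the one below.

```rust
/// How child scores are combined into a parent score (hypothetical variants).
#[derive(Clone, Copy)]
enum ScoreMode {
    Avg,
    Max,
    Total,
}

/// Assumes `parents` and `child_hits` are sorted by doc id, and that the
/// children of a parent are the docs strictly between the previous parent
/// and that parent. Parents with no matching children are skipped.
fn join_children_to_parents(
    parents: &[u32],
    child_hits: &[(u32, f32)],
    mode: ScoreMode,
) -> Vec<(u32, f32)> {
    let mut out = Vec::new();
    // None means "no parent seen yet", so no sentinel arithmetic is needed.
    let mut previous_parent: Option<u32> = None;
    let mut i = 0;

    for &parent in parents {
        // First doc id that can belong to this parent's block.
        let lower = previous_parent.map_or(0, |p| p + 1);

        // Skip hits that fall on or before the previous parent.
        while i < child_hits.len() && child_hits[i].0 < lower {
            i += 1;
        }

        // Collect the scores of every matching child in this block.
        let mut scores: Vec<f32> = Vec::new();
        while i < child_hits.len() && child_hits[i].0 < parent {
            scores.push(child_hits[i].1);
            i += 1;
        }

        // Edge case: an empty child set produces no parent hit.
        if !scores.is_empty() {
            let total: f32 = scores.iter().sum();
            let score = match mode {
                ScoreMode::Total => total,
                ScoreMode::Avg => total / scores.len() as f32,
                ScoreMode::Max => scores.iter().copied().fold(f32::MIN, f32::max),
            };
            out.push((parent, score));
        }

        previous_parent = Some(parent);
    }
    out
}
```

For parents at doc ids 3 and 7 and child hits `[(0, 1.0), (2, 3.0), (5, 2.0)]`, `ScoreMode::Total` gives `[(3, 4.0), (7, 2.0)]` and `ScoreMode::Max` gives `[(3, 3.0), (7, 2.0)]`.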


philippemnoel · 2025-01-24T05:38:38Z

Closing since pre-block

neilyio and others added 7 commits November 13, 2024 10:55

Use Levenshtein distance to score documents in fuzzy term queries

cb89fd5

feat: implement TokenFilter for Option<F> (#4)

0230acb

Fix managed paths (#5)

759c0eb

expose AddOperation and with_max_doc (#7)

0cc66a2

chore: point tantivy-fst to paradedb fork to fix regex

febfba4

nested_document_query

29bf48d

neilyio force-pushed the neil/nested-document-query branch from a6f8cab to 29bf48d Compare November 26, 2024 00:31

neilyio added 22 commits November 25, 2024 16:31

nested_document_query impl

b9a412c

add add_documents method

8d80e4a

test passes

44edcd3

test pass stable

1bb653c

feat: Add block_join_query module to src/query directory

a57c3d6

Based on the extensive println debugging added, I'll generate a concise commit message: feat: Add verbose debugging to BlockJoinQuery implementation

c5fd4d5

refactor: Modernize block join query test code with add_documents and simplified constructor

082dcde

fix: Resolve BlockJoinQuery test failures and implement explain functionality

dd67221

fix: Update BlockJoinQuery explain method with minor syntax changes

cc5da00

refactor: Improve BlockJoinQuery explain method with document-specific scoring

b5b9ffd

fix: Correct BlockJoinQuery parameter order and child doc matching logic

8402703

fix: Refactor BlockJoinScorer to correctly handle child document scoring and matching

0571075

refactor: Update parent query from "parent" to "resume" in tests

d1513b4

fix: Adjust BlockJoinScorer initialization to resolve failing tests

8a300eb

fix: Track previous parent in BlockJoinScorer to collect child documents correctly

6a93e29

fix: Use u32::MAX as sentinel value for previous_parent in BlockJoinScorer

6bf4254

fix: Ensure child_scorer is advanced before collecting child documents

541b8b8

fix: Correct handling of previous_parent to avoid overflow in tests

060e5a8

fix: Change previous_parent type to Option<DocId> in BlockJoinScorer struct

9949add

fix: Correct BlockJoinScorer initialization and doc() method for block join query

b7c433f

neilyio added 11 commits December 12, 2024 18:19

fix: Correct BlockJoinScorer initialization and document advancement logic

f278893

fix: Correct BlockJoinScorer document advancement and score collection logic

a8707c1

fix: Correctly track parent documents in BlockJoinScorer methods

573948a

fix: Correct parent update order in BlockJoinScorer methods

9978e58

fix: Correct order of parent updates in block_join_query.rs

51c1c6f

fix: Correct parent document handling in BlockJoinScorer advance method

c44ca1d

refactor: Update return types to use crate::Result and add for_each_pruning method

d85175e

fix: Update test assertion to match indexed resume content

9ad8a77

fix: Correct test data content in block join query test

3ff825a

block join collector

a5fbd7c

block join tests

c7e58df

philippemnoel force-pushed the dev branch from febfba4 to cc0f329 Compare January 11, 2025 12:56

philippemnoel force-pushed the dev branch from 75dec2c to 5a72a03 Compare January 24, 2025 05:38

philippemnoel closed this Jan 24, 2025

philippemnoel deleted the neil/nested-document-query branch January 24, 2025 05:38
