-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feature](inverted index) add ordered functionality to match_phrase query #36356
Conversation
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
TeamCity be ut coverage result: |
TPC-H: Total hot run time: 40490 ms
|
TPC-DS: Total hot run time: 172901 ms
|
ClickBench: Total hot run time: 30.48 s
|
ba4c966
to
83cff3b
Compare
run buildall |
@@ -323,7 +323,7 @@ Status FullTextIndexReader::query(OlapReaderStatistics* stats, RuntimeState* run | |||
query_info.terms.emplace_back(search_str); | |||
} else { | |||
if (query_type == InvertedIndexQueryType::MATCH_PHRASE_QUERY) { | |||
RETURN_IF_ERROR(PhraseQuery::parser_slop(search_str, query_info)); | |||
PhraseQuery::parser_slop(search_str, query_info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not return status now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parse errors indicate no slop is needed
CL_NS_USE(index) | ||
CL_NS_USE(search) | ||
|
||
namespace doris::segment_v2 { | ||
|
||
class PostingsAndPosition { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete old private class PostingsAndPosition
}; | ||
|
||
using PhraseQueryPtr = std::unique_ptr<CL_NS(search)::PhraseQuery>; | ||
using Matcher = std::variant<ExactPhraseMatcher, OrderedSloppyPhraseMatcher, PhraseQueryPtr>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment to explain usage of different matchers
@@ -44,16 +140,19 @@ void PhraseQuery::add(const InvertedIndexQueryInfo& query_info) { | |||
} | |||
|
|||
_slop = query_info.slop; | |||
if (_slop <= 0) { | |||
|
|||
if (_slop == 0 || query_info.ordered) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment to explain different branch
@@ -73,14 +172,33 @@ void PhraseQuery::add(const std::wstring& field_name, const std::vector<std::str | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no _matcher for terms.size() == 1 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Matcher handles position information, terms.size() == 1 is handled in other processes
@@ -54,12 +105,10 @@ class PhraseQuery : public Query { | |||
void search_by_skiplist(roaring::Roaring& roaring); | |||
|
|||
int32_t do_next(int32_t doc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment for sematics of the interfaces.
auto result = | ||
std::from_chars(slop_str.begin(), slop_str.end(), query_info.slop); | ||
if (result.ec != std::errc()) { | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not return Status::Error now?
template <typename Derived> | ||
class PhraseMatcherBase { | ||
public: | ||
bool matches(int32_t doc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment for all interface
clang-tidy review says "All clean, LGTM! 👍" |
TPC-H: Total hot run time: 40132 ms
|
TPC-DS: Total hot run time: 173071 ms
|
ClickBench: Total hot run time: 30.57 s
|
83cff3b
to
22580d7
Compare
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
return false; | ||
} | ||
|
||
bool OrderedSloppyPhraseMatcher::stretchToOrder(PostingsAndPosition* prev_posting) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use consistent function name style here.
may be stretch_to_order is better
22580d7
to
b883e3d
Compare
run buildall |
TPC-H: Total hot run time: 40252 ms
|
TPC-DS: Total hot run time: 174752 ms
|
ClickBench: Total hot run time: 30.14 s
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PR approved by at least one committer and no changes requested. |
PR approved by anyone and no changes requested. |
…ery (apache#36356) 1. select count() from tbl where b match_phrase 'the brown ~2+';
…ery (apache#36356) 1. select count() from tbl where b match_phrase 'the brown ~2+';
…ery (#36356) 1. select count() from tbl where b match_phrase 'the brown ~2+';
…uery (apache#37752) ## Proposed changes 1. select count() from tbl where b match_phrase 'the brown ~2+'; pick from apache#36356
Proposed changes