-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhance: support post filter execution #37363
base: master
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: chasingegg The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Invalid PR Title Format Detected Your PR submission does not adhere to our required standards. To ensure clarity and consistency, please meet the following criteria:
Required Title Structure:
Where Example:
Please review and update your PR to comply with these guidelines. |
@chasingegg go-sdk check failed, comment |
@chasingegg cpp-unit-test check failed, comment |
@chasingegg E2e jenkins job failed, comment |
std::optional<int64_t> | ||
get_iterator_batch_size() { | ||
return milvus::index::GetValueFromConfig<int64_t>( | ||
search_info_.search_params_, "batch_size"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to rename it as conflicting to E2E Iterator parameters?
MoveCursorForIndex(); | ||
if (segment_->HasFieldData(field_id_)) { | ||
// when we specify input, do not maintain states | ||
if (has_offset_input_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why it does not need to move cursor here when input is specified? Not the other way around as in internal/core/src/exec/expression/LogicalBinaryExpr.h
per say?
size_t hi, | ||
float dist) { | ||
while (lo < hi) { | ||
size_t mid = (lo + hi) >> 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential overflow, use size_t mid = lo + ((hi - lo) >> 1)
double scalar_cost = | ||
std::chrono::duration<double, std::micro>(scalar_end - scalar_start) | ||
.count(); | ||
monitor::internal_core_search_latency_postfilter.Observe(scalar_cost); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see adding metrics as well!
auto col_vec_size = col_vec->size(); | ||
TargetBitmapView bitsetview(col_vec->GetRawData(), | ||
col_vec_size); | ||
Assert(bitsetview.size() <= batch_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this must be met? Is it possible that user gives a small batch_size
?
} | ||
|
||
RowVectorPtr | ||
PhyFilterNode::GetOutput() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it support RangeSearch?
6a7ea43
to
83d7a5e
Compare
@chasingegg cpp-unit-test check failed, comment |
909b13b
to
82aa236
Compare
plan_node->search_info_.group_by_field_id_ == std::nullopt) { | ||
plannode = std::make_shared<milvus::plan::MvccNode>( | ||
milvus::plan::GetNextPlanNodeId()); | ||
sources = std::vector<milvus::plan::PlanNodePtr>{plannode}; | ||
plannode = std::make_shared<milvus::plan::VectorSearchNode>( | ||
milvus::plan::GetNextPlanNodeId(), sources); | ||
sources = std::vector<milvus::plan::PlanNodePtr>{plannode}; | ||
|
||
// add filter nodes after vector search node | ||
auto expr = ParseExprs(anns_proto.predicates()); | ||
plannode = std::make_shared<plan::FilterNode>( | ||
milvus::plan::GetNextPlanNodeId(), expr, sources); | ||
sources = std::vector<milvus::plan::PlanNodePtr>{plannode}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrapper as a function, same as below else
!search_info.search_params_.contains(RADIUS)) { | ||
search_info.post_filter_execution = | ||
search_info.search_params_[POST_FILTER]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that means user decide to whether using post filter or not ? Could decided by other method like stats info etc
// FilterNode will accept offsets array and execute over these and generate result valid offsets | ||
namespace milvus { | ||
namespace exec { | ||
class PhyFilterNode : public Operator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PhyPosterFilterBitsNode may more accurate. PhyFilterNode should not related with vector search node, not just return bits, it is more pure concept. When we support project function, add PhyFilterNode is better,same is FilterNode.h
@@ -28,17 +28,26 @@ namespace milvus { | |||
namespace exec { | |||
|
|||
class ExprSet; | |||
|
|||
using OffsetVector = FixedVector<int64_t>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int32_t is enough
82aa236
to
8dcc2db
Compare
@chasingegg cpp-unit-test check failed, comment |
rerun cpp-unit-test |
8dcc2db
to
3a9224e
Compare
3a9224e
to
8e67a30
Compare
Signed-off-by: chasingegg <[email protected]>
8e67a30
to
99d6e57
Compare
@chasingegg go-sdk check failed, comment |
@chasingegg cpp-unit-test check failed, comment |
rerun go-sdk |
rerun cpp-unit-test |
issue: #37360