enhance: support post filter execution #37363

Open: wants to merge 1 commit into master

chasingegg (Contributor) commented:

issue: #37360

sre-ci-robot added the area/compilation and size/XXL labels on Nov 1, 2024 (size/XXL denotes a PR that changes 1000+ lines)
sre-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chasingegg
To complete the pull request process, please assign congqixia after the PR has been reviewed.
You can assign the PR to them by writing /assign @congqixia in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mergify bot (Contributor) commented Nov 1, 2024

@chasingegg

Invalid PR Title Format Detected

Your PR submission does not adhere to our required standards. To ensure clarity and consistency, please meet the following criteria:

  1. Title Format: The PR title must begin with one of these prefixes:
  • feat: for introducing a new feature.
  • fix: for bug fixes.
  • enhance: for improvements to existing functionality.
  • test: for adding tests to existing functionality.
  • doc: for modifying documentation.
  • auto: for pull requests from bots.
  2. Description Requirement: The PR must include a non-empty description detailing the changes and their impact.

Required Title Structure:

[Type]: [Description of the PR]

Where Type is one of feat, fix, enhance, test or doc.

Example:

enhance: improve search performance significantly 

Please review and update your PR to comply with these guidelines.

chasingegg changed the title from "Support post filter" to "enhance: support post filter execution" on Nov 1, 2024
mergify bot added the kind/enhancement label (issues or changes related to enhancement) and removed the do-not-merge/invalid-pr-format label on Nov 1, 2024
mergify bot (Contributor) commented Nov 1, 2024

@chasingegg The go-sdk check failed; comment "rerun go-sdk" to trigger the job again.

mergify bot (Contributor) commented Nov 1, 2024

@chasingegg The cpp-unit-test check failed; comment "rerun cpp-unit-test" to trigger the job again.

mergify bot (Contributor) commented Nov 1, 2024

@chasingegg The E2E Jenkins job failed; comment "/run-cpu-e2e" to trigger the job again.

codecov bot commented Nov 1, 2024

Codecov Report

Attention: Patch coverage is 84.39349% with 211 lines in your changes missing coverage. Please review.

Project coverage is 80.67%. Comparing base (5a23c80) to head (99d6e57).
Report is 4 commits behind head on master.

| Files with missing lines | Patch % | Lines missing |
|---|---|---|
| internal/core/src/exec/expression/Expr.h | 51.23% | 59 |
| internal/core/src/exec/expression/CompareExpr.h | 64.61% | 23 |
| internal/core/src/common/Chunk.cpp | 0.00% | 15 |
| internal/core/src/exec/expression/ColumnExpr.cpp | 65.78% | 13 |
| internal/core/src/exec/operator/Utils.h | 53.57% | 13 |
| ...rnal/core/src/exec/expression/JsonContainsExpr.cpp | 96.03% | 10 |
| ...rnal/core/src/segcore/ChunkedSegmentSealedImpl.cpp | 30.76% | 9 |
| internal/core/src/segcore/SegmentInterface.h | 33.33% | 8 |
| internal/core/src/exec/operator/PostFilterNode.cpp | 92.55% | 7 |
| ...ernal/core/src/exec/expression/BinaryRangeExpr.cpp | 93.40% | 6 |

... and 13 more
Additional details and impacted files


@@            Coverage Diff             @@
##           master   #37363      +/-   ##
==========================================
- Coverage   80.67%   80.67%   -0.01%     
==========================================
  Files        1357     1360       +3     
  Lines      190679   191257     +578     
==========================================
+ Hits       153835   154294     +459     
- Misses      31416    31546     +130     
+ Partials     5428     5417      -11     
| Components | Coverage Δ |
|---|---|
| Client | 61.25% <ø> (ø) |
| Core | 68.97% <84.39%> (+0.04%) ⬆️ |
| Go | 83.22% <ø> (+0.03%) ⬆️ |

| Files with missing lines | Coverage Δ |
|---|---|
| internal/core/src/common/Chunk.h | 60.24% <ø> (ø) |
| internal/core/src/common/QueryInfo.h | 100.00% <ø> (ø) |
| internal/core/src/exec/Driver.cpp | 81.39% <100.00%> (+0.44%) ⬆️ |
| internal/core/src/exec/QueryContext.h | 84.61% <100.00%> (+0.30%) ⬆️ |
| ...ternal/core/src/exec/expression/AlwaysTrueExpr.cpp | 88.23% <100.00%> (+2.52%) ⬆️ |
| ...e/src/exec/expression/BinaryArithOpEvalRangeExpr.h | 100.00% <100.00%> (ø) |
| ...nternal/core/src/exec/expression/BinaryRangeExpr.h | 94.00% <100.00%> (+1.31%) ⬆️ |
| internal/core/src/exec/expression/ColumnExpr.h | 44.73% <ø> (ø) |
| internal/core/src/exec/expression/EvalCtx.h | 100.00% <100.00%> (ø) |
| internal/core/src/exec/expression/ExistsExpr.cpp | 89.18% <100.00%> (+2.09%) ⬆️ |

... and 40 more

... and 40 files with indirect coverage changes

std::optional<int64_t>
get_iterator_batch_size() {
    return milvus::index::GetValueFromConfig<int64_t>(
        search_info_.search_params_, "batch_size");

Review comment (Contributor):

Is it possible to rename this? "batch_size" conflicts with the E2E iterator parameters.
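A minimal sketch of the rename being suggested, assuming the search params behave like a flat string-to-string map. The key name post_filter_batch_size, the SearchParams alias, and GetInt64Param are hypothetical stand-ins, not milvus::index::GetValueFromConfig; the point is only that a dedicated key cannot collide with the iterator's own batch_size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for search_info_.search_params_ (a flat key/value map).
using SearchParams = std::map<std::string, std::string>;

// Hypothetical dedicated key, so the post-filter setting cannot collide with
// the "batch_size" parameter that the E2E iterator already uses.
inline constexpr const char* kPostFilterBatchSizeKey = "post_filter_batch_size";

// Returns the integer stored under `key`, or std::nullopt if the key is
// absent or its value is not a complete integer literal.
std::optional<int64_t> GetInt64Param(const SearchParams& params,
                                     const std::string& key) {
    auto it = params.find(key);
    if (it == params.end()) {
        return std::nullopt;
    }
    try {
        size_t consumed = 0;
        int64_t value = std::stoll(it->second, &consumed);
        if (consumed != it->second.size()) {
            return std::nullopt;  // trailing characters, e.g. "12x"
        }
        return value;
    } catch (const std::exception&) {
        return std::nullopt;  // std::invalid_argument or std::out_of_range
    }
}
```

With both keys present in the same request, the iterator keeps reading batch_size while the filter reads its own key, so neither setting silently overrides the other.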

MoveCursorForIndex();
if (segment_->HasFieldData(field_id_)) {
    // when we specify input, do not maintain states
    if (has_offset_input_) {

Review comment (Contributor):

Why doesn't it need to move the cursor here when an input is specified? Shouldn't it be the other way around, as in internal/core/src/exec/expression/LogicalBinaryExpr.h?
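The code comment above says state is not maintained when an offset input is given. A minimal sketch of that distinction, assuming a simplified single-column evaluator (ColumnPredicate, EvalNextBatch, and EvalOffsets are hypothetical names, not the Milvus classes): sequential batches advance a cursor, while evaluating an explicit offset list leaves the cursor untouched. It illustrates the two modes only and does not settle whether the real expression should move its cursor here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, simplified predicate evaluator over one column.
class ColumnPredicate {
 public:
    ColumnPredicate(std::vector<int64_t> column,
                    std::function<bool(int64_t)> pred)
        : column_(std::move(column)), pred_(std::move(pred)) {}

    // Sequential mode: evaluate the next `batch_size` rows starting at the
    // cursor, then advance the cursor (state carried across calls).
    std::vector<bool> EvalNextBatch(size_t batch_size) {
        size_t end = std::min(cursor_ + batch_size, column_.size());
        std::vector<bool> result;
        result.reserve(end - cursor_);
        for (size_t i = cursor_; i < end; ++i) {
            result.push_back(pred_(column_[i]));
        }
        cursor_ = end;
        return result;
    }

    // Offset-input mode: evaluate exactly the requested rows; the cursor is
    // neither read nor moved.
    std::vector<bool> EvalOffsets(const std::vector<int64_t>& offsets) const {
        std::vector<bool> result;
        result.reserve(offsets.size());
        for (int64_t off : offsets) {
            assert(off >= 0 && static_cast<size_t>(off) < column_.size());
            result.push_back(pred_(column_[static_cast<size_t>(off)]));
        }
        return result;
    }

    size_t cursor() const { return cursor_; }

 private:
    std::vector<int64_t> column_;
    std::function<bool(int64_t)> pred_;
    size_t cursor_ = 0;
};
```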

          size_t hi,
          float dist) {
    while (lo < hi) {
        size_t mid = (lo + hi) >> 1;

Review comment (Contributor):

Potential overflow: lo + hi can wrap around. Use size_t mid = lo + ((hi - lo) >> 1) instead.
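A sketch of the binary search with the suggested midpoint, assuming the goal is to find the first position in [lo, hi) whose distance exceeds dist in an ascending-sorted array. UpperBoundByDistance and the sort order are assumptions; the excerpt does not show the comparison. Because lo < hi inside the loop, hi - lo never underflows, and lo + ((hi - lo) >> 1) stays within [lo, hi) without ever computing lo + hi.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the first index in [lo, hi) whose distance is
// greater than `dist`, assuming distances[lo, hi) is sorted ascending.
// Returns hi if every distance in the range is <= dist.
size_t UpperBoundByDistance(const std::vector<float>& distances,
                            size_t lo,
                            size_t hi,
                            float dist) {
    assert(lo <= hi && hi <= distances.size());
    while (lo < hi) {
        // Overflow-safe midpoint: never computes lo + hi.
        size_t mid = lo + ((hi - lo) >> 1);
        if (distances[mid] <= dist) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}
```

Under these assumptions this has the same semantics as std::upper_bound over distances.begin() + lo to distances.begin() + hi, expressed as an index.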

double scalar_cost =
    std::chrono::duration<double, std::micro>(scalar_end - scalar_start)
        .count();
monitor::internal_core_search_latency_postfilter.Observe(scalar_cost);

Review comment (Contributor):

Great to see metrics being added as well!
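For reference, the pattern in the excerpt measures an interval and reports it in microseconds as a double. A minimal sketch, assuming std::chrono::steady_clock (the excerpt does not show which clock produces scalar_start and scalar_end) and a hypothetical LatencyRecorder in place of the real monitoring histogram:

```cpp
#include <cassert>
#include <chrono>
#include <utility>
#include <vector>

// Hypothetical stand-in for a latency histogram such as
// monitor::internal_core_search_latency_postfilter; it only stores samples.
struct LatencyRecorder {
    std::vector<double> samples_us;
    void Observe(double value_us) { samples_us.push_back(value_us); }
};

// Runs `fn`, records its wall-clock duration in microseconds, and returns it.
template <typename Fn>
double TimeAndObserve(LatencyRecorder& recorder, Fn&& fn) {
    auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    auto end = std::chrono::steady_clock::now();
    // duration<double, std::micro> keeps fractional microseconds.
    double cost_us =
        std::chrono::duration<double, std::micro>(end - start).count();
    recorder.Observe(cost_us);
    return cost_us;
}
```

A monotonic clock such as steady_clock guarantees that end is never earlier than start, so the observed cost is never negative.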

auto col_vec_size = col_vec->size();
TargetBitmapView bitsetview(col_vec->GetRawData(),
                            col_vec_size);
Assert(bitsetview.size() <= batch_size);

Review comment (Contributor):

Why must this hold? Is it possible that the user passes a small batch_size?
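One way to make such an assertion hold for any user-supplied batch_size is to cut the input into chunks of at most batch_size rows before building each bitmap view. A minimal sketch under that assumption; SplitIntoBatches is a hypothetical helper, not code from this PR:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: split `offsets` into consecutive chunks of at most
// `batch_size` elements, so a per-chunk check such as
// `bitmap.size() <= batch_size` holds by construction.
std::vector<std::vector<int64_t>> SplitIntoBatches(
    const std::vector<int64_t>& offsets, size_t batch_size) {
    assert(batch_size > 0);
    std::vector<std::vector<int64_t>> batches;
    for (size_t begin = 0; begin < offsets.size(); begin += batch_size) {
        size_t end = std::min(begin + batch_size, offsets.size());
        batches.emplace_back(
            offsets.begin() + static_cast<std::ptrdiff_t>(begin),
            offsets.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return batches;
}
```

The last chunk may be shorter than batch_size; an empty input yields no chunks.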

}

RowVectorPtr
PhyFilterNode::GetOutput() {

Review comment (Contributor):

Does it support RangeSearch?

mergify bot (Contributor) commented Nov 5, 2024

@chasingegg The cpp-unit-test check failed; comment "rerun cpp-unit-test" to trigger the job again.

chasingegg force-pushed the support-post-filter branch 3 times, most recently from 909b13b to 82aa236 on November 8, 2024 10:14
mergify bot added the ci-passed label on Nov 8, 2024
Comment on lines 106 to 118
        plan_node->search_info_.group_by_field_id_ == std::nullopt) {
    plannode = std::make_shared<milvus::plan::MvccNode>(
        milvus::plan::GetNextPlanNodeId());
    sources = std::vector<milvus::plan::PlanNodePtr>{plannode};
    plannode = std::make_shared<milvus::plan::VectorSearchNode>(
        milvus::plan::GetNextPlanNodeId(), sources);
    sources = std::vector<milvus::plan::PlanNodePtr>{plannode};

    // add filter nodes after vector search node
    auto expr = ParseExprs(anns_proto.predicates());
    plannode = std::make_shared<plan::FilterNode>(
        milvus::plan::GetNextPlanNodeId(), expr, sources);
    sources = std::vector<milvus::plan::PlanNodePtr>{plannode};

Review comment (Contributor):

Wrap this in a function; the same applies to the else branch below.
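A minimal sketch of the suggested refactor, using a hypothetical, simplified PlanNode type rather than the real milvus::plan classes, and omitting the parsed predicate that the real FilterNode carries. Chain wraps the repeated "make_shared, then reset sources" step, and BuildPostFilterPlan reproduces the post-filter order from the excerpt: MvccNode, then VectorSearchNode, then FilterNode.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified plan node; the real milvus::plan classes differ.
struct PlanNode {
    std::string kind;
    std::vector<std::shared_ptr<PlanNode>> sources;
};
using PlanNodePtr = std::shared_ptr<PlanNode>;

// Creates a node of `kind` on top of `sources` and returns it as the new
// single-element source list for the next node in the chain.
std::vector<PlanNodePtr> Chain(const std::string& kind,
                               std::vector<PlanNodePtr> sources) {
    auto node = std::make_shared<PlanNode>(PlanNode{kind, std::move(sources)});
    return {node};
}

// One possible wrapper for the post-filter branch: MVCC first, then the
// vector search, then the filter over the search results.
PlanNodePtr BuildPostFilterPlan() {
    std::vector<PlanNodePtr> sources;
    sources = Chain("MvccNode", std::move(sources));
    sources = Chain("VectorSearchNode", std::move(sources));
    sources = Chain("FilterNode", std::move(sources));
    return sources.front();
}
```

The else branch would get its own builder with a different node order, and both branches would share Chain instead of repeating the two-line pattern.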

        !search_info.search_params_.contains(RADIUS)) {
    search_info.post_filter_execution =
        search_info.search_params_[POST_FILTER];
}

Review comment (Contributor):

Does that mean the user decides whether to use post-filtering or not? Could it be decided by other means, such as statistics?
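A sketch of one alternative the reviewer hints at: honor an explicit user flag when present, otherwise fall back on a statistics-based estimate. Everything here is hypothetical (FilterDecisionInput, ShouldPostFilter, and the 0.5 threshold); the only part taken from the excerpt is that post-filtering appears not to be enabled when RADIUS (range search) is present. The heuristic reflects the usual trade-off: post-filtering works best when most rows pass the predicate, so the unfiltered top-k is rarely emptied by the filter.

```cpp
#include <cassert>
#include <optional>

// Hypothetical inputs to the decision.
struct FilterDecisionInput {
    std::optional<bool> user_post_filter;  // explicit request, if any
    bool is_range_search = false;          // RADIUS present in search params
    double estimated_selectivity = 1.0;    // fraction of rows passing, [0, 1]
};

// Hypothetical threshold: post-filter only when at least half the rows pass.
constexpr double kPostFilterMinSelectivity = 0.5;

bool ShouldPostFilter(const FilterDecisionInput& in) {
    assert(in.estimated_selectivity >= 0.0 && in.estimated_selectivity <= 1.0);
    if (in.is_range_search) {
        return false;  // mirrors the !contains(RADIUS) guard in the excerpt
    }
    if (in.user_post_filter) {
        return *in.user_post_filter;  // an explicit user choice wins
    }
    return in.estimated_selectivity >= kPostFilterMinSelectivity;
}
```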

// FilterNode will accept offsets array and execute over these and generate result valid offsets
namespace milvus {
namespace exec {
class PhyFilterNode : public Operator {

Review comment (Contributor):

PhyPosterFilterBitsNode may be a more accurate name. PhyFilterNode should not be tied to the vector search node; it should not just return bits, because it is a purer concept. Adding PhyFilterNode makes more sense once we support projection functions. The same applies to FilterNode.h.
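The code comment on PhyFilterNode describes its contract: accept an array of offsets, evaluate the filter over them, and return the valid offsets. A minimal sketch of that contract, assuming candidate offsets arrive in ranking order from vector search and the predicate can be evaluated per row; PostFilterOffsets is a hypothetical name, not the PR's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical sketch: evaluate the scalar predicate only on the candidate
// rows produced by vector search and keep the ones that pass. Iterating in
// input order preserves the ranking from the vector search.
std::vector<int64_t> PostFilterOffsets(
    const std::vector<int64_t>& candidate_offsets,
    const std::function<bool(int64_t)>& row_matches) {
    std::vector<int64_t> valid;
    valid.reserve(candidate_offsets.size());
    for (int64_t off : candidate_offsets) {
        if (row_matches(off)) {
            valid.push_back(off);
        }
    }
    return valid;
}
```

Unlike a bitset-producing filter, this returns offsets rather than bits, which is the distinction the review comment draws.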

@@ -28,17 +28,26 @@ namespace milvus {
namespace exec {

class ExprSet;

using OffsetVector = FixedVector<int64_t>;

Review comment (Contributor):

int32_t is enough.
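A sketch of what narrowing OffsetVector would involve, assuming a segment never holds more than INT32_MAX rows (the review does not state the reason). int32_t halves the per-offset storage, 4 bytes instead of 8; NarrowOffsets is a hypothetical helper that checks the range instead of truncating silently.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical: convert 64-bit row offsets to 32 bits. Returns std::nullopt
// if any offset is negative or larger than INT32_MAX.
std::optional<std::vector<int32_t>> NarrowOffsets(
    const std::vector<int64_t>& offsets) {
    std::vector<int32_t> out;
    out.reserve(offsets.size());
    for (int64_t off : offsets) {
        if (off < 0 || off > std::numeric_limits<int32_t>::max()) {
            return std::nullopt;
        }
        out.push_back(static_cast<int32_t>(off));
    }
    return out;
}
```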

mergify bot (Contributor) commented Nov 14, 2024

@chasingegg The cpp-unit-test check failed; comment "rerun cpp-unit-test" to trigger the job again.

chasingegg (Contributor, Author) commented:

rerun cpp-unit-test

Signed-off-by: chasingegg <[email protected]>
mergify bot (Contributor) commented Nov 15, 2024

@chasingegg The go-sdk check failed; comment "rerun go-sdk" to trigger the job again.

mergify bot (Contributor) commented Nov 15, 2024

@chasingegg The cpp-unit-test check failed; comment "rerun cpp-unit-test" to trigger the job again.

chasingegg (Contributor, Author) commented:

rerun go-sdk

chasingegg (Contributor, Author) commented:

rerun cpp-unit-test

mergify bot added the ci-passed label on Nov 15, 2024
Labels: area/compilation, ci-passed, dco-passed (DCO check passed), kind/enhancement (issues or changes related to enhancement), size/XXL (denotes a PR that changes 1000+ lines)

4 participants