[Feature] table sample #52600
base: main
Signed-off-by: Murphy <[email protected]>
Quality Gate failed: Failed conditions

| Report | Result |
|---|---|
| Java-Extensions Incremental Coverage Report | ✅ pass: 0 / 0 (0%) |
| FE Incremental Coverage Report | ❌ fail: 61 / 85 (71.76%) |
| BE Incremental Coverage Report | ✅ pass: 272 / 322 (84.47%) |
@@ -259,9 +261,11 @@ class ColumnIterator {
    virtual Status null_count(size_t* count) { return Status::OK(); };

    // RAW interface, should be used carefully
    virtual ColumnReader* get_column_reader() { return nullptr; };
Suggest changing it to
virtual ColumnReader* get_column_reader() = 0;
or using std::optional<ColumnReader*> as the return type, to signal that the returned value must be checked.
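A minimal sketch of the two alternatives the reviewer suggests. `ColumnReader` here is a hypothetical stand-in for the real type, and the names `ColumnIteratorPureVirtual`, `ColumnIteratorOptional`, `ScalarIterator`, and `reader_id_or_minus_one` are illustrative only, not part of the PR.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the real ColumnReader type.
struct ColumnReader {
    int id = 0;
};

// Option 1: pure virtual, so every iterator must decide what to return.
class ColumnIteratorPureVirtual {
public:
    virtual ~ColumnIteratorPureVirtual() = default;
    virtual ColumnReader* get_column_reader() = 0;
};

// Option 2: std::optional makes "no reader" explicit in the signature.
class ColumnIteratorOptional {
public:
    virtual ~ColumnIteratorOptional() = default;
    virtual std::optional<ColumnReader*> get_column_reader() { return std::nullopt; }
};

// An iterator that is always backed by a real reader.
class ScalarIterator : public ColumnIteratorOptional {
public:
    explicit ScalarIterator(ColumnReader* reader) : _reader(reader) { assert(reader != nullptr); }
    std::optional<ColumnReader*> get_column_reader() override { return _reader; }

private:
    ColumnReader* _reader;
};

// The optional return type pushes callers to test has_value() before use.
int reader_id_or_minus_one(ColumnIteratorOptional& it) {
    if (auto reader = it.get_column_reader(); reader.has_value() && *reader != nullptr) {
        return (*reader)->id;
    }
    return -1;
}
```

One trade-off: `std::optional<ColumnReader*>` can still hold a `nullptr`, so callers may end up checking twice; the pure-virtual form avoids that but forces every subclass to implement the method.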
std::bernoulli_distribution dist(_probability_percent / 100.0);

size_t sampled_blocks = 0;
size_t total_blocks = _total_rows / _rows_per_block;
How is `_rows_per_block` deduced? It seems to be deduced from the rows per block.
for (size_t i = 0; i < _total_rows; i += _rows_per_block) {
    if (dist(mt)) {
        sampled_blocks++;
        sampled_ranges.add(RowIdRange(i, std::min(i + _rows_per_block, _total_rows)));
- Can a RowIdRange span two blocks? An interesting question is the mathematical expectation of the number of RowIdRanges spanning two blocks as the stride (`_rows_per_block`) varies. Is there an optimal stride?
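The sampling loop under review can be sketched standalone as below, assuming a `std::mt19937_64` engine seeded from the `seed` property (the PR's actual engine and seeding are not shown in the diff). `RowIdRange`, `sample_blocks`, and `count_straddling` are hypothetical names. `count_straddling` offers an empirical handle on the reviewer's question: it counts sampled ranges whose first and last rows fall into different physical blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical half-open row-id range [from, to), mirroring RowIdRange in the diff.
struct RowIdRange {
    std::size_t from;
    std::size_t to;
};

// Bernoulli block sampling: each stride of rows_per_block rows is kept
// independently with probability percent / 100; the tail range is clipped to total_rows.
std::vector<RowIdRange> sample_blocks(std::size_t total_rows, std::size_t rows_per_block,
                                      double percent, std::uint64_t seed) {
    assert(rows_per_block > 0);
    std::mt19937_64 rng(seed);  // same seed => same sequence of keep/skip decisions
    std::bernoulli_distribution keep(percent / 100.0);
    std::vector<RowIdRange> ranges;
    for (std::size_t i = 0; i < total_rows; i += rows_per_block) {
        if (keep(rng)) {
            ranges.push_back({i, std::min(i + rows_per_block, total_rows)});
        }
    }
    return ranges;
}

// Counts ranges whose first and last rows land in different physical blocks
// of physical_block_rows rows each.
std::size_t count_straddling(const std::vector<RowIdRange>& ranges, std::size_t physical_block_rows) {
    std::size_t n = 0;
    for (const auto& r : ranges) {
        if (r.from / physical_block_rows != (r.to - 1) / physical_block_rows) {
            ++n;
        }
    }
    return n;
}
```

Because every range starts at a multiple of the stride, no range can straddle a physical boundary when the physical block size is a multiple of the stride; straddling only appears when the two sizes are misaligned.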
double SortableZoneMap::width(const Datum& lhs, const Datum& rhs) {
    if (lhs.is_null() || rhs.is_null()) {
        return 0;
Why 0? A null bound seems to denote an open range, so infinity or the maximum value seems more reasonable.
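A minimal sketch of the reviewer's alternative, treating a null bound as an open (unbounded) end. `Bound` models a nullable `Datum` with `std::optional<double>`, and `zone_width` is a hypothetical name; the real `SortableZoneMap::width` operates on `Datum` values whose types are not shown in the diff.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <optional>

// Hypothetical stand-in for Datum: std::nullopt plays the role of a null bound.
using Bound = std::optional<double>;

// Width of the zone [lhs, rhs]. A null bound is treated as unbounded,
// so the width is +infinity rather than 0.
double zone_width(const Bound& lhs, const Bound& rhs) {
    if (!lhs.has_value() || !rhs.has_value()) {
        return std::numeric_limits<double>::infinity();
    }
    assert(*lhs <= *rhs);
    return *rhs - *lhs;
}
```

With infinity, a zone that has a null bound never looks narrower than a bounded one; whether that is preferable to 0 depends on how callers use the width, which the diff does not show.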
Why I'm doing:

What I'm doing:
Introduce a syntax for table sample.
Properties:
- `seed`: optional; specifies the random seed, which guarantees a deterministic result.
- `method`: optional; `by_block` or `by_page`.
- `percent`: the sample probability, in the range (0, 100).

Fixes #52787
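The properties above could be validated along these lines. This is a hypothetical sketch: `SampleOptions`, `SampleMethod`, `parse_sample_options`, and `sample_probability` are illustrative names, and the PR's actual parser and its defaults are not shown here. `sample_probability` mirrors the `_probability_percent / 100.0` conversion in the BE diff.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical in-memory form of the table sample properties.
enum class SampleMethod { kByBlock, kByPage };

struct SampleOptions {
    std::optional<std::uint64_t> seed;   // set => repeatable sample
    std::optional<SampleMethod> method;  // unset => engine default (not stated in the PR)
    double percent = 0;                  // sample probability in percent, (0, 100)
};

// Validates raw property values; returns std::nullopt if any is invalid.
std::optional<SampleOptions> parse_sample_options(std::optional<std::uint64_t> seed,
                                                  const std::optional<std::string>& method,
                                                  double percent) {
    // percent must lie strictly inside (0, 100); this also rejects NaN.
    if (!(percent > 0 && percent < 100)) {
        return std::nullopt;
    }
    SampleOptions opts;
    opts.seed = seed;
    opts.percent = percent;
    if (method.has_value()) {
        if (*method == "by_block") {
            opts.method = SampleMethod::kByBlock;
        } else if (*method == "by_page") {
            opts.method = SampleMethod::kByPage;
        } else {
            return std::nullopt;  // unknown method name
        }
    }
    return opts;
}

// Converts the validated percentage into a Bernoulli probability in (0, 1).
double sample_probability(const SampleOptions& opts) {
    assert(opts.percent > 0 && opts.percent < 100);
    return opts.percent / 100.0;
}
```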
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: