WIP: feat(storage): introduce reclaim_table_write_throughput #19135
base: main
Conversation
@@ -85,12 +87,26 @@ impl HummockManager {
            .cloned()
            .collect_vec();

        if member_table_ids_1.is_empty() {
Is this reachable?
Yes, we do not remove the default groups cg2 and cg3; the default group may trigger this branch when initializing a new cluster.
|| (group.group_id == StaticCompactionGroupId::MaterializedView as u64
    && next_group.group_id == StaticCompactionGroupId::StateDefault as u64)
Is this reachable? I think group.group_id is supposed to be smaller than next_group.group_id.
Yes, we cannot guarantee the group_id ordering between cg2 and cg3 in an old cluster.
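For illustration, a minimal standalone sketch of why the condition must accept both orderings (the enum stub and function name below are hypothetical; the real StaticCompactionGroupId lives in the Hummock SDK):

```rust
// Stub of the enum for a self-contained sketch; the discriminants mirror the
// built-in groups cg2 (StateDefault) and cg3 (MaterializedView) named above.
enum StaticCompactionGroupId {
    StateDefault = 2,
    MaterializedView = 3,
}

// In an old cluster there is no ordering guarantee between cg2 and cg3,
// so an adjacency check over the pair has to be symmetric.
fn is_builtin_group_pair(group_id: u64, next_group_id: u64) -> bool {
    let cg2 = StaticCompactionGroupId::StateDefault as u64;
    let cg3 = StaticCompactionGroupId::MaterializedView as u64;
    (group_id == cg2 && next_group_id == cg3) || (group_id == cg3 && next_group_id == cg2)
}
```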
src/meta/src/hummock/manager/compaction/compaction_group_schedule.rs
Outdated
src/meta/src/hummock/manager/mod.rs
Outdated
#[derive(Debug, Clone)]
pub struct TableWriteThroughputStatistic {
    pub throughput: u64,
    pub timestamp: i64,
minor: it is better to put the unit of the timestamp in the name: timestamp_secs
src/meta/src/hummock/manager/mod.rs
Outdated
table_throughput
    .retain(|statistic| timestamp - statistic.timestamp <= self.max_statistic_expired_time);
Here we need to iterate through all items in table_throughput on each add_table_throughput_with_ts, which is unnecessary. It is more efficient to maintain table_throughput in a VecDeque (ring buffer): we can use pop_front to remove items until timestamp - statistic.timestamp <= self.max_statistic_expired_time holds for every remaining entry.
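A minimal sketch of the suggested ring-buffer layout, assuming one VecDeque per table behind a HashMap and adopting the timestamp_secs rename from the earlier comment (all names here are illustrative, not the PR's actual API):

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone)]
pub struct TableWriteThroughputStatistic {
    pub throughput: u64,
    pub timestamp_secs: i64,
}

#[derive(Default)]
pub struct TableWriteThroughputStatisticManager {
    // One queue per table; entries arrive in timestamp order, so the oldest
    // (first-to-expire) statistic always sits at the front.
    table_throughput: HashMap<u32, VecDeque<TableWriteThroughputStatistic>>,
    max_statistic_expired_secs: i64,
}

impl TableWriteThroughputStatisticManager {
    pub fn add_table_throughput_with_ts(
        &mut self,
        table_id: u32,
        throughput: u64,
        timestamp_secs: i64,
    ) {
        let queue = self.table_throughput.entry(table_id).or_default();
        queue.push_back(TableWriteThroughputStatistic {
            throughput,
            timestamp_secs,
        });
        // pop_front touches only the expired prefix instead of scanning the
        // whole queue with retain(): amortized O(1) per insertion.
        while let Some(front) = queue.front() {
            if timestamp_secs - front.timestamp_secs > self.max_statistic_expired_secs {
                queue.pop_front();
            } else {
                break;
            }
        }
    }
}
```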
src/meta/src/hummock/manager/mod.rs
Outdated
pub fn retain(&mut self) {
    self.table_throughput.retain(|_, v| !v.is_empty());
}
Given that we only remove items in table_throughput via add_table_throughput_with_ts, we don't need this separate retain method; we can just do it in place in add_table_throughput_with_ts.
Actually, I think we need a way to remove dropped tables in TableWriteThroughputStatisticManager.
Remove the retain method and trigger the removal when all of a table's statistics have expired.
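One way to realize that, sketched on top of the structure above (the method name and averaging logic are hypothetical, not the PR's API): expire entries lazily on the read path and drop the table's map entry once everything has expired, so dropped or idle tables do not linger.

```rust
impl TableWriteThroughputStatisticManager {
    // Expire statistics for `table_id` at read time; once every statistic has
    // expired, remove the table's entry entirely, replacing the retain() pass.
    pub fn avg_write_throughput(&mut self, table_id: u32, now_secs: i64) -> Option<u64> {
        let queue = self.table_throughput.get_mut(&table_id)?;
        while let Some(front) = queue.front() {
            if now_secs - front.timestamp_secs > self.max_statistic_expired_secs {
                queue.pop_front();
            } else {
                break;
            }
        }
        if queue.is_empty() {
            // All statistics expired: reclaim the table's slot in the map.
            self.table_throughput.remove(&table_id);
            None
        } else {
            let sum: u64 = queue.iter().map(|s| s.throughput).sum();
            Some(sum / queue.len() as u64)
        }
    }
}
```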
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
After #18806, if a table stops writing (silently), it will still be misclassified as high throughput, which prevents the merge strategy. Therefore, this PR introduces a reclaim mechanism to expire stale table write throughput statistics. Consider the following case:
Checklist
- ./risedev check (or alias, ./risedev c)
Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.