WIP: feat(storage): introduce reclaim_table_write_throughput #19135

Open
wants to merge 17 commits into base: main

Conversation

Contributor

@Li0k Li0k commented Oct 25, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

After #18806

If a table silently stops being written, it will still be misclassified as high-throughput, which prevents the merge strategy from being applied to it.

Therefore, this PR introduces a reclaim mechanism that reclaims expired table throughput statistics. Consider the following cases (a sketch of the mechanism follows the list):

  1. When a table stops being written, its throughput statistics will be reclaimed.
  2. When the throughput statistics of a table are empty, there are two scenarios:
  • The statistics have expired and been reclaimed (no throughput, so the table can be merged).
  • The table is in the creating phase and its first checkpoint has not yet completed (it will not be merged).
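
To make the intended behavior concrete, here is a minimal sketch of an expiry-based statistics manager. The names add_table_throughput_with_ts, max_statistic_expired_time, and TableWriteThroughputStatistic(Manager) appear in the review below; the exact fields, key types, and call sites are illustrative, not the PR's final code.

use std::collections::HashMap;

// One throughput sample for a table (field names follow the PR diff).
#[derive(Debug, Clone)]
pub struct TableWriteThroughputStatistic {
    pub throughput: u64,
    pub timestamp: i64,
}

// Illustrative manager; the table_id key type and timestamp unit are assumptions.
pub struct TableWriteThroughputStatisticManager {
    table_throughput: HashMap<u32, Vec<TableWriteThroughputStatistic>>,
    max_statistic_expired_time: i64,
}

impl TableWriteThroughputStatisticManager {
    // Record a sample for a table that was just written.
    pub fn add_table_throughput_with_ts(&mut self, table_id: u32, throughput: u64, timestamp: i64) {
        self.table_throughput
            .entry(table_id)
            .or_default()
            .push(TableWriteThroughputStatistic { throughput, timestamp });
    }

    // Reclaim expired samples, so a table that silently stopped being written
    // ends up with empty statistics instead of stale "high throughput" data.
    pub fn reclaim_table_write_throughput(&mut self, now: i64) {
        let window = self.max_statistic_expired_time;
        for stats in self.table_throughput.values_mut() {
            stats.retain(|s| now - s.timestamp <= window);
        }
    }

    // Empty statistics are ambiguous: either the samples expired and were reclaimed
    // (the table stopped being written and may be merged), or the table is still
    // being created and its first checkpoint has not completed (do not merge yet).
    pub fn is_table_throughput_empty(&self, table_id: u32) -> bool {
        self.table_throughput
            .get(&table_id)
            .map_or(true, |stats| stats.is_empty())
    }
}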

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@@ -85,12 +87,26 @@ impl HummockManager {
.cloned()
.collect_vec();

if member_table_ids_1.is_empty() {
Collaborator

Is this reachable?

Contributor Author

Yes, we do not remove the default groups cg2 and cg3; a default group may trigger this branch when initializing a new cluster.

Comment on lines +847 to +848
|| (group.group_id == StaticCompactionGroupId::MaterializedView as u64
&& next_group.group_id == StaticCompactionGroupId::StateDefault as u64)
Collaborator

Is this reachable? I think group.group_id is supposed to be smaller than next_group.group_id

Contributor Author

Yes, we cannot guarantee the group_id ordering between cg2 and cg3 in an old cluster.

#[derive(Debug, Clone)]
pub struct TableWriteThroughputStatistic {
pub throughput: u64,
pub timestamp: i64,
Collaborator

minor: it is better to put the unit of the timestamp in the name: timestamp_secs

src/meta/src/hummock/manager/mod.rs — three outdated review comments (resolved)
Comment on lines 535 to 536
table_throughput
.retain(|statistic| timestamp - statistic.timestamp <= self.max_statistic_expired_time);
Collaborator

Here we need to iterate through all items in table_throughput on each add_table_throughput_with_ts, which is unnecessary. It would be more efficient to maintain table_throughput in a VecDeque (ring buffer): we can pop_front items until timestamp - statistic.timestamp <= self.max_statistic_expired_time holds for the front element.
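
A minimal sketch of the suggested ring-buffer approach, assuming samples are appended in non-decreasing timestamp order (the function and field names mirror the discussion above but are illustrative):

use std::collections::VecDeque;

struct TableWriteThroughputStatistic {
    throughput: u64,
    timestamp: i64,
}

// Because samples arrive in timestamp order, expired entries are always at the
// front, so we only pop from the front instead of scanning the whole list.
fn add_table_throughput_with_ts(
    table_throughput: &mut VecDeque<TableWriteThroughputStatistic>,
    throughput: u64,
    timestamp: i64,
    max_statistic_expired_time: i64,
) {
    table_throughput.push_back(TableWriteThroughputStatistic { throughput, timestamp });
    while let Some(front) = table_throughput.front() {
        if timestamp - front.timestamp > max_statistic_expired_time {
            table_throughput.pop_front();
        } else {
            break;
        }
    }
}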

Comment on lines 561 to 563
pub fn retain(&mut self) {
self.table_throughput.retain(|_, v| !v.is_empty());
}
Collaborator

Given that we only remove items from table_throughput via add_table_throughput_with_ts, we don't need this separate retain method; we can just do it in place in add_table_throughput_with_ts.

Collaborator

Actually, I think we need a way to remove dropped tables from TableWriteThroughputStatisticManager.

Contributor Author

Removed the retain method; the removal is now triggered when all of a table's statistics have expired (see the sketch below).
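
A rough sketch of what that could look like, assuming the manager keeps a per-table VecDeque as suggested above. Dropping the whole map entry once every sample for a table has expired also takes care of tables that were dropped and will never report throughput again (illustrative code, not the PR's exact implementation):

use std::collections::{HashMap, VecDeque};

struct TableWriteThroughputStatistic {
    throughput: u64,
    timestamp: i64,
}

struct TableWriteThroughputStatisticManager {
    table_throughput: HashMap<u32, VecDeque<TableWriteThroughputStatistic>>,
    max_statistic_expired_time: i64,
}

impl TableWriteThroughputStatisticManager {
    // Remove every table whose newest sample has expired; no separate public
    // retain method is needed, since fully-expired tables vanish here.
    fn reclaim_table_write_throughput(&mut self, now: i64) {
        let window = self.max_statistic_expired_time;
        self.table_throughput.retain(|_, stats| {
            stats
                .back()
                .map_or(false, |newest| now - newest.timestamp <= window)
        });
    }
}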

@Li0k Li0k changed the title from "feat(storage): introduce reclaim_table_write_throughput" to "WIP: feat(storage): introduce reclaim_table_write_throughput" on Nov 19, 2024