WIP: feat(storage): introduce reclaim_table_write_throughput #19135

Open
wants to merge 17 commits into base: main

Conversation

Contributor

@Li0k Li0k commented Oct 25, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

After #18806

If a table silently stops being written, it will still be misclassified as high-throughput, which prevents the merge strategy from being applied to it.

Therefore, this PR introduces a reclaim mechanism that reclaims expired table throughput statistics. Consider the following cases (a sketch of the mechanism follows the list):

  1. When a table stops being written, its throughput statistics will be reclaimed.
  2. When the throughput statistics of a table are empty, there are two scenarios:
  • The statistics have expired and been reclaimed (no throughput, so the table can be merged).
  • The table is in the creating phase and its first checkpoint has not yet completed (it will not be merged).
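
To make the intended behavior concrete, here is a minimal sketch of an expiry-based statistics manager. The names add_table_throughput_with_ts, max_statistic_expired_time, and TableWriteThroughputStatistic(Manager) appear in the review below; the exact fields, key types, and call sites are illustrative, not the PR's final code.

use std::collections::HashMap;

// One throughput sample for a table (field names follow the PR diff).
#[derive(Debug, Clone)]
pub struct TableWriteThroughputStatistic {
    pub throughput: u64,
    pub timestamp: i64,
}

// Illustrative manager; the table_id key type and timestamp unit are assumptions.
pub struct TableWriteThroughputStatisticManager {
    table_throughput: HashMap<u32, Vec<TableWriteThroughputStatistic>>,
    max_statistic_expired_time: i64,
}

impl TableWriteThroughputStatisticManager {
    // Record a sample for a table that was just written.
    pub fn add_table_throughput_with_ts(&mut self, table_id: u32, throughput: u64, timestamp: i64) {
        self.table_throughput
            .entry(table_id)
            .or_default()
            .push(TableWriteThroughputStatistic { throughput, timestamp });
    }

    // Reclaim expired samples, so a table that silently stopped being written
    // ends up with empty statistics instead of stale "high throughput" data.
    pub fn reclaim_table_write_throughput(&mut self, now: i64) {
        let window = self.max_statistic_expired_time;
        for stats in self.table_throughput.values_mut() {
            stats.retain(|s| now - s.timestamp <= window);
        }
    }

    // Empty statistics are ambiguous: either the samples expired and were reclaimed
    // (the table stopped being written and may be merged), or the table is still
    // being created and its first checkpoint has not completed (do not merge yet).
    pub fn is_table_throughput_empty(&self, table_id: u32) -> bool {
        self.table_throughput
            .get(&table_id)
            .map_or(true, |stats| stats.is_empty())
    }
}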

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@@ -85,12 +87,26 @@ impl HummockManager {
.cloned()
.collect_vec();

if member_table_ids_1.is_empty() {
Collaborator

Is this reachable?

Contributor Author

Yes, we do not remove the default groups cg2 and cg3; a default group may trigger this branch when initializing a new cluster.

Comment on lines +847 to +848
|| (group.group_id == StaticCompactionGroupId::MaterializedView as u64
&& next_group.group_id == StaticCompactionGroupId::StateDefault as u64)
Collaborator

Is this reachable? I think group.group_id is supposed to be smaller than next_group.group_id

Contributor Author

Yes, we cannot guarantee the group_id ordering between cg2 and cg3 in an old cluster.

#[derive(Debug, Clone)]
pub struct TableWriteThroughputStatistic {
pub throughput: u64,
pub timestamp: i64,
Collaborator

minor: it is better to put the unit of the timestamp in the name: timestamp_secs

src/meta/src/hummock/manager/mod.rs — three outdated review comments (resolved)
Comment on lines 535 to 536
table_throughput
.retain(|statistic| timestamp - statistic.timestamp <= self.max_statistic_expired_time);
Collaborator

Here we need to iterate through all items in table_throughput on each add_table_throughput_with_ts, which is unnecessary. It would be more efficient to maintain table_throughput in a VecDeque (ring buffer): we can pop_front items until timestamp - statistic.timestamp <= self.max_statistic_expired_time holds for the front element.
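
A minimal sketch of the suggested ring-buffer approach, assuming samples are appended in non-decreasing timestamp order (the function and field names mirror the discussion above but are illustrative):

use std::collections::VecDeque;

struct TableWriteThroughputStatistic {
    throughput: u64,
    timestamp: i64,
}

// Because samples arrive in timestamp order, expired entries are always at the
// front, so we only pop from the front instead of scanning the whole list.
fn add_table_throughput_with_ts(
    table_throughput: &mut VecDeque<TableWriteThroughputStatistic>,
    throughput: u64,
    timestamp: i64,
    max_statistic_expired_time: i64,
) {
    table_throughput.push_back(TableWriteThroughputStatistic { throughput, timestamp });
    while let Some(front) = table_throughput.front() {
        if timestamp - front.timestamp > max_statistic_expired_time {
            table_throughput.pop_front();
        } else {
            break;
        }
    }
}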

Comment on lines 561 to 563
pub fn retain(&mut self) {
self.table_throughput.retain(|_, v| !v.is_empty());
}
Collaborator

Given that we only remove items from table_throughput via add_table_throughput_with_ts, we don't need this separate retain method; we can just do it in place in add_table_throughput_with_ts.

Collaborator

Actually, I think we need a way to remove dropped tables from TableWriteThroughputStatisticManager.

Contributor Author

Removed the retain method; the removal is now triggered when all of a table's statistics have expired (see the sketch below).
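
A rough sketch of what that could look like, assuming the manager keeps a per-table VecDeque as suggested above. Dropping the whole map entry once every sample for a table has expired also takes care of tables that were dropped and will never report throughput again (illustrative code, not the PR's exact implementation):

use std::collections::{HashMap, VecDeque};

struct TableWriteThroughputStatistic {
    throughput: u64,
    timestamp: i64,
}

struct TableWriteThroughputStatisticManager {
    table_throughput: HashMap<u32, VecDeque<TableWriteThroughputStatistic>>,
    max_statistic_expired_time: i64,
}

impl TableWriteThroughputStatisticManager {
    // Remove every table whose newest sample has expired; no separate public
    // retain method is needed, since fully-expired tables vanish here.
    fn reclaim_table_write_throughput(&mut self, now: i64) {
        let window = self.max_statistic_expired_time;
        self.table_throughput.retain(|_, stats| {
            stats
                .back()
                .map_or(false, |newest| now - newest.timestamp <= window)
        });
    }
}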

@Li0k Li0k changed the title from "feat(storage): introduce reclaim_table_write_throughput" to "WIP: feat(storage): introduce reclaim_table_write_throughput" on Nov 19, 2024