perf(memory): use thread-local sequence-based memory eviction policy #16087

MrCroxx · 2024-04-02T08:56:38Z

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Motivation

Resolves #15305

The previous memory eviction strategy was epoch-based. In the following scenarios it could evict far too aggressively:

Uneven data volume between epochs caused by sudden increase or decrease in traffic.
Large data volume in recent epochs due to frequent access to certain data.
Large epoch interval.

This PR introduces a sequence-based memory eviction mechanism. Cache eviction is no longer driven by epochs but by a sequence number allocated on each cache access, which gives finer granularity.

The sequence must be shared globally. A single atomic variable is enough to hold it, but the cache-line invalidation caused by frequent fetch_add calls is not negligible. Therefore, this PR introduces a thread-local sequence that tolerates global reordering within a bounded range.

When an entry is inserted into or accessed from the managed LRU cache, the cache acquires a sequence from the Sequencer. The Sequencer uses a thread-local variable to grant the sequence. The thread-local variable is synchronized with the global sequence if (a) the pre-allocated local sequences (step) are exhausted, or (b) the local sequence lags behind the global sequence by more than the threshold (lag). When evicting, the memory controller calculates the ratio of memory to evict and normalizes it into a watermark sequence against the global sequence. The out-of-order threshold is max(lag, step).
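The allocation rule described above can be sketched roughly as follows. This is a minimal illustration and not the code in risingwave_common::sequencer: the struct layout and field names are hypothetical, each Sequencer value stands in for what would be a per-thread `thread_local!` instance, and the lag is measured from the end of the pre-allocated range, which is only one possible reading of condition (b).

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Per-thread sequence allocator (hypothetical sketch, not the PR's code).
/// In real use one instance would live in a `thread_local!` per thread.
pub struct Sequencer<'a> {
    /// Global sequence shared by all threads.
    global: &'a AtomicU64,
    /// Next local sequence to hand out.
    next: u64,
    /// Exclusive upper bound of the pre-allocated local range.
    end: u64,
    /// How many sequences to pre-allocate per synchronization.
    step: u64,
    /// How far the global sequence may run ahead before a forced re-sync.
    lag: u64,
}

impl<'a> Sequencer<'a> {
    pub fn new(global: &'a AtomicU64, step: u64, lag: u64) -> Self {
        // An empty range forces a synchronization on the first `alloc`.
        Self { global, next: 0, end: 0, step, lag }
    }

    pub fn alloc(&mut self) -> u64 {
        // (a) the pre-allocated range is used up.
        let exhausted = self.next >= self.end;
        // (b) assumption: lag is the distance from the end of our range
        // to the current global sequence.
        let lagging =
            self.global.load(Ordering::Relaxed).saturating_sub(self.end) > self.lag;
        if exhausted || lagging {
            // One contended atomic operation per `step` allocations at best.
            self.next = self.global.fetch_add(self.step, Ordering::Relaxed);
            self.end = self.next + self.step;
        }
        let seq = self.next;
        self.next += 1;
        seq
    }
}
```

With this shape, the common path touches only thread-private state plus one relaxed load, and the contended fetch_add happens once per range rather than once per cache access.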

CN node memory (before vs after):

Changes

Remove the forked lru dependency. Use a customized implementation in risingwave_common::lru.
Add thread-local sequencer implementation in risingwave_common::sequencer.
Add factor configuration for each eviction policy to control the eviction speed.

Configurations

    #[serde(default = "default::developer::memory_controller_eviction_factor_aggressive")]
    pub memory_controller_eviction_factor_aggressive: f64,

    #[serde(default = "default::developer::memory_controller_eviction_factor_graceful")]
    pub memory_controller_eviction_factor_graceful: f64,

    #[serde(default = "default::developer::memory_controller_eviction_factor_stable")]
    pub memory_controller_eviction_factor_stable: f64,

    #[serde(default = "default::developer::memory_controller_sequence_tls_step")]
    pub memory_controller_sequence_tls_step: u64,

    #[serde(default = "default::developer::memory_controller_sequence_tls_lag")]
    pub memory_controller_sequence_tls_lag: u64,

        pub fn memory_controller_threshold_aggressive() -> f64 {
            0.9
        }

        pub fn memory_controller_threshold_graceful() -> f64 {
            0.8
        }

        pub fn memory_controller_threshold_stable() -> f64 {
            0.7
        }

        pub fn memory_controller_eviction_factor_aggressive() -> f64 {
            2.0
        }

        pub fn memory_controller_eviction_factor_graceful() -> f64 {
            1.5
        }

        pub fn memory_controller_eviction_factor_stable() -> f64 {
            1.0
        }

        pub fn memory_controller_sequence_tls_step() -> u64 {
            128
        }

        pub fn memory_controller_sequence_tls_lag() -> u64 {
            32
        }
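To make the interplay of thresholds, factors, and the watermark sequence concrete, here is a rough sketch. The default values are the ones listed above; everything else (the type names, the policy selection rule, and the formula that scales the excess over the stable threshold by the factor) is an assumption for illustration and may differ from what the memory controller actually does.

```rust
/// Eviction policy chosen from the current memory usage ratio (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Policy {
    Idle,
    Stable,
    Graceful,
    Aggressive,
}

/// Thresholds and factors, using the defaults from this PR.
pub struct EvictionConfig {
    pub threshold_stable: f64,
    pub threshold_graceful: f64,
    pub threshold_aggressive: f64,
    pub factor_stable: f64,
    pub factor_graceful: f64,
    pub factor_aggressive: f64,
}

impl Default for EvictionConfig {
    fn default() -> Self {
        Self {
            threshold_stable: 0.7,
            threshold_graceful: 0.8,
            threshold_aggressive: 0.9,
            factor_stable: 1.0,
            factor_graceful: 1.5,
            factor_aggressive: 2.0,
        }
    }
}

impl EvictionConfig {
    /// Picks a policy from `usage` (used memory / total memory).
    pub fn policy(&self, usage: f64) -> Policy {
        if usage > self.threshold_aggressive {
            Policy::Aggressive
        } else if usage > self.threshold_graceful {
            Policy::Graceful
        } else if usage > self.threshold_stable {
            Policy::Stable
        } else {
            Policy::Idle
        }
    }

    /// Moves the watermark toward the global sequence.
    /// Assumed formula: the share of used memory above the stable threshold,
    /// scaled by the policy factor, is applied to the remaining sequence span.
    pub fn next_watermark(&self, usage: f64, watermark: u64, global: u64) -> u64 {
        let factor = match self.policy(usage) {
            Policy::Idle => return watermark,
            Policy::Stable => self.factor_stable,
            Policy::Graceful => self.factor_graceful,
            Policy::Aggressive => self.factor_aggressive,
        };
        let ratio = ((usage - self.threshold_stable) / usage * factor).min(1.0);
        let span = global.saturating_sub(watermark);
        watermark + (span as f64 * ratio) as u64
    }
}
```

Entries whose sequence falls below the returned watermark would then be candidates for eviction; a larger factor moves the watermark further per tick and so evicts faster.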

Micro Benchmarks for Components

Sequencer microbench:

primitive            1 threads 10000000 loops: 0ns per iter
atomic               1 threads 10000000 loops: 1ns per iter
atomic skip 8        1 threads 10000000 loops: 1ns per iter
atomic skip 16       1 threads 10000000 loops: 1ns per iter
atomic skip 32       1 threads 10000000 loops: 1ns per iter
atomic skip 64       1 threads 10000000 loops: 1ns per iter
sequencer(64,8)      1 threads 10000000 loops: 2ns per iter
sequencer(64,16)     1 threads 10000000 loops: 1ns per iter
sequencer(64,32)     1 threads 10000000 loops: 1ns per iter
sequencer(128,8)     1 threads 10000000 loops: 1ns per iter
sequencer(128,16)    1 threads 10000000 loops: 1ns per iter
sequencer(128,32)    1 threads 10000000 loops: 1ns per iter
coarse               1 threads 10000000 loops: 3ns per iter

primitive            4 threads 10000000 loops: 0ns per iter
atomic               4 threads 10000000 loops: 20ns per iter
atomic skip 8        4 threads 10000000 loops: 5ns per iter
atomic skip 16       4 threads 10000000 loops: 5ns per iter
atomic skip 32       4 threads 10000000 loops: 4ns per iter
atomic skip 64       4 threads 10000000 loops: 4ns per iter
sequencer(64,8)      4 threads 10000000 loops: 4ns per iter
sequencer(64,16)     4 threads 10000000 loops: 5ns per iter
sequencer(64,32)     4 threads 10000000 loops: 5ns per iter
sequencer(128,8)     4 threads 10000000 loops: 3ns per iter
sequencer(128,16)    4 threads 10000000 loops: 3ns per iter
sequencer(128,32)    4 threads 10000000 loops: 3ns per iter
coarse               4 threads 10000000 loops: 10ns per iter

primitive            8 threads 10000000 loops: 0ns per iter
atomic               8 threads 10000000 loops: 43ns per iter
atomic skip 8        8 threads 10000000 loops: 18ns per iter
atomic skip 16       8 threads 10000000 loops: 12ns per iter
atomic skip 32       8 threads 10000000 loops: 9ns per iter
atomic skip 64       8 threads 10000000 loops: 6ns per iter
sequencer(64,8)      8 threads 10000000 loops: 8ns per iter
sequencer(64,16)     8 threads 10000000 loops: 7ns per iter
sequencer(64,32)     8 threads 10000000 loops: 7ns per iter
sequencer(128,8)     8 threads 10000000 loops: 5ns per iter
sequencer(128,16)    8 threads 10000000 loops: 4ns per iter
sequencer(128,32)    8 threads 10000000 loops: 5ns per iter
coarse               8 threads 10000000 loops: 16ns per iter

primitive            16 threads 10000000 loops: 0ns per iter
atomic               16 threads 10000000 loops: 125ns per iter
atomic skip 8        16 threads 10000000 loops: 35ns per iter
atomic skip 16       16 threads 10000000 loops: 24ns per iter
atomic skip 32       16 threads 10000000 loops: 18ns per iter
atomic skip 64       16 threads 10000000 loops: 12ns per iter
sequencer(64,8)      16 threads 10000000 loops: 23ns per iter
sequencer(64,16)     16 threads 10000000 loops: 15ns per iter
sequencer(64,32)     16 threads 10000000 loops: 15ns per iter
sequencer(128,8)     16 threads 10000000 loops: 16ns per iter
sequencer(128,16)    16 threads 10000000 loops: 10ns per iter
sequencer(128,32)    16 threads 10000000 loops: 9ns per iter
coarse               16 threads 10000000 loops: 41ns per iter

primitive            32 threads 10000000 loops: 0ns per iter
atomic               32 threads 10000000 loops: 384ns per iter
atomic skip 8        32 threads 10000000 loops: 72ns per iter
atomic skip 16       32 threads 10000000 loops: 51ns per iter
atomic skip 32       32 threads 10000000 loops: 34ns per iter
atomic skip 64       32 threads 10000000 loops: 21ns per iter
sequencer(64,8)      32 threads 10000000 loops: 138ns per iter
sequencer(64,16)     32 threads 10000000 loops: 64ns per iter
sequencer(64,32)     32 threads 10000000 loops: 28ns per iter
sequencer(128,8)     32 threads 10000000 loops: 137ns per iter
sequencer(128,16)    32 threads 10000000 loops: 63ns per iter
sequencer(128,32)    32 threads 10000000 loops: 16ns per iter
coarse               32 threads 10000000 loops: 184ns per iter

lru microbench:

lru - 1024           1 threads 1000000 loops: 35ns per iter, total evicted: 999424
rw  - 1024           1 threads 1000000 loops: 26ns per iter, total evicted: 999424

lru - 1024           4 threads 1000000 loops: 35ns per iter, total evicted: 3997696
rw  - 1024           4 threads 1000000 loops: 27ns per iter, total evicted: 3997696

lru - 1024           8 threads 1000000 loops: 44ns per iter, total evicted: 7995392
rw  - 1024           8 threads 1000000 loops: 34ns per iter, total evicted: 7995392

lru - 1024           16 threads 1000000 loops: 46ns per iter, total evicted: 15990784
rw  - 1024           16 threads 1000000 loops: 51ns per iter, total evicted: 15990784

lru - 1024           32 threads 1000000 loops: 56ns per iter, total evicted: 31981568
rw  - 1024           32 threads 1000000 loops: 81ns per iter, total evicted: 31981568

lru - 1024           64 threads 1000000 loops: 90ns per iter, total evicted: 63963136
rw  - 1024           64 threads 1000000 loops: 149ns per iter, total evicted: 63963136

Benchmark

benchmark (nexmark, vs nightly-20240511):

http://metabase.risingwave-cloud.xyz/question/2219-nexmark-rw-compare?risingwave_tag_1=nightly-20240512&rw_label_1=daily&risingwave_metrics=avg-source-output-rows-per-second&risingwave_tag_2=git-8bc7ee189094e72c65db0725f05263ca3ec08be3&rw_label_2=benchmark-xx-tls

benchmark (nexmark, vs main without this PR):

http://metabase.risingwave-cloud.xyz/question/2219-nexmark-rw-compare?risingwave_tag_1=git-91b7ee29ce4d846f9c2ee6d9f56264bab414250a&rw_label_1=benchmark-xx-main&risingwave_metrics=avg-source-output-rows-per-second&risingwave_tag_2=git-8bc7ee189094e72c65db0725f05263ca3ec08be3&rw_label_2=benchmark-xx-tls

Checklist

I have written necessary rustdoc comments
I have added necessary unit tests and integration tests
I have added test labels as necessary. See details.
I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
All checks passed in ./risedev check (or alias, ./risedev c)
My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)

My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Signed-off-by: MrCroxx <[email protected]>

github-actions

license-eye has totally checked 4973 files.

Valid	Invalid	Ignored	Fixed
2138	1	2834	0

Invalid file list:

src/common/benches/bench_sequencer.rs

BugenZhao · 2024-04-08T04:50:36Z

Hi, would you mind sharing more information on the motivation and methodology in the PR description?

st1page

LGTM

hzxa21

LGTM

fuyufjh · 2024-05-13T07:22:02Z

src/common/src/lru.rs

+    }
+
+    pub fn put(&mut self, key: K, mut value: V) -> Option<V> {
+        unsafe {


So many unsafe blocks in this file 🥵 Please add comments explaining why unsafe is necessary here, e.g. why LinkedList can't satisfy this use case.

Also, I would rather wrap each linked-list operation in its own unsafe { ... } than wrap the whole function body. A single large block makes it hard to reason about safety.

The single-threaded LRU with sequence support is basically ported from our original modified lru repo, foyer, and our original block cache implementation (ported from RocksDB).

An LRU cache is a classic multi-index problem, which cannot be implemented easily and cheaply in safe Rust. It requires mutability with shared pointers, plus O(1) node lookup by address or reference (which the std linked list cannot provide).
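For readers unfamiliar with the trade-off, the following safe-Rust sketch shows the two indexes a sequence-tagged LRU needs: one by key and one by recency. It is an illustration only and not the implementation in src/common/src/lru.rs. The type SeqLru and its methods are hypothetical, it assumes sequences are unique, and it pays O(log n) per operation through a BTreeMap, which is exactly the cost an intrusive linked list with raw pointers avoids.

```rust
use std::collections::{BTreeMap, HashMap};
use std::hash::Hash;

/// Safe but slower stand-in for a sequence-tagged LRU (hypothetical).
pub struct SeqLru<K, V> {
    /// Index 1: key -> (sequence of last access, value).
    map: HashMap<K, (u64, V)>,
    /// Index 2: sequence -> key, ordered from oldest to newest.
    order: BTreeMap<u64, K>,
}

impl<K: Hash + Eq + Clone, V> SeqLru<K, V> {
    pub fn new() -> Self {
        Self { map: HashMap::new(), order: BTreeMap::new() }
    }

    /// Inserts or replaces an entry, tagging it with `seq`.
    pub fn put(&mut self, key: K, value: V, seq: u64) -> Option<V> {
        let old = self.map.insert(key.clone(), (seq, value));
        if let Some((old_seq, _)) = &old {
            self.order.remove(old_seq);
        }
        self.order.insert(seq, key);
        old.map(|(_, v)| v)
    }

    /// Looks up an entry and refreshes its sequence to `seq`.
    pub fn get(&mut self, key: &K, seq: u64) -> Option<&V> {
        let entry = self.map.get_mut(key)?;
        self.order.remove(&entry.0);
        entry.0 = seq;
        self.order.insert(seq, key.clone());
        Some(&entry.1)
    }

    /// Evicts every entry whose sequence is below `watermark`.
    /// Returns the number of evicted entries.
    pub fn evict_by_sequence(&mut self, watermark: u64) -> usize {
        // `split_off` keeps sequences >= watermark in the returned map.
        let keep = self.order.split_off(&watermark);
        let evicted = std::mem::replace(&mut self.order, keep);
        for key in evicted.values() {
            self.map.remove(key);
        }
        evicted.len()
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```

Keeping both indexes consistent on every access is the "multi-index" part; doing the recency update in O(1) requires a node that both indexes can reach directly, which is where shared mutable pointers come in.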

It requires mutability with shared pointers

Yeah, got this. But is it possible to reduce the size of unsafe block? As mentioned in the 2nd comment.

fuyufjh

Overall the idea LGTM

src/common/src/sequence.rs

src/compute/src/memory/controller.rs

fuyufjh · 2024-05-13T07:39:37Z

src/common/Cargo.toml

I heard that some stateless queries in NexMark were negatively affected by this PR for some "unknown" cause. Have we found the reason now?

Not yet. But the regression hasn't appeared in recent weeks.

lmatz · 2024-05-13T07:49:17Z

Is it necessary to also run the 4X (32C 64G) nexmark once? https://buildkite.com/risingwave-test/nexmark-benchmark/builds/3664#018f689a-0b1d-4fc3-9b82-912358895ccc
Considering that the 32-thread case seems to incur more overhead.

MrCroxx · 2024-05-23T08:06:33Z

Is it necessary to also run the 4X (32C 64G) nexmark once? buildkite.com/risingwave-test/nexmark-benchmark/builds/3664#018f689a-0b1d-4fc3-9b82-912358895ccc Considering that the 32-thread case seems to incur more overhead.

What's the hardware configuration of the longevity test? I've run the longevity test and there is no regression.

lmatz · 2024-05-23T08:16:23Z

Each MV in longevity uses 3 as the parallelism.
The machine is 32C 64GB where there are 3CNs (each 32CPUs) free to compete with each other

fuyufjh

Overall LGTM

fuyufjh · 2024-05-27T07:21:32Z

grafana/risingwave-dev-dashboard.dashboard.py

src/stream/src/cache/managed_lru.rs

MrCroxx · 2024-05-27T10:45:04Z

#16087 (comment)

Discussed offline. Splitting up the unsafe blocks barely helps reduce the blast radius. Let's keep it as it is for now.

…16087) Signed-off-by: MrCroxx <[email protected]>

perf(memory): use thread-local squence-based memory eviction policy

9eb816a

Signed-off-by: MrCroxx <[email protected]>

MrCroxx self-assigned this Apr 2, 2024

github-actions bot added the type/perf label Apr 2, 2024

MrCroxx added 2 commits April 2, 2024 17:05

Merge remote-tracking branch 'origin/main' into xx/thread-local-sequence

d998a13

test(bench): add sequencer benchmark

b87e911

Signed-off-by: MrCroxx <[email protected]>

github-actions bot reviewed Apr 2, 2024

src/common/benches/bench_sequencer.rs

MrCroxx added 9 commits April 2, 2024 17:35

fix: fix license header

f9678c3

Signed-off-by: MrCroxx <[email protected]>

fix: do not init sequence when insert lru

3959a16

Signed-off-by: MrCroxx <[email protected]>

perf: add lru bench

1ba5fd6

Signed-off-by: MrCroxx <[email protected]>

fix: clear lru cache after drop

6483b58

Signed-off-by: MrCroxx <[email protected]>

refactor: simplify clear

7a9a7c8

Signed-off-by: MrCroxx <[email protected]>

fix: drop inited field when clear

f76da99

Signed-off-by: MrCroxx <[email protected]>

Merge remote-tracking branch 'origin/main' into xx/thread-local-sequence

f12c141

refactor: update metrics in rw

8eccfee

Signed-off-by: MrCroxx <[email protected]>

chore: update grafana

594111d

Signed-off-by: MrCroxx <[email protected]>

MrCroxx marked this pull request as ready for review April 7, 2024 06:41

MrCroxx requested a review from a team as a code owner April 7, 2024 06:41

Merge remote-tracking branch 'origin/main' into xx/thread-local-sequence

dc0737a

Signed-off-by: MrCroxx <[email protected]>

TennyZhuang changed the title ~~perf(memory): use thread-local squence-based memory eviction policy~~ perf(memory): use thread-local sequence-based memory eviction policy Apr 7, 2024

MrCroxx added 3 commits April 7, 2024 16:01

refactor: make sequencer args configurable

5d98779

Signed-off-by: MrCroxx <[email protected]>

Merge remote-tracking branch 'origin/main' into xx/thread-local-sequence

f77d980

chore: tiny refactors

dce4999

Signed-off-by: MrCroxx <[email protected]>

MrCroxx requested review from fuyufjh, yuhao-su and st1page April 16, 2024 06:25

Merge remote-tracking branch 'origin/main' into xx/thread-local-sequence

7ebc5d3

Signed-off-by: MrCroxx <[email protected]>

MrCroxx force-pushed the xx/thread-local-sequence branch from 4a8785b to 7ebc5d3 Compare April 16, 2024 07:16

MrCroxx added 2 commits April 16, 2024 15:36

chore: make clippy happier

2339eee

Signed-off-by: MrCroxx <[email protected]>

fix: enable unstabl feature

ebc27aa

Signed-off-by: MrCroxx <[email protected]>

MrCroxx enabled auto-merge May 13, 2024 02:43

MrCroxx disabled auto-merge May 13, 2024 02:59

MrCroxx requested a review from hzxa21 May 13, 2024 02:59

st1page approved these changes May 13, 2024

hzxa21 approved these changes May 13, 2024

fuyufjh reviewed May 13, 2024

MrCroxx added 3 commits May 23, 2024 15:21

Merge remote-tracking branch 'origin/main' into xx/thread-local-sequence

dc43b6a

Signed-off-by: MrCroxx <[email protected]>

chore: fill rust docs for Sequencer

6ae11d5

Signed-off-by: MrCroxx <[email protected]>

chore: refine docs for controller

c934b8e

Signed-off-by: MrCroxx <[email protected]>

MrCroxx requested a review from fuyufjh May 23, 2024 08:05

fix: fix bench build

3dfd2d6

Signed-off-by: MrCroxx <[email protected]>

MrCroxx added 2 commits May 23, 2024 16:40

Merge remote-tracking branch 'origin/main' into xx/thread-local-sequence

207df01

Merge remote-tracking branch 'origin/main' into xx/thread-local-sequence

7b68659

Signed-off-by: MrCroxx <[email protected]>

fuyufjh approved these changes May 27, 2024

MrCroxx added 3 commits May 27, 2024 08:40

fix: resolve grafana build

93858bb

Signed-off-by: MrCroxx <[email protected]>

refactor: remove update_epoch

3d5917c

Signed-off-by: MrCroxx <[email protected]>

Merge remote-tracking branch 'origin/main' into xx/thread-local-sequence

a46061b

Signed-off-by: MrCroxx <[email protected]>

MrCroxx added this pull request to the merge queue May 27, 2024

Merged via the queue into main with commit 240f0b9 May 27, 2024
27 of 28 checks passed

MrCroxx deleted the xx/thread-local-sequence branch May 27, 2024 11:10

hzxa21 mentioned this pull request May 28, 2024

Tracking: Remote storage IOPS optimization #16973

Open

8 tasks

lmatz pushed a commit that referenced this pull request Jun 11, 2024

perf(memory): use thread-local sequence-based memory eviction policy (#…

b67ea03

…16087) Signed-off-by: MrCroxx <[email protected]>

lmatz mentioned this pull request Jun 14, 2024

perf(memory): raise memory eviction threshold #17265

Merged

9 tasks

hzxa21 mentioned this pull request Nov 15, 2024

Avoid relying on barrier_interval_ms to increase operator cache eviction watermark #19403

Open
