-
Notifications
You must be signed in to change notification settings - Fork 580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: fix potential data loss for shared source #19443
fix: fix potential data loss for shared source #19443
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
53349f8
to
a29b177
Compare
39887aa
to
4469944
Compare
So when receiving a new mutation on rate_limit, the source exec refreshes the high watermark to |
It's not related with backfill's position. We may assume backfill already finished, and it's just forwarding messages now. Rebuilding will make source exec jump from |
4469944
to
cf75bff
Compare
cf75bff
to
2ff1259
Compare
Want to wait a while for more reviews. Just in case. |
@@ -232,7 +232,7 @@ impl ExecutorBuilder for SourceExecutorBuilder { | |||
barrier_receiver, | |||
system_params, | |||
source.rate_limit, | |||
is_shared, | |||
is_shared && !source.with_properties.is_cdc_connector(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't impl seek to latest for CDC, and it will hit error
@@ -211,14 +211,17 @@ impl SourceReader { | |||
} | |||
|
|||
/// Build `SplitReader`s and then `BoxChunkSourceStream` from the given `ConnectorState` (`SplitImpl`s). | |||
/// | |||
/// If `seek_to_latest` is true, will also return the latest splits after seek. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we always return the splits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds OK to me. But ConnectorState
is also Option<Vec<SplitImpl>>
(which is also a little unnecessary to me), so perhaps we should refactor that together. NTFS
Merge activity |
will cherry pick the whole stack together |
…19466) (#19482) Signed-off-by: xxchan <[email protected]> Co-authored-by: Noel Kwan <[email protected]>
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Set up: Create a shared kafka source, and 1 MV on the source.
Data loss happens when:
Reason:
In #16626 we introduced an optimization to let shared SourceExecutor start from latest, but the implementation is problematic. Specifically,
hack_seek_to_latest
will not only take effect at the beginning, but will also when rebuilding the source reader (which happens when rate limit is applied).The new implementation in this PR:
hack_seek_to_latest
flag, which is error prone.seek_to_latest
call. At the same time, we also get the latest offsets. To make sureSplitImpl
andSourceReader
is consistent.Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.