kafka(ticdc): sarama do not retry if produce message failed to prevent out of order (#11870) #11961
Codecov Report
@@               Coverage Diff                @@
##             release-8.5     #11961   +/-   ##
================================================
  Coverage               ?   55.2926%
================================================
  Files                  ?       1002
  Lines                  ?     136727
  Branches               ?          0
================================================
  Hits                   ?      75600
  Misses                 ?      55622
  Partials               ?       5505
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: 3AceShowHand
This is an automated cherry-pick of #11870
What problem does this PR solve?
Issue Number: close #11935
What is changed and how it works?
config.Net.MaxOpenRequests is set to 1
config.Producer.Retry.Max is set to 0, to disable the internal retry mechanism
The root cause of the out-of-order message problem is an internal sarama bug that cannot be easily fixed. This PR is a workaround: setting retry.max to 0 disables the retry.
Check List
Tests
This is tested by an internal E2E test, which injects a network partition between a random cdc node and a random kafka server. Before this PR, the test case could not pass, and we found out-of-order messages by reading the consumer log. After this PR, the test passes and no out-of-order messages are found.
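The consumer-log check described above could be automated along these lines. This is a minimal sketch, not the internal E2E test itself: the `record` type and its `Seq` field are assumptions standing in for whatever per-partition ordering key the consumer log exposes.

```go
package main

import "fmt"

// record is a hypothetical shape for one consumed message in the log.
// Seq is assumed to be non-decreasing within a partition when ordering holds.
type record struct {
	Partition int32
	Seq       uint64
}

// firstOutOfOrder returns the index of the first record whose Seq is lower
// than an earlier record in the same partition, and whether one was found.
func firstOutOfOrder(recs []record) (int, bool) {
	last := map[int32]uint64{}
	for i, r := range recs {
		if prev, seen := last[r.Partition]; seen && r.Seq < prev {
			return i, true
		}
		last[r.Partition] = r.Seq
	}
	return -1, false
}

func main() {
	consumed := []record{
		{Partition: 0, Seq: 10},
		{Partition: 1, Seq: 5},
		{Partition: 0, Seq: 12},
		{Partition: 0, Seq: 11}, // lower than the previous Seq on partition 0
	}
	if i, bad := firstOutOfOrder(consumed); bad {
		fmt.Printf("out-of-order record at index %d: %+v\n", i, consumed[i])
	} else {
		fmt.Println("no out-of-order records")
	}
}
```

Ordering is tracked per partition because Kafka only guarantees order within a single partition, so interleaving across partitions is not flagged.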
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note