source: use pull model for mylogical and pglogical #1065

ZhouXing19 · 2024-11-06T17:23:03Z

This PR is based on the branch
https://github.com/cockroachdb/replicator/tree/bob_core_open to modify the mylogical and pglogical frontend to use a pull based model which is enabled by the BatchReader interface.

This change is

This PR is based on the branch https://github.com/cockroachdb/replicator/tree/bob_core_open to modify the mylogical and pglogical frontend to use a pull based model which is enabled by the BatchReader interface.

internal/source/cdc/webhook.go

 		}
 	}

-	return conveyor.AcceptMultiBatch(ctx, toProcess, &types.AcceptOptions{})
+	for _, cursor := range cursorsPushed {
+		_, err = stopvar.DoWhenUpdatedOrInterval(ctx, nil, &cursor.Outcome, 2*time.Second, func(ctx *stopper.Context, old, new error) error {


internal/source/cdc/webhook_query.go

+	}
+
+	for _, cursor := range cursorsPushed {
+		_, err = stopvar.DoWhenUpdatedOrInterval(ctx, nil, &cursor.Outcome, 2*time.Second, func(ctx *stopper.Context, old, new error) error {


ryanluu12345

Left some comments here. My remaining thought is this: because this is such an all encompassing change, we risk regressions across all operating modes.

Once we continue this work, I don't think we should merge and release all of this at once. Here is how I'm thinking about the sequencing (pun intended):

Make the core changes to the sequencer and relevant interfaces + implementations as their own standalone change
Make the changes for PG and MySQL first since those will see the most immediate performance benefit
Perform testing and verification of the PG and MySQL cases
Slowly add each mode of operation like objstore, kafka, and C2C

I don't want to embroil us in support tickets across the board in case something goes awry.

I'm wondering if it's possible for us to take this piecemeal approach -- starting with the PG and MySQL sources?

ryanluu12345 · 2024-12-19T21:16:15Z

internal/sequencer/besteffort/best_effort.go

 				return nil
 			})
 		return err
 	})

-	return ret, stats, nil
+	ctx.Go(func(ctx *stopper.Context) error {
+		return ret.Fanout(ctx)


What's the purpose of doing a fanout here? Does this allow us to perform batching across different tables? How is this fan out affected if they have FK constraints?

ryanluu12345 · 2024-12-19T21:19:22Z

internal/sequencer/besteffort/router.go

-		if err := destination.AcceptMultiBatch(ctx, acc, opts); err != nil {
-			return err
-		}
+// AcceptMultiBatch splits the batch based on destination tables.


Thanks for the helpful note. So that means for the PG and MY case, previously we would have had to send updates to any of the target tables single threaded:

table1 / mut 1 -> table 2 / mut 1 -> table2 / mut 2 -> table 1 / mut 2

But now we can fan it out so that we do in separate goroutines.

table1 / mut 1 -> table1 / mut 2 table2 / mut 1 -> table2 / mut 2

ryanluu12345 · 2024-12-19T21:20:39Z

internal/sequencer/besteffort/router.go

+		// Read from the source input.
+		case cursor, ok := <-inputChan:
+			cursorCnt++
+			for name, mut := range cursor.Batch.Data.All() {


Beyond fanning out, do we do any batch applying for the single table case? Like within single tables, can we do like:

(table1 / mut 1 + table1 / mut 2 + table1 / mut 3) as one batch apply

ryanluu12345 · 2024-12-19T21:22:31Z

internal/sequencer/seqtest/seqtest.go

@@ -67,7 +67,7 @@ func (f *Fixture) SequencerFor(
 	case switcher.ModeConsistent:
 		return f.Staging.Wrap(ctx, f.Core)
 	case switcher.ModeImmediate:
-		return f.Immediate, nil
+		return f.Core, nil


Are we now using .Core since there is no more Immediate after this change?

ryanluu12345 · 2024-12-19T21:23:22Z

internal/sequencer/staging/staging.go

@@ -72,3 +87,70 @@ func (s *staging) Start(
 	_, stats, err := s.delegate.Start(ctx, opts)
 	return &acceptor{s.Staging}, stats, err
 }
+
+func (s *staging) StageMutations(


Just to clarify, but is this StageMutations only relevant for modes that touch the staging table for mutations storage? C2X cases I mean.

ryanluu12345 · 2024-12-19T21:27:15Z

internal/source/mylogical/conn.go

-				TargetQuerier: tx,
-			}); err != nil {
-				return nil, err
+			cursor := &types.BatchCursor{


So previously, we were accepting a single batch at once and then flushing single threaded on the target. What is the implication of using the BatchCursor? Does this now get us to use multibatches?

Who is the receiver of this out channel where cursors are written too? Is that the core sequencer that will then do all the batching within and across tables?

ZhouXing19 · 2024-12-19T23:01:04Z

Left some comments here. My remaining thought is this: because this is such an all encompassing change, we risk regressions across all operating modes.

Once we continue this work, I don't think we should merge and release all of this at once. Here is how I'm thinking about the sequencing (pun intended):

Make the core changes to the sequencer and relevant interfaces + implementations as their own standalone change

Make the changes for PG and MySQL first since those will see the most immediate performance benefit

Perform testing and verification of the PG and MySQL cases

Slowly add each mode of operation like objstore, kafka, and C2C

I don't want to embroil us in support tickets across the board in case something goes awry.

I'm wondering if it's possible for us to take this piecemeal approach -- starting with the PG and MySQL sources?

That was the plan -- having smaller PRs to make smaller progress. The tricky thing is how the integration tests are written -- it's convoluted with the script wrapper. It means if we changed the pg/mysql front end, to make sure the CI can be green, we need to modify the interface of the script wrapper too. However, the script wrapper is shared by other frontends and thus other integrations too, meaning we need to change other frontends to accommodate the change too. We make break it into smaller commits, but given how the tests are structured, I don't think there's an easy way to break it into smaller PRs.

ZhouXing19 mentioned this pull request Nov 6, 2024

internal: employ Core sequencer for pglogical & mylogical #1060

Closed

ZhouXing19 force-pushed the jane_pg_mysql_pull branch 2 times, most recently from 1952f7b to 7b85e20 Compare November 6, 2024 17:33

source: use pull model for mylogical and pglogical

d1f24b5

This PR is based on the branch https://github.com/cockroachdb/replicator/tree/bob_core_open to modify the mylogical and pglogical frontend to use a pull based model which is enabled by the BatchReader interface.

ZhouXing19 force-pushed the jane_pg_mysql_pull branch from 7b85e20 to d1f24b5 Compare November 6, 2024 17:34

ZhouXing19 added 2 commits November 6, 2024 14:43

TestMYLogical/consistent passed

d2e6bec

passed for both pg and mysql integration tests

ceb8e5b

ZhouXing19 force-pushed the jane_pg_mysql_pull branch from 91f95ca to ceb8e5b Compare November 7, 2024 21:01

ZhouXing19 added 19 commits November 7, 2024 17:02

return correct acceptor for script/wrapper.Start()

81285d6

draft for script, passed for integreation test (mylogical/pglogical)

b781637

removed mu.acceptor

2043edd

draft for webhook

064a782

fixed for TestCommand with webhook, but slow

95ef9e1

finished fixing for TestHandler/immediate-feed/webhookEndpoints

10b48f3

fixed for TestMergeInt

2a79057

fixed TestHandler/immediate

27d2702

fixed TestHandler/feed

d1240d4

fixed TestHandler

521e6ae

TestKafka passed

7502537

fixed TestApplyWithinRange

37fe3bf

fixed TestLocalProcess

95c7bb1

fixed TestSmoke

648828a

draft for TestUserScriptSequencer/ModeConsistent

2c4fd4f

[stagingErrChans] working for TestUserScriptSequencer/ModeConsistent

c31e535

convo with bob about staging and besteffort

2387598

passed first part of user script

4805a78

worked for besteffort and consistent, broke immediate

9848464

ZhouXing19 force-pushed the jane_pg_mysql_pull branch from e0bfbbd to 9848464 Compare November 22, 2024 16:51

ZhouXing19 added 2 commits November 23, 2024 09:54

TestUserScriptSequencer fully passed

38d5e44

passed TestHandler/feed/deferred/webhookEndpoints with outcome check

013905a

fixed TestHandler and TestKafka

cd92bfc

github-advanced-security bot found potential problems Nov 25, 2024

View reviewed changes

add some noisy logging

73c414f

ryanluu12345 reviewed Dec 19, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

source: use pull model for mylogical and pglogical #1065

source: use pull model for mylogical and pglogical #1065

ZhouXing19 commented Nov 6, 2024 •

edited by petermattis

Loading

ryanluu12345 left a comment

ryanluu12345 Dec 19, 2024

ryanluu12345 Dec 19, 2024

ryanluu12345 Dec 19, 2024

ryanluu12345 Dec 19, 2024

ryanluu12345 Dec 19, 2024

ryanluu12345 Dec 19, 2024

ZhouXing19 commented Dec 19, 2024

source: use pull model for mylogical and pglogical #1065

Are you sure you want to change the base?

source: use pull model for mylogical and pglogical #1065

Conversation

ZhouXing19 commented Nov 6, 2024 • edited by petermattis Loading

ryanluu12345 left a comment

Choose a reason for hiding this comment

ryanluu12345 Dec 19, 2024

Choose a reason for hiding this comment

ryanluu12345 Dec 19, 2024

Choose a reason for hiding this comment

ryanluu12345 Dec 19, 2024

Choose a reason for hiding this comment

ryanluu12345 Dec 19, 2024

Choose a reason for hiding this comment

ryanluu12345 Dec 19, 2024

Choose a reason for hiding this comment

ryanluu12345 Dec 19, 2024

Choose a reason for hiding this comment

ZhouXing19 commented Dec 19, 2024

ZhouXing19 commented Nov 6, 2024 •

edited by petermattis

Loading