Pipeline the Pipeline #128

tzaffi · 2023-07-24T18:59:51Z

Description

Allow moderate concurrency in the pipeline without sacrificing its sequential integrity.

Summary of Changes

- conduit/pipeline/common.go: introduce generic retrying of pipeline methods via Retries() and RetriesNoOutput()
- conduit/pipeline/pipeline.go:
  - error/cancellation handling:
    - give the pipeline's cancellation function a cause, and expose that cause via the WhyStopped() method
    - use joinError() instead of setError() when recording errors on the pipeline's error property
  - introduce goroutines for the importer, processors, exporters, and round launcher by refactoring Start() and adding the methods ImportHandler(), ProcessorHandler(), and ExporterHandler()
  - E2E end-of-test signal: the info-level log at the end of a round now looks like FINISHED Pipeline round: 110. UPDATED Pipeline round: 111, with a WARNING comment that changing this format would break the E2E test
- conduit/pipeline/pipeline_bench_test.go: benchmark for the pipeline that uses sleeping plugins (an importer, 2 processors, and an exporter)
- pkg/cli/cli.go: replace a line break with a tab in the final error printout
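
The signatures of Retries() and RetriesNoOutput() are not shown in this description, so the following is only a sketch of what generic retrying with joined errors could look like. The helpers `retry` and `retryNoOutput`, their parameters, and the `errFlaky` sentinel are all hypothetical; `errors.Join` is the standard library function (Go 1.20+).

```go
package main

import (
	"errors"
	"fmt"
)

// retry is a hypothetical generic helper, not the PR's Retries().
// It calls fn up to attempts times (attempts is assumed to be positive)
// and returns the first successful output. If every attempt fails, the
// per-attempt errors are combined with errors.Join.
func retry[In, Out any](attempts int, fn func(In) (Out, error), input In) (Out, error) {
	var zero Out
	var joined error
	for i := 1; i <= attempts; i++ {
		out, err := fn(input)
		if err == nil {
			return out, nil
		}
		joined = errors.Join(joined, fmt.Errorf("attempt %d: %w", i, err))
	}
	return zero, joined
}

// retryNoOutput adapts retry to functions that only return an error.
func retryNoOutput[In any](attempts int, fn func(In) error, input In) error {
	_, err := retry(attempts, func(in In) (struct{}, error) {
		return struct{}{}, fn(in)
	}, input)
	return err
}

// errFlaky is a hypothetical sentinel used only for this demonstration.
var errFlaky = errors.New("flaky plugin")

func main() {
	calls := 0
	// Fails twice, then succeeds on the third call.
	flaky := func(round uint64) (string, error) {
		calls++
		if calls < 3 {
			return "", errFlaky
		}
		return fmt.Sprintf("block %d", round), nil
	}

	out, err := retry(5, flaky, uint64(110))
	fmt.Println(out, err, calls)

	// Every attempt fails, so the joined error wraps errFlaky.
	err = retryNoOutput(2, func(uint64) error { return errFlaky }, uint64(111))
	fmt.Println(errors.Is(err, errFlaky))
}
```

Joining rather than overwriting keeps every failure visible to the caller, which is the same motivation as preferring a joinError()-style accumulator over a setError()-style overwrite.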

Issues

#118

TODO

Testing

E2E

`pipeline_bench_test.go`

A new benchmark test was run twice against both the original code and the new code, with the following results. The most pertinent result for the typical indexer DB population use case is exporter_10ms_while_others_1ms:

| Benchmark Name | Original rounds/sec | Pipelining rounds/sec | Pipelining vs. Original (%) |
| --- | --- | --- | --- |
| vanilla_2_procs_without_sleep-size-1-8 | 3077 | 3309.5 | +7% |
| uniform_sleep_of_10ms-size-1-8 | 22.32 | 79.815 | +250% |
| exporter_10ms_while_others_1ms-size-1-8 | 63.405 | 78.565 | +24% |
| importer_10ms_while_others_1ms-size-1-8 | 65.535 | 91.255 | +39% |
| first_processor_10ms_while_others_1ms-size-1-8 | 60.28 | 89.175 | +48% |

Block Generator Results

The block generator test was run with SCENARIO = scenarios/config.allmixed.small.yml for 30s, in 2 experiments each against the original code and the new code:

| Reset database? | Original rounds/30 sec | Pipelining rounds/30 sec | Pipelining vs. Original (%) |
| --- | --- | --- | --- |
| Reset | 301 | 400 | +33% |
| No Reset | 295 | 418 | +41% |

Local test network 5 minute sprint

I used the Justfile command

❯ just conduit-bootstrap-and-go 300

to bootstrap testnet and run a PostgreSQL exporter against it for 300 seconds. I ran it several times against both the original pipeline and the new one. Here are the experimental results:

| Log Level | Reps | Original rounds/300 sec (logs/round) | Pipelining rounds/300 sec (logs/round) | Pipelining vs. Original (%) |
| --- | --- | --- | --- | --- |
| TRACE | 3 | 3718 (7.0) | 3509 (14.0) | -5.6% 😢 |
| INFO | 2 | 4578.5 (3.0) | 4423.5 (3.0) | -3.4% 😢 |

On EC2 - CLASSIC vs. PIPELINING vs. 30 Second Timeout vs. FINAL

I ran catchup tests for 4 versions of conduit:

- CLASSIC: what was on master at the time of the run
- PIPELINING: the head of the pipelining branch at the same time
- 30 Second Timeout: the pipelining branch after a 30 second timeout was introduced in the algod importer
- FINAL: commit 867973f, which is essentially the code meant for merging

Much more detailed results are available in a Google Sheets document, but the summary is:

SUMMARY

codecov · 2023-07-24T19:05:02Z

Codecov Report

Merging #128 (9918073) into master (442791a) will increase coverage by 4.32%.
Report is 52 commits behind head on master.
The diff coverage is 81.89%.

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
+ Coverage   67.66%   71.98%   +4.32%     
==========================================
  Files          32       36       +4     
  Lines        1976     2695     +719     
==========================================
+ Hits         1337     1940     +603     
- Misses        570      657      +87     
- Partials       69       98      +29

Files Changed	Coverage Δ
conduit/data/block_export_data.go	`100.00% <ø> (+92.30%)`	⬆️
conduit/metrics/metrics.go	`100.00% <ø> (ø)`
conduit/pipeline/metadata.go	`69.11% <ø> (ø)`
conduit/plugins/config.go	`100.00% <ø> (ø)`
...duit/plugins/exporters/filewriter/file_exporter.go	`81.63% <ø> (-1.06%)`	⬇️
conduit/plugins/importers/algod/metrics.go	`100.00% <ø> (ø)`
...gins/processors/filterprocessor/fields/searcher.go	`77.50% <ø> (ø)`
...ins/processors/filterprocessor/filter_processor.go	`83.82% <ø> (+3.54%)`	⬆️
...plugins/processors/filterprocessor/gen/generate.go	`34.28% <ø> (ø)`
conduit/plugins/processors/noop/noop_processor.go	`64.70% <ø> (+6.81%)`	⬆️
... and 20 more

conduit/pipeline/pipeline.go

tzaffi · 2023-07-31T02:12:22Z

pkg/cli/cli.go

@@ -134,7 +134,7 @@ Detailed documentation is online: https://github.com/algorand/conduit`,
 		Run: func(cmd *cobra.Command, args []string) {
 			err := runConduitCmdWithConfig(cfg)
 			if err != nil {
-				fmt.Fprintf(os.Stderr, "\nExiting with error:\n%s.\n", err)
+				fmt.Fprintf(os.Stderr, "\nExiting with error:\t%s.\n", err)


This improves the clarity of the E2E tests in the case of an error.

conduit/data/config.go

conduit/pipeline/pipeline.go

conduit/plugins/importers/algod/algod_importer.go

docs/PluginDevelopment.md

tzaffi · 2023-08-10T16:00:00Z

If it's easy for you to run, it might be interesting to see Local test network 5 minute sprint using different log levels to show how logrus/logging in general is impacting processing times.

Can we add this as a task for #131?

conduit/pipeline/common.go

winder

Pending the validation test, this looks good.

conduit/pipeline/pipeline.go

Co-authored-by: Will Winder <[email protected]>

tzaffi · 2023-08-21T17:24:50Z

pkg/cli/cli.go

@@ -80,7 +80,7 @@ func runConduitCmdWithConfig(args *data.Args) error {
 	}

 	ctx := context.Background()
-	pipeline, err := pipeline.MakePipeline(ctx, pCfg, logger)


No functional changes here; this only stops the variable from shadowing the pipeline package.
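
For readers unfamiliar with the issue: once a local variable takes a package's name, the package can no longer be referenced in that scope, which would presumably block expressions such as `pipeline.BecauseStopMethod` later in the function. The sketch below shows the same effect using the standard `strings` package as a stand-in; the function `describe` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// describe uses a distinct variable name so the strings package stays reachable.
func describe() string {
	// Shadowing version, left commented out because it does not compile:
	//
	//   strings := strings.NewReplacer("classic", "pipelined") // variable hides the package
	//   return strings.ToUpper("x")                            // error: *strings.Replacer has no ToUpper
	//
	repl := strings.NewReplacer("classic", "pipelined")
	return strings.ToUpper(repl.Replace("classic conduit"))
}

func main() {
	fmt.Println(describe())
}
```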

benchmark test

bead065

Zeph Grunschlag added 6 commits July 25, 2023 19:03

basic logic in place

cf130e2

basic logic in place

66f1288

common.go unit test

42fa4fd

commentariat

b8a5ee1

better function comment

5504a23

lint

62da700

tzaffi changed the title ~~benchmark test~~ Pipeline the Pipeline Jul 26, 2023

Zeph Grunschlag added 2 commits July 26, 2023 14:37

wip

0aeb483

fix mocking tests

3ce28b8

tzaffi commented Jul 27, 2023

conduit/pipeline/pipeline.go (outdated, resolved)

tzaffi commented Jul 27, 2023

conduit/pipeline/pipeline.go (resolved)

Zeph Grunschlag added 4 commits July 26, 2023 21:00

WhyStopped

06586e8

cancelWithProblem

6173e38

should still work even when there's no processors

77e2675

addMetrics

5940c9f

tzaffi commented Jul 27, 2023

conduit/pipeline/pipeline.go (outdated, resolved)

Zeph Grunschlag added 5 commits July 27, 2023 10:57

more originator details in algod_importer errors

35e635c

pass unit test

15a62df

NewSyncError

4956c9f

lint

579f2e4

bring back E2E finish signaller

94b7bd8

tzaffi commented Jul 28, 2023

conduit/pipeline/pipeline.go (resolved)

does e2e pass in CI?

cd7e53c

tzaffi mentioned this pull request Jul 27, 2023

Pipeline the Pipeline #118

Closed

tzaffi commented Jul 31, 2023

finer logging granularity

4c1756f

tzaffi commented Jul 31, 2023

conduit/data/config.go (outdated, resolved)

tzaffi commented Jul 31, 2023

conduit/pipeline/pipeline.go (outdated, resolved)

Zeph Grunschlag added 3 commits August 9, 2023 21:54

remove OStart()

674c949

don't shodow the pipeline package

7e88db3

enable if !errors.Is(pl.WhyStopped(), pipeline.BecauseStopMethod)

a160e28

tzaffi commented Aug 10, 2023

conduit/pipeline/pipeline.go (outdated, resolved)

tzaffi and others added 2 commits August 9, 2023 22:36

Update conduit/pipeline/pipeline.go

7b7cf2f

fail fast if can't save metadata and carry on without retrying the callbacks

cb86036

tzaffi commented Aug 10, 2023

conduit/plugins/importers/algod/algod_importer.go (outdated, resolved)

tzaffi commented Aug 10, 2023

docs/PluginDevelopment.md (resolved)

Zeph Grunschlag added 4 commits August 11, 2023 13:20

Merge branch 'master' into pipelining

8c1e6fa

simplify cancellation cause and punt to issue Graceful Pipeline Exit algorand#100

1158f75

logstatsE2Elog

8b88301

don't send metrics through the plugin channels

867973f

This was referenced Aug 17, 2023

Make pipeline.NextRound atomic #140

Open

api: New API with health endpoint #139

Merged

winder reviewed Aug 18, 2023

conduit/pipeline/common.go (resolved)

winder approved these changes Aug 18, 2023

conduit/pipeline/pipeline.go (resolved)

conduit/pipeline/pipeline.go (resolved)

tzaffi mentioned this pull request Aug 18, 2023

Pipelining Followups #141

Open

tzaffi and others added 6 commits August 18, 2023 10:48

Apply suggestions from code review

b36eb15

Co-authored-by: Will Winder <[email protected]>

docs: Remove quotes in file_write examples. (algorand#137)

07e2e03

build: add version to release filename. (algorand#138)

136996b

temp commit

300677f

revert

ded0284

Merge branch 'master' into pipelining

9918073

tzaffi commented Aug 21, 2023

shiqizng approved these changes Aug 21, 2023

tzaffi merged commit 0095fc9 into algorand:master Aug 21, 2023
3 checks passed

tzaffi deleted the pipelining branch August 21, 2023 18:39

This was referenced Aug 29, 2023

Pipelining followups tzaffi/conduit#8

Closed

Pipelining followups #147

Merged
