Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipeline the Pipeline #128

Merged
merged 53 commits into from
Aug 21, 2023
Merged

Pipeline the Pipeline #128

merged 53 commits into from
Aug 21, 2023

Conversation

tzaffi
Copy link
Contributor

@tzaffi tzaffi commented Jul 24, 2023

Description

Allowing for moderate concurrency in the pipeline but without sacrificing its sequential integrity.

Summary of Changes

  • conduit/pipeline/common.go: introducing generic retrying of pipeline methods via Retries() and RetriesNoOutput()
  • conduit/pipeline/pipeline.go:
    • error/cancellation handling:
      • modify the pipeline's cancellation function to have a cause, and expose it via the WhyStopped() method
      • joinError() instead of setError() to the pipeline's error property
    • introducing goroutines for the importer, processors, exporters, and round launcher by refactoring Start() and introducing methods ImportHandler(), ProcessorHandler(), and ExporterHandler()
    • E2E end of test signal: modified the info-level log at the end of a round to look like FINISHED Pipeline round: 110. UPDATED Pipeline round: 111 and added WARNING commentary to be careful about changing this format as it would break the E2E test.
  • conduit/pipeline/pipeline_bench_test.go - benchmarker for the pipeline that includes sleeping plugins with an importer, 2 processors, and an exporter.
  • pkg/cli/cli.go: remove a line break from the final error printout

Issues

#118

TODO

  • Cleanup E2E code
  • Merge in more granular timestamps in logs #129
  • Don't embed PipelineData inside of BlockData
  • Remove OStart() and any resulting orphans in pipeline.go
  • Merge latest from master which includes 30 second timeout Algod Importer: Longer Timeout #133
  • Should we send a telemetry observation based on WhyStopped() ? NOT FOR NOW. Reconsider this for enhancement: graceful pipeline interrupt #97
  • What should be logged and how should internal-tools/.../logstats.go be modified? This is under the umbrella of the "Logging Plugin Performance" thread
  • Should we send metrics for the callbacks? Not in this PR
  • Test on EC2 with the final changes in place.
  • Update the results from that test in the spreadsheet table image below.

Testing

E2E

pipeline_bench_test.go

Running a new benchmark test twice on the original code and the new, we have the following results. Note the most pertinent results for the typical indexer DB population use case is exporter_10ms_while_others_1ms:

Benchmark Name Original rounds/sec Pipelining rounds/sec Pipelining v Original (%)
vanilla_2_procs_without_sleep-size-1-8 3077 3309.5 +7%
uniform_sleep_of_10ms-size-1-8 22.32 79.815 +250%
exporter_10ms_while_others_1ms-size-1-8 63.405 78.565 +24%
importer_10ms_while_others_1ms-size-1-8 65.535 91.255 +39%
first_processor_10ms_while_others_1ms-size-1-8 60.28 89.175 +48%

Block Generator Results

Running the block generator test using SCENARIO = scenarios/config.allmixed.small.yml for 30s, with the original code and the new, each time for 2 experiments we have:

Reset database? Original rounds/30 sec Pipelining rounds/30 sec Pipelining v Original (%)
Reset 301 400 +33%
No Reset 295 418 +41%

Local test network 5 minute sprint

I used the Justfile command

❯ just conduit-bootstrap-and-go 300

to bootstrap testnet and run a postgresql exporter against it for 300 seconds. I ran it a number of times against both the original pipeline and the new one. Here are the experimental results:

Log Level Reps Original rounds/300 sec (logs/round) Pipelining rounds/300 sec (logs/round) Pipelining v Original (%)
TRACE 3 3718 (7.0) 3509 (14.0) -5.6% 😢
INFO 2 4578.5 (3.0) 4423.5 (3.0) -3.4% 😢

On EC2 - CLASSIC vs. PIPELINING vs. 30 Second Timeout vs. FINAL

I ran catchup tests for 4 versions of conduit:

  • CLASSIC - what was on master at the time of the run
  • PIPELINING - the head of the pipelining branch at the same time
  • 30 Second Timeout - the pipelining branch after a 30 second timeout was introduced in the algod importer
  • FINAL - commit 867973f which is essentially the code meant for merging

There are much more detailed results in a google sheets document, but the summary is:

SUMMARY

image

@codecov
Copy link

codecov bot commented Jul 24, 2023

Codecov Report

Merging #128 (9918073) into master (442791a) will increase coverage by 4.32%.
Report is 52 commits behind head on master.
The diff coverage is 81.89%.

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
+ Coverage   67.66%   71.98%   +4.32%     
==========================================
  Files          32       36       +4     
  Lines        1976     2695     +719     
==========================================
+ Hits         1337     1940     +603     
- Misses        570      657      +87     
- Partials       69       98      +29     
Files Changed Coverage Δ
conduit/data/block_export_data.go 100.00% <ø> (+92.30%) ⬆️
conduit/metrics/metrics.go 100.00% <ø> (ø)
conduit/pipeline/metadata.go 69.11% <ø> (ø)
conduit/plugins/config.go 100.00% <ø> (ø)
...duit/plugins/exporters/filewriter/file_exporter.go 81.63% <ø> (-1.06%) ⬇️
conduit/plugins/importers/algod/metrics.go 100.00% <ø> (ø)
...gins/processors/filterprocessor/fields/searcher.go 77.50% <ø> (ø)
...ins/processors/filterprocessor/filter_processor.go 83.82% <ø> (+3.54%) ⬆️
...plugins/processors/filterprocessor/gen/generate.go 34.28% <ø> (ø)
conduit/plugins/processors/noop/noop_processor.go 64.70% <ø> (+6.81%) ⬆️
... and 20 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@tzaffi tzaffi changed the title benchmark test Pipeline the Pipeline Jul 26, 2023
conduit/pipeline/pipeline.go Outdated Show resolved Hide resolved
conduit/pipeline/pipeline.go Outdated Show resolved Hide resolved
@tzaffi tzaffi mentioned this pull request Jul 27, 2023
@@ -134,7 +134,7 @@ Detailed documentation is online: https://github.com/algorand/conduit`,
Run: func(cmd *cobra.Command, args []string) {
err := runConduitCmdWithConfig(cfg)
if err != nil {
fmt.Fprintf(os.Stderr, "\nExiting with error:\n%s.\n", err)
fmt.Fprintf(os.Stderr, "\nExiting with error:\t%s.\n", err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This improves the clarity of the E2E tests in the case of an error.

conduit/data/config.go Outdated Show resolved Hide resolved
conduit/pipeline/pipeline.go Outdated Show resolved Hide resolved
conduit/pipeline/pipeline.go Outdated Show resolved Hide resolved
@tzaffi
Copy link
Contributor Author

tzaffi commented Aug 10, 2023

If it's easy for you to run, it might be interesting to see Local test network 5 minute sprint using different log levels to show how logrus/logging in general is impacting processing times.

Can we add this as a task for #131 ?

Copy link
Contributor

@winder winder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending the validation test, this looks good.

conduit/pipeline/pipeline.go Show resolved Hide resolved
conduit/pipeline/pipeline.go Show resolved Hide resolved
@tzaffi tzaffi mentioned this pull request Aug 18, 2023
@@ -80,7 +80,7 @@ func runConduitCmdWithConfig(args *data.Args) error {
}

ctx := context.Background()
pipeline, err := pipeline.MakePipeline(ctx, pCfg, logger)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No functional changes here, but removing the shadowing of the package by the variable.

@tzaffi tzaffi merged commit 0095fc9 into algorand:master Aug 21, 2023
3 checks passed
@tzaffi tzaffi deleted the pipelining branch August 21, 2023 18:39
This was referenced Aug 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants