
Proposal: Cross-Implementation Benchmarking Dataset for Plutus Performance #1049

Open
sierkov opened this issue Nov 2, 2024 · 4 comments
Labels: help welcomed (Contributor friendly), uplc (Relates to Untyped Plutus Core)

Comments


sierkov commented Nov 2, 2024

I'm working on a C++ implementation of Plutus aimed at optimizing batch synchronization. We'd like to benchmark our implementation against existing open-source Plutus implementations to foster cross-learning and understand their relative performance. This issue is a request for feedback on the proposed benchmark dataset, as well as for approved code samples representing your implementation to include in our benchmarks. Detailed information is provided below.

The proposed benchmark dataset is driven by the following considerations:

  1. Predictive Power: Benchmark results should allow us to predict the time required for a given implementation to validate all script witnesses on Cardano’s mainnet.
  2. Efficient Runtime: The benchmark should complete quickly to enable rapid experimentation and performance evaluation.
  3. Parallelization Awareness: It must assess both single-threaded and multi-threaded performance to identify implementation approaches that influence the parallel efficiency of script witness validation (a small measurement sketch follows this list).
  4. Sufficient Sample Size: The dataset should contain enough samples to allow computing reasonable sub-splits for further analysis, such as by Plutus version or by Cardano era.
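
On point 3, a minimal sketch of how parallel efficiency could be measured; `evaluate_script`, the worker count, and the `dataset` directory are assumptions for illustration, not part of any implementation's API:

```python
# Sketch only: compares single-threaded and multi-threaded wall-clock time
# over a directory of pre-applied scripts. `evaluate_script` is a placeholder
# for the entry point of the implementation under test, and "dataset" is an
# assumed directory of .flat files.
import time
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def evaluate_script(path: Path) -> int:
    # Placeholder: replace with a real call into the evaluator under test.
    return len(path.read_bytes())


def run_batch(paths: list[Path], workers: int) -> float:
    start = time.perf_counter()
    if workers == 1:
        for p in paths:
            evaluate_script(p)
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            list(pool.map(evaluate_script, paths))
    return time.perf_counter() - start


if __name__ == "__main__":
    scripts = sorted(Path("dataset").rglob("*.flat"))
    t1 = run_batch(scripts, workers=1)
    t16 = run_batch(scripts, workers=16)
    efficiency = t1 / (t16 * 16)  # 1.0 means perfect linear scaling
    print(f"1 worker: {t1:.1f}s, 16 workers: {t16:.1f}s, efficiency: {efficiency:.2f}")
```

Process-based dispatch is used here only to keep the sketch self-contained; an in-process harness inside each implementation would give more precise numbers.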

The procedure for creating the proposed benchmark dataset is as follows (a short extraction sketch follows the list):

  1. Transaction Sampling: Randomly select, without replacement, a sample of 256,000 mainnet transactions containing Plutus script witnesses. This sample size is chosen as a balance between speed, sufficient data for analysis, and compatibility with high-end server hardware with up to 256 execution threads. Random sampling allows the results to generalize to the validation time of all transactions with script witnesses.
  2. Script Preparation: For each script witness in the selected transactions, prepare the required arguments and script context data. Save each as a Plutus script in Flat format, with all arguments pre-applied.
  3. File Organization: For easier debugging, organize all extracted scripts using the following filename pattern: <mainnet-epoch>/<transaction-id>-<script-hash>-<redeemer-idx>.flat.
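
To make the sampling and file layout concrete, here is a minimal sketch; the `Tx`/`Witness` records are hypothetical stand-ins for the output of a chain indexer that has already pre-applied the arguments, and the sketch only illustrates the proposed sample size and naming pattern:

```python
# Sketch only: illustrates the proposed sampling and the filename pattern
# <mainnet-epoch>/<transaction-id>-<script-hash>-<redeemer-idx>.flat.
# The Tx/Witness records are hypothetical stand-ins for the output of a chain
# indexer that has already pre-applied datum, redeemer, and script context.
import random
from dataclasses import dataclass
from pathlib import Path

SAMPLE_SIZE = 256_000


@dataclass
class Witness:
    script_hash: str
    redeemer_idx: int
    flat_with_args: bytes  # script in Flat format, all arguments pre-applied


@dataclass
class Tx:
    epoch: int
    tx_id: str
    witnesses: list[Witness]


def build_dataset(all_txs: list[Tx], out_dir: Path, seed: int = 42) -> None:
    rng = random.Random(seed)
    # Step 1: random sample, without replacement, over every mainnet
    # transaction that carries at least one Plutus script witness.
    sample = rng.sample(all_txs, min(SAMPLE_SIZE, len(all_txs)))
    # Steps 2-3: one Flat file per script witness, grouped by epoch.
    for tx in sample:
        epoch_dir = out_dir / str(tx.epoch)
        epoch_dir.mkdir(parents=True, exist_ok=True)
        for w in tx.witnesses:
            name = f"{tx.tx_id}-{w.script_hash}-{w.redeemer_idx}.flat"
            (epoch_dir / name).write_bytes(w.flat_with_args)
```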

To gather performance data across open-source Plutus implementations, I am reaching out to the projects listed below. If there are any other implementations not listed here, please let me know, as I’d be happy to include them in the benchmark analysis. The known Plutus implementations:

  1. https://github.com/IntersectMBO/plutus
  2. https://github.com/aiken-lang/aiken
  3. https://github.com/nau/scalus
  4. https://github.com/OpShin/uplc

I look forward to your feedback on the proposed benchmark dataset and to your support in providing code that can represent your project in this benchmark.


rvcas (Member) commented Nov 2, 2024

@sierkov we should use https://github.com/pragma-org/uplc instead of this repo

We are using a set of flat-encoded files from the Haskell code base to benchmark against:

https://github.com/pragma-org/uplc/blob/main/crates/uplc/benches/benchmarks/haskell.rs

We also have a binary of the Haskell benchmarks to make it possible to run in CI or to just compare against locally.
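
For a rough local comparison outside that harness, one could time the same flat-encoded programs through an external evaluator; `uplc-eval` below is a hypothetical command, to be replaced by whatever CLI or library entry point the implementation under test exposes:

```python
# Sketch only: times every .flat program in a directory through an external
# evaluator. "uplc-eval" is a hypothetical command; substitute the actual
# CLI or library entry point of the implementation being compared.
import subprocess
import time
from pathlib import Path


def time_all(flat_dir: Path, command: list[str]) -> float:
    total = 0.0
    for flat in sorted(flat_dir.rglob("*.flat")):
        start = time.perf_counter()
        subprocess.run(command + [str(flat)], check=True, capture_output=True)
        total += time.perf_counter() - start
    return total


if __name__ == "__main__":
    print(f"total evaluation time: {time_all(Path('benchmarks'), ['uplc-eval']):.2f}s")
```

Process-spawn overhead dominates for small scripts, so in-process benchmarks like the linked criterion suite remain the better source of precise numbers.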

I'm also working on a Go implementation for blinklabs.


sierkov (Author) commented Nov 2, 2024

@rvcas, thank you for the quick response. Sure, I'll add https://github.com/pragma-org/uplc to the list instead. Would recreating this issue in that repository be helpful?

Regarding the go implementation, would you like us to benchmark it as well? If so, could you provide a link to it?

Regarding the benchmarked scripts, in my view, a dataset that is representative of the actual frequency and behavior of scripts executed on the mainnet can help all implementations optimize their performance for the actual profile of scripts.
Speaking practically, it's easier to optimize for things that are easy to measure.
So, the proposed dataset aims to make optimizing for the actual profile easier.
However, here I assume that no implementation targets only a specific subset of scripts.
Could you explain the methodology behind the script selection in your benchmark set?

If you were to translate your concerns into a requirement for the proposed dataset, what would that requirement be? For example, would you like to have some form of compatibility with your existing set? I'd like this dataset to provide practical value to participating projects, so if there are requirements that can help to make it more useful in your day-to-day development activities, I'd love to learn about them.

MicroProofs added the uplc (Relates to Untyped Plutus Core) and help welcomed (Contributor friendly) labels on Nov 13, 2024

sierkov (Author) commented Nov 19, 2024

@rvcas, I've shared the links to the dataset and the reference benchmarking code in a related issue in the main Plutus repository in which you are tagged as well:
IntersectMBO/plutus#6626

Please let me know what you've decided about this task. Shall I move it to https://github.com/pragma-org/uplc?


KtorZ (Member) commented Nov 20, 2024

@sierkov, we can keep it here; it's the same people maintaining both repositories anyway.
