
Proposal: Cross-Implementation Benchmarking Dataset for Plutus Performance #1049

Open
sierkov opened this issue Nov 2, 2024 · 4 comments
Labels: help welcomed (Contributor friendly), uplc (Relates to Untyped Plutus Core)

Comments


sierkov commented Nov 2, 2024

I'm working on a C++ implementation of Plutus aimed at optimizing batch synchronization. We'd like to benchmark our implementation against existing open-source Plutus implementations to foster cross-learning and understand their relative performance. This issue is a request for feedback on the proposed benchmark dataset, as well as for approved code samples representing your implementation to include in our benchmarks. Detailed information is provided below.

The proposed benchmark dataset is driven by the following considerations:

  1. Predictive Power: Benchmark results should allow us to predict the time required for a given implementation to validate all script witnesses on Cardano’s mainnet.
  2. Efficient Runtime: The benchmark should complete quickly to enable rapid experimentation and performance evaluation.
  3. Parallelization Awareness: It must assess both single-threaded and multi-threaded performance to identify implementation approaches that influence the parallel efficiency of script witness validation (a small measurement sketch follows this list).
  4. Sufficient Sample Size: The dataset should contain enough samples to allow computing reasonable sub-splits for further analysis, such as by Plutus version or by Cardano era.
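
On point 3, a minimal sketch of how parallel efficiency could be measured; `evaluate_script`, the worker count, and the `dataset` directory are assumptions for illustration, not part of any implementation's API:

```python
# Sketch only: compares single-threaded and multi-threaded wall-clock time
# over a directory of pre-applied scripts. `evaluate_script` is a placeholder
# for the entry point of the implementation under test, and "dataset" is an
# assumed directory of .flat files.
import time
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def evaluate_script(path: Path) -> int:
    # Placeholder: replace with a real call into the evaluator under test.
    return len(path.read_bytes())


def run_batch(paths: list[Path], workers: int) -> float:
    start = time.perf_counter()
    if workers == 1:
        for p in paths:
            evaluate_script(p)
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            list(pool.map(evaluate_script, paths))
    return time.perf_counter() - start


if __name__ == "__main__":
    scripts = sorted(Path("dataset").rglob("*.flat"))
    t1 = run_batch(scripts, workers=1)
    t16 = run_batch(scripts, workers=16)
    efficiency = t1 / (t16 * 16)  # 1.0 means perfect linear scaling
    print(f"1 worker: {t1:.1f}s, 16 workers: {t16:.1f}s, efficiency: {efficiency:.2f}")
```

Process-based dispatch is used here only to keep the sketch self-contained; an in-process harness inside each implementation would give more precise numbers.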

The procedure for creating the proposed benchmark dataset is as follows (a short extraction sketch follows the list):

  1. Transaction Sampling: Randomly select, without replacement, a sample of 256,000 mainnet transactions containing Plutus script witnesses. This sample size is chosen as a balance between speed, sufficient data for analysis, and compatibility with high-end server hardware with up to 256 execution threads. Random sampling allows the results to generalize to the validation time of all transactions with script witnesses.
  2. Script Preparation: For each script witness in the selected transactions, prepare the required arguments and script context data. Save each as a Plutus script in Flat format, with all arguments pre-applied.
  3. File Organization: For easier debugging, organize all extracted scripts using the following filename pattern: <mainnet-epoch>/<transaction-id>-<script-hash>-<redeemer-idx>.flat.
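
To make the sampling and file layout concrete, here is a minimal sketch; the `Tx`/`Witness` records are hypothetical stand-ins for the output of a chain indexer that has already pre-applied the arguments, and the sketch only illustrates the proposed sample size and naming pattern:

```python
# Sketch only: illustrates the proposed sampling and the filename pattern
# <mainnet-epoch>/<transaction-id>-<script-hash>-<redeemer-idx>.flat.
# The Tx/Witness records are hypothetical stand-ins for the output of a chain
# indexer that has already pre-applied datum, redeemer, and script context.
import random
from dataclasses import dataclass
from pathlib import Path

SAMPLE_SIZE = 256_000


@dataclass
class Witness:
    script_hash: str
    redeemer_idx: int
    flat_with_args: bytes  # script in Flat format, all arguments pre-applied


@dataclass
class Tx:
    epoch: int
    tx_id: str
    witnesses: list[Witness]


def build_dataset(all_txs: list[Tx], out_dir: Path, seed: int = 42) -> None:
    rng = random.Random(seed)
    # Step 1: random sample, without replacement, over every mainnet
    # transaction that carries at least one Plutus script witness.
    sample = rng.sample(all_txs, min(SAMPLE_SIZE, len(all_txs)))
    # Steps 2-3: one Flat file per script witness, grouped by epoch.
    for tx in sample:
        epoch_dir = out_dir / str(tx.epoch)
        epoch_dir.mkdir(parents=True, exist_ok=True)
        for w in tx.witnesses:
            name = f"{tx.tx_id}-{w.script_hash}-{w.redeemer_idx}.flat"
            (epoch_dir / name).write_bytes(w.flat_with_args)
```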

To gather performance data across open-source Plutus implementations, I am reaching out to the projects listed below. If there are any other implementations not listed here, please let me know, as I’d be happy to include them in the benchmark analysis. The known Plutus implementations:

  1. https://github.com/IntersectMBO/plutus
  2. https://github.com/aiken-lang/aiken
  3. https://github.com/nau/scalus
  4. https://github.com/OpShin/uplc

I look forward to your feedback on the proposed benchmark dataset and to your support in providing code that can represent your project in this benchmark.


rvcas (Member) commented Nov 2, 2024

@sierkov we should use https://github.com/pragma-org/uplc instead of this repo

We are using a set of flat-encoded files from the Haskell code base to benchmark against:

https://github.com/pragma-org/uplc/blob/main/crates/uplc/benches/benchmarks/haskell.rs

We also have a binary of the Haskell benchmarks to make it possible to run in CI or to just compare against locally.
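
For a rough local comparison outside that harness, one could time the same flat-encoded programs through an external evaluator; `uplc-eval` below is a hypothetical command, to be replaced by whatever CLI or library entry point the implementation under test exposes:

```python
# Sketch only: times every .flat program in a directory through an external
# evaluator. "uplc-eval" is a hypothetical command; substitute the actual
# CLI or library entry point of the implementation being compared.
import subprocess
import time
from pathlib import Path


def time_all(flat_dir: Path, command: list[str]) -> float:
    total = 0.0
    for flat in sorted(flat_dir.rglob("*.flat")):
        start = time.perf_counter()
        subprocess.run(command + [str(flat)], check=True, capture_output=True)
        total += time.perf_counter() - start
    return total


if __name__ == "__main__":
    print(f"total evaluation time: {time_all(Path('benchmarks'), ['uplc-eval']):.2f}s")
```

Process-spawn overhead dominates for small scripts, so in-process benchmarks like the linked criterion suite remain the better source of precise numbers.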

I'm also working on a Go implementation for blinklabs.


sierkov (Author) commented Nov 2, 2024

@rvcas, thank you for the quick response. Sure, I'll add https://github.com/pragma-org/uplc to the list instead. Would recreating this issue in that repository be helpful?

Regarding the go implementation, would you like us to benchmark it as well? If so, could you provide a link to it?

Regarding the benchmarked scripts, in my view, a dataset that is representative of the actual frequency and behavior of scripts executed on the mainnet can help all implementations optimize their performance for the actual profile of scripts.
Speaking practically, it's easier to optimize for things that are easy to measure.
So, the proposed dataset aims to make optimizing for the actual profile easier.
However, here I assume that no implementation targets only a specific subset of scripts.
Could you explain the methodology behind the script selection in your benchmark set?

If you were to translate your concerns into a requirement for the proposed dataset, what would that requirement be? For example, would you like to have some form of compatibility with your existing set? I'd like this dataset to provide practical value to participating projects, so if there are requirements that can help to make it more useful in your day-to-day development activities, I'd love to learn about them.

MicroProofs added the uplc (Relates to Untyped Plutus Core) and help welcomed (Contributor friendly) labels on Nov 13, 2024

sierkov (Author) commented Nov 19, 2024

@rvcas, I've shared the links to the dataset and the reference benchmarking code in a related issue in the main Plutus repository in which you are tagged as well:
IntersectMBO/plutus#6626

Please let me know what you've decided about this task. Shall I move it to https://github.com/pragma-org/uplc?


KtorZ (Member) commented Nov 20, 2024

@sierkov, we can keep it here; it's the same people maintaining both repositories anyway.
