The purpose of this test is to (1) verify that the consensus reactor of CometBFT can deal with 134MB blocks across 100 validators, and (2) establish what is the minimum value for timeout_commit that can be used for such block sizes. We do not need to run any celestia-node nodes for the scope of this test. We should also disable mempool tx gossiping in this test and generate transactions locally, as the scope of this test is to test the consensus reactor only.
Setup
Validators should have a realistic network latency setup
Set max_bytes in genesis.json to 1073741824 (1GB)
Set broadcast=false in config.toml
Set RecvRate and SendRate to 10000000 (10MB) in config.toml (we can try to adjust this later if that causes issues)
Set timeout_commit in config.toml to 3s - we should experiment with this to find the lowest value we can get away with
Each validator should have 16 vCPUs, to ensure that constructing the erasure code isn't a bottleneck
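The config changes above might look like the following sketch. The key names follow CometBFT's config.toml and genesis.json conventions; exact section placement should be double-checked against the version in use:

```toml
# config.toml (relevant fragments)
[mempool]
broadcast = false        # disable mempool tx gossiping

[p2p]
send_rate = 10000000     # 10MB/s
recv_rate = 10000000     # 10MB/s

[consensus]
timeout_commit = "3s"    # starting point; lower in later runs
```

```json
{
  "consensus_params": {
    "block": { "max_bytes": "1073741824" }
  }
}
```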
Test
On each validator, run the txsim utility locally in a separate process with the following options: --blob 500 --blob-sizes 1000000-1000000 --blob-amounts 1-1 --feegrant true. This will create 500 routines on each validator, each sending PFBs with 1MB blobs. Validators will need an account with sufficient funds to do this.
Record (1) the block size of each block to verify that blocks are getting filled up; (2) the number of signatures of each block to verify that all validators are able to sign and commit each block; (3) network bandwidth statistics
To achieve this we need to keep two areas in mind:
Infrastructure for Setup
In order to provide 16 vCPUs per validator node, the node instances for the k8s cluster should be at least c5.4xlarge (16 vCPUs), or c5.9xlarge (32 vCPUs per instance). I'd rather start with c5.4xlarge (we have them by default right now) and initially allocate 14-15 vCPUs to the validator container each pod serves. This ensures we scale to 100 AWS node instances at a 1:1 ratio
According to recent test runs with the full validator set and the QGB, the current infrastructure implementation should not be the bottleneck
Testing Environment
Code base
Based on the @celestiaorg/celestia-core team's past usage, it's better to either branch test-infra to remove the celestia-node part of the test code base, or fork it into a canonical test-infra-consensus repo
Most of the setup of validators is complete
The config.toml configuration should be straightforward to accommodate the changes we need; the same goes for genesis.json if we need to modify anything
Network setup
Validators should have a realistic network latency setup
We already have the necessary prerequisites to start experimenting with 0/100/200/300+ ms latencies, applied per validator to make the network more 'realistic'. Unfortunately, latency cannot be changed dynamically during test execution
Still, I would recommend kickstarting with no bandwidth or latency limitations and checking the validators' Grafana dashboards to see unrestricted per-pod/per-validator network figures
Txsim
Currently, we already have a Docker image of txsim that we can pull into the testable Dockerfile that is built and run from testground's point of view.
This means we can just add another CLI call in the Go test code and point to each validator's celestia address as the master account for txsim to produce big blob submissions