[WIP] subsystem-bench: Add real networking stack #6845

Closed
wants to merge 18 commits into from

AndreiEres
Contributor

@AndreiEres AndreiEres commented Dec 11, 2024

In this PR, I added a real networking stack to the subsystem benchmarks.

How the real networking stack works in benchmarks

Currently, in subsystem benchmarks, we initialize a mocked network that simulates communication between nodes without using a network bridge. Since real communication involves sending messages over a transport layer like TCP, and intercepting messages at that level is complicated, we decided to spawn all the nodes as real ones.

The spawned nodes have a network backend bound to local ports, which lets them communicate with each other. Because we don't need the full functionality of these nodes, they include only the necessary subsystems: a real network bridge plus mocked subsystems for communicating with the node under test. This approach looked promising because it gives us real networking, and the node under test collects actual metrics in its own registry.

The problem with too many tokio tasks

The problem is that I can only run the updated benchmark with a limited number of nodes. With 40 or more nodes, the Tokio runtime blocks on the main task, which awaits results from other spawned tasks that consequently never produce any. To investigate the issue, I checked the number of tasks that Tokio spawns during the benchmarks.

Tokio tasks spawned during the statement-distribution benchmark:

| Network | Nodes | Tasks |
|---------|-------|-------|
| real    | 35    | 2900  |
| mocked  | 35    | 80    |
| mocked  | 500   | 380   |

The number of tasks required to simulate a small real network with only 35 nodes is more than seven times greater than that for a mocked network with 500 nodes (2900 vs. 380). To continue the experiment, I checked the number of tasks for `cargo run --bin polkadot`: it was around 100. This measurement is very rough, but I wouldn't expect the count to exceed 300-500 when the node acts as a validator.
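To make the comparison concrete, the table rows can be reduced to tasks per node. The sketch below uses only the three measurements from the table; the `Row` type and both helper functions are hypothetical names for illustration, and the 500-node figure for the real network is a naive linear extrapolation that was not measured.

```rust
/// One row of the task-count table (values copied from the measurements above).
struct Row {
    network: &'static str,
    nodes: u32,
    tasks: u32,
}

/// Average number of Tokio tasks per emulated node.
fn tasks_per_node(row: &Row) -> f64 {
    f64::from(row.tasks) / f64::from(row.nodes)
}

/// ASSUMPTION: the task count grows linearly with the node count.
/// This is only a rough guess; nothing in the measurements confirms it.
fn extrapolate(row: &Row, target_nodes: u32) -> f64 {
    tasks_per_node(row) * f64::from(target_nodes)
}

fn main() {
    let rows = [
        Row { network: "real", nodes: 35, tasks: 2900 },
        Row { network: "mocked", nodes: 35, tasks: 80 },
        Row { network: "mocked", nodes: 500, tasks: 380 },
    ];

    for row in &rows {
        println!(
            "{:<6} {:>3} nodes: {:.1} tasks/node",
            row.network,
            row.nodes,
            tasks_per_node(row)
        );
    }

    // Unmeasured, linear guess for a real network of 500 nodes.
    println!(
        "real network at 500 nodes (linear guess): {:.0} tasks",
        extrapolate(&rows[0], 500)
    );
}
```

Under that linearity assumption, a real 500-node network would need tens of thousands of tasks, which is consistent with the benchmark stalling long before that size.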

Currently, the benchmarks use the Tokio runtime with 4 worker threads (which should be increased to 8 to match the new reference hardware) and the default limit of 512 blocking threads. After increasing the number of worker threads to 8 and the blocking thread limit to 32,768, I managed to run the benchmark with 50 validators, which is still far fewer than 500.
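For reference, a runtime with those limits could be configured roughly as follows. This is a minimal sketch using Tokio's public `runtime::Builder` API, not the code subsystem-bench actually uses; it assumes the `tokio` crate with the `rt-multi-thread` feature enabled, and `build_bench_runtime` is a hypothetical helper name.

```rust
use tokio::runtime::{Builder, Runtime};

/// Build a multi-threaded runtime with the thread limits described above.
/// Hypothetical helper; the real benchmark may construct its runtime differently.
fn build_bench_runtime() -> std::io::Result<Runtime> {
    Builder::new_multi_thread()
        // Worker threads drive async tasks.
        .worker_threads(8)
        // Upper bound on threads used for `spawn_blocking`; Tokio's default is 512.
        .max_blocking_threads(32_768)
        .enable_all()
        .build()
}

fn main() -> std::io::Result<()> {
    let runtime = build_bench_runtime()?;

    // Smoke test: spawn one task and wait for its result on the main thread.
    let result = runtime.block_on(async { tokio::spawn(async { 1 + 1 }).await });
    assert_eq!(result.expect("task panicked"), 2);
    Ok(())
}
```

Note that `max_blocking_threads` is only an upper bound: Tokio creates blocking threads on demand, so raising the limit has a cost only when that many blocking operations are in flight at once.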

My conclusion

I think the idea of using real nodes in subsystem benchmarks has failed. Running all nodes with a real network stack requires a much heavier Tokio runtime than we can afford in our benchmarks. I think we should stop pursuing this direction and explore another approach.

@paritytech-workflow-stopper

All GitHub workflows were cancelled due to failure one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/12279204458
Failed job name: cargo-clippy

@AndreiEres
Contributor Author

While working on this PR, I concluded that using real nodes in subsystem benchmarks is unsuccessful. I included all my thoughts in the PR's description.

@sandreim @alexggh @lexnv @dmitry-markin
Please take a look at that. What do you think? Where am I wrong?

@sandreim
Contributor

sandreim commented Jan 6, 2025

> While working on this PR, I concluded that using real nodes in subsystem benchmarks is unsuccessful. I included all my thoughts in the PR's description.

I didn't look much at the code, but I don't expect this to work if we spin up full-blown network stacks for the emulated nodes. Instead, I suggest we look into implementing a minimalist networking stack for emulated nodes. We should extend litep2p on the real node to offer an interface for connecting, disconnecting, messaging, and requests that the emulated nodes can use to send data directly into the real node's network stack.
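One possible shape for such an interface is sketched below, purely as an illustration. Nothing here is litep2p's actual API: `EmulatedPeerApi`, `InMemoryStack`, `Event`, `NotConnected`, and the `u32` peer identifier are all hypothetical, and the in-memory implementation merely records events where a real one would hand them to the node's network stack.

```rust
use std::collections::HashSet;

/// Hypothetical identifier for an emulated peer.
type PeerId = u32;

/// Events the real node's stack would observe (hypothetical).
#[derive(Debug, PartialEq)]
enum Event {
    Connected(PeerId),
    Disconnected(PeerId),
    Message { from: PeerId, payload: Vec<u8> },
    Request { from: PeerId, payload: Vec<u8> },
}

/// Error returned when a peer sends data without being connected.
#[derive(Debug, PartialEq)]
struct NotConnected(PeerId);

/// Hypothetical interface covering connecting, disconnecting,
/// messaging, and requests for emulated nodes.
trait EmulatedPeerApi {
    fn connect(&mut self, peer: PeerId);
    fn disconnect(&mut self, peer: PeerId);
    fn send_message(&mut self, from: PeerId, payload: Vec<u8>) -> Result<(), NotConnected>;
    fn send_request(&mut self, from: PeerId, payload: Vec<u8>) -> Result<(), NotConnected>;
}

/// Stand-in for the real node's network stack: it only records events.
#[derive(Default)]
struct InMemoryStack {
    connected: HashSet<PeerId>,
    events: Vec<Event>,
}

impl EmulatedPeerApi for InMemoryStack {
    fn connect(&mut self, peer: PeerId) {
        // Record the event only for a peer that was not connected yet.
        if self.connected.insert(peer) {
            self.events.push(Event::Connected(peer));
        }
    }

    fn disconnect(&mut self, peer: PeerId) {
        if self.connected.remove(&peer) {
            self.events.push(Event::Disconnected(peer));
        }
    }

    fn send_message(&mut self, from: PeerId, payload: Vec<u8>) -> Result<(), NotConnected> {
        if !self.connected.contains(&from) {
            return Err(NotConnected(from));
        }
        self.events.push(Event::Message { from, payload });
        Ok(())
    }

    fn send_request(&mut self, from: PeerId, payload: Vec<u8>) -> Result<(), NotConnected> {
        if !self.connected.contains(&from) {
            return Err(NotConnected(from));
        }
        self.events.push(Event::Request { from, payload });
        Ok(())
    }
}

fn main() {
    let mut stack = InMemoryStack::default();
    stack.connect(7);
    stack.send_message(7, b"statement".to_vec()).unwrap();
    stack.send_request(7, b"fetch".to_vec()).unwrap();
    stack.disconnect(7);
    println!("{:?}", stack.events);
}
```

The point of this design is that emulated peers need no sockets or tasks of their own: each one is just an identifier making plain function calls into the node under test.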

@AndreiEres AndreiEres closed this Jan 7, 2025