[WIP] subsystem-bench: Add real networking stack #6845
Conversation
All GitHub workflows were cancelled due to the failure of one of the required jobs.
While working on this PR, I concluded that using real nodes in subsystem benchmarks is not workable. I included all my thoughts in the PR's description. @sandreim @alexggh @lexnv @dmitry-markin
I didn't look much at the code, but I don't expect this to work if we spin up full-blown network stacks for the emulated nodes. Instead, I suggest we look into implementing a minimalist emulated-node networking stack. We should extend litep2p on the real node to offer an interface for connecting/disconnecting/messaging/requests that can be used directly by the emulated nodes to send data straight into the real node's network stack.
In this PR, I added a real networking stack to the subsystem benchmarks.
How the real networking stack works in benchmarks
Currently, in subsystem benchmarks, we initialize a mocked network that simulates communication between nodes without using a network bridge. Since real communication involves sending messages over a transport layer like TCP, and intercepting messages at that level is complicated, we decided to spawn all the nodes as real ones.
The spawned nodes have a network backend bound to local ports, allowing them to communicate with each other. Because we don't need the full functionality of these nodes, they only include the necessary subsystems: a real network bridge plus mocked subsystems for communication with the node under test. This approach seemed promising because we achieve real networking, and the node under test collects actual metrics with its registry.
The problem with too many Tokio tasks
The problem is that I can only run the updated benchmark with a limited number of nodes. With 40 or more nodes, the Tokio runtime blocks on the main task while awaiting results from the other spawned tasks, which consequently cannot produce any results. To investigate the issue, I checked the number of tasks that Tokio spawns during the benchmarks.
Tokio tasks spawned during the statement-distribution benchmark:
The number of tasks required to simulate a small real network with only 35 nodes is almost ten times greater than that for a mocked network with 500 nodes. To continue the experiment, I checked the number of tasks for
`cargo run --bin polkadot`
– it was around 100. This measurement is rough, but I wouldn't expect it to exceed 300-500 when acting as a validator.

Currently, the benchmarks use the Tokio runtime with 4 worker threads (which should be increased to 8 to accommodate the new reference hardware) and 512 (the default) blocking threads. After increasing the number of worker threads to 8 and blocking threads to 32,768, I managed to run the benchmark with 50 validators, which is still far fewer than 500.
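The runtime tweaks described above amount to a couple of builder settings. A sketch, assuming Tokio's `runtime::Builder` API (not the actual benchmark configuration code):

```rust
// Sketch only: raising the benchmark runtime's limits.
let runtime = tokio::runtime::Builder::new_multi_thread()
    .worker_threads(8)            // was 4; matches the new reference hardware
    .max_blocking_threads(32_768) // default is 512
    .enable_all()
    .build()
    .expect("failed to build runtime");
```

Even with these limits raised, the benchmark only reached 50 validators, which is what motivates the conclusion below.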
My conclusion
I think the idea of using real nodes in subsystem benchmarks has failed. Running all nodes with a real network requires a much heavier Tokio runtime than we can afford in our benchmarks. I think we should stop pursuing this direction and explore another approach.