CP-50475: parallelize device ops during VM lifecycle ops #6057

psafont · 2024-10-16T12:55:28Z

Operations on different devices should be independent and can therefore be
parallelized. This means parallelizing operations both across device types
and across devices of the same type.

An atom that serializes actions has been introduced because the operations on
a single device must stay serialized.
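The nesting described above can be sketched roughly as follows. This is a simplified model, not the actual xenopsd definitions: the `device` type, the `Plug` and `Set_active` leaves, and the `plug_device`, `plug_all`, `total` and `critical_path` helpers are all hypothetical names chosen for illustration.

```ocaml
(* Hypothetical, simplified model of nested atoms; not xenopsd's real types. *)
type device = Vif of int | Vbd of int

type atomic =
  | Plug of device
  | Set_active of device
  | Serial of string * atomic list   (* children run in order *)
  | Parallel of string * atomic list (* children may run concurrently *)

(* The steps for one device must stay ordered, so they sit in a Serial atom. *)
let plug_device d = Serial ("plug one device", [Set_active d; Plug d])

(* Different devices are independent: one Parallel per device type,
   and the device types themselves in parallel with each other. *)
let plug_all ~vifs ~vbds =
  Parallel
    ( "plug devices"
    , [ Parallel ("VIFs", List.map plug_device vifs)
      ; Parallel ("VBDs", List.map plug_device vbds) ] )

(* Number of leaf atoms in the whole tree. *)
let rec total = function
  | Plug _ | Set_active _ -> 1
  | Serial (_, ops) | Parallel (_, ops) ->
      List.fold_left (fun acc op -> acc + total op) 0 ops

(* Longest chain of leaf atoms, assuming every Parallel child really
   runs concurrently and every leaf costs the same. *)
let rec critical_path = function
  | Plug _ | Set_active _ -> 1
  | Serial (_, ops) ->
      List.fold_left (fun acc op -> acc + critical_path op) 0 ops
  | Parallel (_, ops) ->
      List.fold_left (fun acc op -> max acc (critical_path op)) 0 ops
```

Under these assumptions, adding more devices increases `total` but not `critical_path`, which is the intended effect of the change.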

Also removes some unneeded parallel atoms, which add overhead and pollute the traces.
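One way to avoid emitting unneeded parallel atoms is a small helper that only builds one when there are at least two children. The sketch below assumes a simplified `atomic` type; the `parallel` helper and `start_atomics` are hypothetical and not taken from the patch.

```ocaml
(* Hypothetical, simplified atom type for illustration only. *)
type atomic =
  | Op of string
  | Parallel of string * atomic list

(* Returns a list so the result can be spliced into a surrounding sequence:
   nothing for no children, the child itself for a single one, and a real
   Parallel atom only when there is something to run concurrently. *)
let parallel name = function
  | [] -> []
  | [single] -> [single]
  | ops -> [Parallel (name, ops)]

(* Example sequence that plugs a variable number of VIFs. *)
let start_atomics ~vifs =
  [Op "create"]
  @ parallel "plug VIFs" (List.map (fun v -> Op ("plug " ^ v)) vifs)
  @ [Op "unpause"]
```

With zero or one VIF the resulting list contains no `Parallel` node at all, so no extra span would appear in a trace for it.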

This makes VM_start run about 1 second faster in some tests where VMs have 5 VIFs. I'm testing it further, but it doesn't look like it breaks any tests. I want to push it through some bootstorms and measure the improvements.

Please do suggest some further parallelization if you can find it!

BVT+BST are all green

Parallel atoms do quite a bit of unnecessary work even when they are empty. They are also not needed when running a single task. They also show up as spans in the traces. Removing them makes the traces shorter and easier to read.

Co-authored-by: Edwin Török <[email protected]>
Signed-off-by: Pau Ruiz Safont <[email protected]>

Operations on different devices should be independent and can therefore be parallelized. This means parallelizing operations both across device types and across devices of the same type. An atom that serializes actions has been introduced because the operations on a single device must stay serialized.

Signed-off-by: Pau Ruiz Safont <[email protected]>

robhoes · 2024-10-16T13:22:51Z

I have to look at the details of the PR, but the first thing I wondered is why we need to introduce a new Serial atomic operation, as clearly we are already able to serialise. The list returned by atomics_of_operation is implicitly serial and executed as such by perform_atomics. So now we have two ways of expressing serial (atomic) operations.

I can see that nesting is easier with an explicit Serial atomic. Should we then as a next step make all serialisation explicit by turning atomics_of_operation into atomic_of_operation and wrapping all lists with Serial? Then we can merge perform_atomics and perform_atomic as well.
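A minimal sketch of what that suggestion could look like, assuming a simplified `atomic` type. The bodies below are hypothetical and do not mirror the real xenopsd functions; the concurrency strategy is passed in as `run_all` so the sketch needs no thread library.

```ocaml
(* Hypothetical, simplified atom type for illustration only. *)
type atomic =
  | Op of string
  | Serial of atomic list
  | Parallel of atomic list

(* Today's shape in this sketch: an implicitly serial list of atoms. *)
let atomics_of_operation (vifs : string list) : atomic list =
  [ Op "create"
  ; Parallel (List.map (fun v -> Op ("plug " ^ v)) vifs)
  ; Op "unpause" ]

(* Explicit variant: the same list wrapped in a single Serial atom. *)
let atomic_of_operation vifs : atomic = Serial (atomics_of_operation vifs)

(* One recursive interpreter is then enough.
   [run_leaf] executes a single leaf; [run_all] decides how a group of
   independent thunks is run (for example, one thread per thunk). *)
let rec perform_atomic ~run_leaf ~run_all = function
  | Op name -> run_leaf name
  | Serial ops -> List.iter (perform_atomic ~run_leaf ~run_all) ops
  | Parallel ops ->
      run_all
        (List.map (fun op () -> perform_atomic ~run_leaf ~run_all op) ops)
```

With every sequence wrapped in `Serial`, the list-walking function has nothing left to do that the `Serial` case does not already cover.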

psafont · 2024-10-16T13:40:19Z

I introduced the serial atomic because it was convenient. At some point I also added a nop one, which removes quite a bit of the need to concatenate lists, but I'd rather concatenate lists than keep a nop atomic, as the nop could end up in the traces.

> The list returned by atomics_of_operation is implicitly serial and executed as such by perform_atomics. So now we have two ways of expressing serial (atomic) operations.

> I can see that nesting is easier with an explicit Serial atomic.

An option to avoid the serial atom would be to make the parallel one take a list of lists of atomic ops instead of a list of atomic ops, although I'm not fond of how implicit that is.
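For comparison, the list-of-lists alternative could look roughly like this. The types and the `plug_vifs` and `to_explicit` helpers are hypothetical and only meant to show where the implicitness comes from.

```ocaml
(* Hypothetical alternative: Parallel takes a list of chains, and each
   inner list is implicitly serial. *)
type atomic =
  | Op of string
  | Parallel of atomic list list

(* One chain per VIF; the ordering inside a chain is only implied. *)
let plug_vifs vifs =
  Parallel (List.map (fun v -> [Op ("activate " ^ v); Op ("plug " ^ v)]) vifs)

(* The same structure with serialisation spelled out, for comparison. *)
type explicit =
  | EOp of string
  | ESerial of explicit list
  | EParallel of explicit list

(* Every inner list becomes an explicit ESerial node. *)
let rec to_explicit = function
  | Op s -> EOp s
  | Parallel chains ->
      EParallel
        (List.map (fun chain -> ESerial (List.map to_explicit chain)) chains)
```

Both encodings carry the same information; in the first one a reader has to know that the inner lists are ordered, which is the implicitness mentioned above.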

> Should we then as a next step make all serialisation explicit by turning atomics_of_operation into atomic_of_operation and wrapping all lists with Serial? Then we can merge perform_atomics and perform_atomic as well.

I can try :)

psafont · 2024-10-17T08:52:23Z

Trying is a good exercise in yak-shaving: there's code to calculate the progress of tasks. It's not clear to me that it works well currently, especially with regard to the parallel atomic op, and merging the functions requires tracking progress inside the Serial atomic op. The code needs unit tests to make sure it's not made even worse and that the results are predictable.
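To make the concern concrete, here is a minimal sketch of progress tracking that recurses into nested atoms. It assumes every leaf has equal weight and runs everything sequentially; the type, `weight` and `perform_with_progress` are hypothetical and say nothing about how the existing progress code works.

```ocaml
(* Hypothetical, simplified atom type for illustration only. *)
type atomic =
  | Op of string
  | Serial of atomic list
  | Parallel of atomic list

(* Weight of an atom: the number of leaves it contains, however nested. *)
let rec weight = function
  | Op _ -> 1
  | Serial ops | Parallel ops ->
      List.fold_left (fun acc op -> acc + weight op) 0 ops

(* Walks the tree in order and reports the completed fraction after each
   leaf. A concurrent version would need to protect [finished]. *)
let perform_with_progress ~report atom =
  let total = weight atom in
  let finished = ref 0 in
  let rec go = function
    | Op _ ->
        incr finished;
        report (float_of_int !finished /. float_of_int total)
    | Serial ops | Parallel ops -> List.iter go ops
  in
  go atom
```

A unit test over a structure like this can pin down the reported sequence before and after a refactoring, which is the kind of predictability asked for above.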

The benefits of the current patch can be seen in the traces

The critical path no longer contains the VIFs; it is now dominated even more by the SM and by the need to plug RW VBDs before RO ones.

psafont and others added 2 commits October 16, 2024 13:54

psafont requested a review from robhoes October 16, 2024 12:58

robhoes approved these changes Oct 21, 2024
