VM fault fastpath new benchmarks #18

alwin-joshy · 2022-02-09T06:00:17Z

New benchmarks for testing VM fault performance. Some of the tests from seL4test were converted to benchmarks (these can later be extended to other types of benchmarks), and a new round trip + mapping benchmark was added (AARCH64 only).

Three new benchmarks were added, with each of them measuring the following scenarios:

Performance of VM fault -> VM fault handler
Performance of VM fault handler -> VM faulter
VM fault round trip including cost of remapping page from read only to read write (Only available on AARCH64 - could extend to other architectures but my feeling is that this one is mostly unnecessary as a combination of the first two tell much the same story).

The first two measure each direction separately; the third is a memory management scenario intended to show the improvement in a real use case.

Both of the first two benchmarks are adapted from seL4test. The way that they work is that the faulter does an invalid memory access which causes a VM fault to occur. The fault handler then updates the register set of the faulting thread such that upon reply, it will restart at a different instruction (as opposed to reattempting the same faulting instruction), and proceed normally.
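The control flow described above can be modelled in plain C without seL4. This is only a minimal sketch and not the code in this PR: `model_regs_t`, `model_fault_t` and `handle_vm_fault` are hypothetical names, and the saved register set is reduced to two fields. In the real benchmark the handler would read and rewrite the faulter's registers through the kernel before replying.

```c
#include <stdint.h>

/* Hypothetical stand-in for the faulting thread's saved register set. */
typedef struct {
    uint64_t pc; /* instruction the thread resumes at after the reply */
    uint64_t sp;
} model_regs_t;

/* Hypothetical stand-in for the fault message delivered to the handler. */
typedef struct {
    uint64_t fault_ip;   /* instruction that caused the fault */
    uint64_t fault_addr; /* invalid address that was accessed */
} model_fault_t;

/*
 * Model of the handler's job: rather than letting the faulter retry the
 * faulting instruction (which would fault again), redirect its pc to
 * resume_pc so that it proceeds normally once the handler replies.
 * Returns 0 on success, -1 if the fault does not match the saved context.
 */
int handle_vm_fault(const model_fault_t *fault, model_regs_t *regs,
                    uint64_t resume_pc)
{
    if (regs->pc != fault->fault_ip) {
        return -1;
    }
    regs->pc = resume_pc;
    return 0;
}
```

In this model, everything other than `pc` is left untouched, so the faulter continues with the same stack and state, only at a different instruction.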

These benchmarks are all set up to satisfy the conditions required for the VM fault fastpath to be taken, such as the fault handler being a passive thread, the faulter having a lower priority than the handler, and so on.

lsf37 · 2022-02-09T07:11:13Z

Something seems funky with the commits, the PR contains some that are already in master. Can you try rebasing your branch over master?

alwin-joshy · 2022-02-09T09:52:11Z

Something seems funky with the commits, the PR contains some that are already in master. Can you try rebasing your branch over master?

Think that's fixed now, something weird happened when I was trying to fix an overly long commit message.

lsf37 · 2022-02-09T21:09:09Z

apps/fault/src/main.c

+        volatile int j;
+        for (j = 0; j < 10000; j++) {
+
+        }


This seems to be intended to pause for a bit. Is this something we do by this kind of loop or is there a wait/pause function this should be calling?

When I was first running these benchmarks, the variance was very high and it appeared to be a sort of bimodal distribution. I read Shane's thesis on the signal fastpath and he encountered the same issues and found that waiting between runs improved this issue. This is how he did it, but there may be a better standard way which I missed.

Happy with the waiting as such, just wanted to check on the way it's done. @kent-mcleod might know more about the standard way this is done in sel4bench/sel4test.

Maybe yielding here is a better option to give the other thread time to run?

libsel4benchsupport/sel4_arch_include/aarch64/sel4_arch/fault.h

apps/sel4bench/src/fault.c

lsf37 · 2022-02-09T21:20:22Z

Think that's fixed now, something weird happened when I was trying to fix an overly long commit message.

Thanks, that is better. The commits now unfortunately all have two sign-off lines (both from you, but with different addresses). Please fix to one. Also, the commits should have a body message that explains what is going on and gives background/reasoning for the change. The style fixups to your own changes should be squashed into the commit that makes the changes, similarly any other minor fixups. As far as I can see, this PR should probably be one or two commits in total.

It's fine for reviewing for now, but should be fixed up before we can merge.

I'm not the right person to review the content on this one, so will need to leave this to someone else, but more and better benchmarks sounds like a good idea to me. Commented on minor style things inline.

lsf37 · 2022-02-09T21:26:40Z

The files with only empty lines added/removed should be a separate commit. Usually we don't do these unless we also touch the code, but I think these specific ones are not a problem. For those commits it's Ok to leave out a body message and call them trivial: formatting or something like that.

apps/fault/src/main.c

axel-h · 2022-02-10T00:09:57Z

Could you add a description for the scenario that this implements?

apps/fault/src/main.c

gernotheiser · 2022-02-10T01:20:24Z

On 10 Feb 2022, at 10:02, Gerwin Klein wrote:

Happy with the waiting as such, just wanted to check on the way it's done. @kent-mcleod might know more about the standard way this is done in sel4bench/sel4test.

The methodology presently employed by sel4bench is unsound: Excessive writes leading to stalls that interfere with the operation to be benchmarked. Adding a pause between iterations allowing the writes to drain is a workaround that is ok for now, but eventually the overall approach needs to be fixed.

alwin-joshy · 2022-02-10T01:44:29Z

Could you add a description for the scenario that this implements?

Edited the PR description to add more details. Is there anything else you would like me to include?

kent-mcleod · 2022-02-10T03:57:05Z

The methodology presently employed by sel4bench is unsound: Excessive writes leading to stalls that interfere with the operation to be benchmarked. Adding a pause between iterations allowing the writes to drain is a workaround that is ok for now, but eventually the overall approach needs to be fixed.

@gernotheiser can you be more specific? My understanding is that sel4bench has always used tight loops and lots of startup iterations to try and minimize cache misses during measurements. In this particular case it appears that most of the writes would be coming from writing TCB register context data into TCB objects by the kernel and not due to overhead writes? In any case, pausing for a certain number of cycles would allow things like store buffers to clear but the pause time should at least become a controlled variable of the benchmark that's maybe varied to show the effect on latency due to stalls?
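A rough sketch of what treating the pause as a controlled variable could look like. The helper name `pause_spin` and the sweep values in `pause_sweep` are assumptions made for illustration; they are not part of sel4bench or of this PR.

```c
#include <stdint.h>

/*
 * Busy-wait for `iterations` loop trips. The counter is volatile so the
 * compiler cannot remove the otherwise empty loop. Returns the number of
 * trips executed.
 */
uint32_t pause_spin(uint32_t iterations)
{
    volatile uint32_t j;
    for (j = 0; j < iterations; j++) {
        /* intentionally empty */
    }
    return j;
}

/* Hypothetical pause lengths to sweep over; 0 means no pause at all. */
const uint32_t pause_sweep[] = { 0, 100, 1000, 10000 };
#define PAUSE_SWEEP_LEN (sizeof(pause_sweep) / sizeof(pause_sweep[0]))
```

A benchmark run would then repeat the measurement once per entry in `pause_sweep`, calling `pause_spin` with that value between iterations, and report latency per pause length.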

gernotheiser · 2022-02-10T05:55:59Z

Can’t look at the code right now (very poor connection) but from memory a lot of data is written by the benchmarking rig to a buffer to collect statistics, which, on some processors, exceeds the memory bandwidth leading to stalls. This results in large standard deviations due to a bimodal distribution of execution times, when the actual kernel operations measured should be highly deterministic and, properly measured, should have standard deviations of a few cycles. I don’t see a need to vary the pause amount, as it has nothing to do with what’s supposed to be measured; the stalls are a pure artefact of the measurement approach. A more appropriate methodology would just accumulate data in a few variables to compute core statistics after the timed loops. I believe @malus-brandywine is working on properly fixing this issue.
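One possible shape for "accumulate data in a few variables" is sketched below. It is an illustration only, not the fix referred to above: `cycle_stats_t` and its functions are hypothetical names, and the running sum of squares assumes that cycle counts and iteration counts are small enough not to overflow 64 bits.

```c
#include <stdint.h>

/* A handful of accumulators instead of one buffer entry per sample. */
typedef struct {
    uint64_t count;
    uint64_t sum;
    uint64_t sum_sq; /* assumption: does not overflow for typical counts */
    uint64_t min;
    uint64_t max;
} cycle_stats_t;

void cycle_stats_init(cycle_stats_t *s)
{
    s->count = 0;
    s->sum = 0;
    s->sum_sq = 0;
    s->min = UINT64_MAX;
    s->max = 0;
}

/* Called once per timed iteration; integer arithmetic only. */
void cycle_stats_add(cycle_stats_t *s, uint64_t cycles)
{
    s->count++;
    s->sum += cycles;
    s->sum_sq += cycles * cycles;
    if (cycles < s->min) {
        s->min = cycles;
    }
    if (cycles > s->max) {
        s->max = cycles;
    }
}

/* The functions below are meant to run after the timed loops. */
double cycle_stats_mean(const cycle_stats_t *s)
{
    return s->count ? (double)s->sum / (double)s->count : 0.0;
}

/* Sample variance; 0.0 when fewer than two samples were recorded. */
double cycle_stats_variance(const cycle_stats_t *s)
{
    if (s->count < 2) {
        return 0.0;
    }
    double n = (double)s->count;
    double mean = (double)s->sum / n;
    return ((double)s->sum_sq - n * mean * mean) / (n - 1.0);
}
```

The sum and sum-of-squares form was chosen here because it keeps the per-iteration work to a few integer operations; it trades some numerical robustness for that simplicity.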

kent-mcleod · 2022-02-10T06:27:34Z

Can’t look at the code right now (very poor connection) but from memory a lot of data is written by the benchmarking rig to a buffer to collect statistics, which, on some processors, exceeds the memory bandwidth leading to stalls.

For this fault benchmark the only statistic that gets written out for each iteration is the number of cycles taken for that iteration as a 64bit value. An additional 64bit variable is used to hold the start time during the operation. I don't see how this would be dominating the storage bandwidth when there's many more registers that get saved during a context switch.

alwin-joshy · 2022-02-10T07:03:45Z

Can’t look at the code right now (very poor connection) but from memory a lot of data is written by the benchmarking rig to a buffer to collect statistics, which, on some processors, exceeds the memory bandwidth leading to stalls.

For this fault benchmark the only statistic that gets written out for each iteration is the number of cycles taken for that iteration as a 64bit value. An additional 64bit variable is used to hold the start time during the operation. I don't see how this would be dominating the storage bandwidth when there's many more registers that get saved during a context switch.

I have run the benchmarks again (odroidc2, MCS on + off; this was the platform where I observed the irregular results) with and without the pause, and there appears to be no significant difference in the results between the two. Maybe the results I observed earlier were caused by a mistake in the benchmarks that I later resolved, without double-checking whether the pause was still necessary.

apps/fault/src/main.c

lsf37 · 2023-05-04T03:45:03Z

Hey @alwin-joshy where are we with this PR? Would you be able to rebase and resolve conflicts? Are there any open issues?

alwin-joshy · 2023-05-04T03:47:17Z

I'll get it cleaned up by tomorrow and let you know

This commit adds a set of new benchmarks to test VM faults (primarily for evaluating the performance of the VM-fault fastpath). They were developed using existing code used for testing the correctness of VM faults in seL4test, and modified slightly to make them more useful as benchmarks.

Signed-off-by: Alwin Joshy <[email protected]>

Signed-off-by: Alwin Joshy <[email protected]>

Applies patch from seL4#20. This was not merged previously as RISC-V did not implement the hardware and fault benchmarks, but this is no longer the case.

Signed-off-by: Alwin Joshy <[email protected]>

alwin-joshy · 2024-09-11T01:06:45Z

@lsf37 Finally had some time to look into this and have addressed the issues.

The PR adds the VM fault benchmark for aarch32/64, x86_64 and RISC-V. There is also a separate commit that properly closes #20, as the fault and hardware benchmarks seem to now be supported for RISC-V (can move this to a separate PR if more appropriate).

alwin-joshy force-pushed the fault_fp_pr3 branch 2 times, most recently from 88cc188 to b7f13fc, February 9, 2022 09:50

lsf37 reviewed Feb 9, 2022

libsel4benchsupport/sel4_arch_include/aarch64/sel4_arch/fault.h (outdated)

lsf37 reviewed Feb 9, 2022

apps/sel4bench/src/fault.c (outdated)

axel-h reviewed Feb 9, 2022

apps/fault/src/main.c (outdated)

alwin-joshy force-pushed the fault_fp_pr3 branch 2 times, most recently from e74166b to b8c65c8, February 9, 2022 23:40

axel-h reviewed Feb 10, 2022

apps/fault/src/main.c (outdated)

alwin-joshy force-pushed the fault_fp_pr3 branch 3 times, most recently from 5676be5 to 16540b0, February 10, 2022 07:49

lsf37 added the hw-test sel4bench hardware runs label Feb 12, 2022

kent-mcleod reviewed Feb 26, 2023

apps/fault/src/main.c (outdated)

alwin-joshy mentioned this pull request Feb 27, 2023

VM fault fastpath seL4/seL4#744

Merged

alwin-joshy force-pushed the fault_fp_pr3 branch 2 times, most recently from e672773 to 9bec647, February 28, 2023 04:34

alwin-joshy force-pushed the fault_fp_pr3 branch 2 times, most recently from 562afc6 to 28e7ea6, June 5, 2023 08:23

alwin-joshy force-pushed the fault_fp_pr3 branch 9 times, most recently from 7d03027 to 3435e43, September 10, 2024 09:35

alwin-joshy added 3 commits September 11, 2024 10:45

bench/vm_fault: support x86_64, AArch32, RISC-V

b34d62a

Signed-off-by: Alwin Joshy <[email protected]>

bench/riscv: enable config of benchmark options

0411a09

Applies patch from seL4#20. This was not merged previously as RISC-V did not implement the hardware and fault benchmarks, but this is no longer the case.

Signed-off-by: Alwin Joshy <[email protected]>

alwin-joshy force-pushed the fault_fp_pr3 branch from 3435e43 to 0411a09, September 11, 2024 00:46