Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VM fault fastpath new benchmarks #18

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

alwin-joshy
Copy link

@alwin-joshy alwin-joshy commented Feb 9, 2022

New benchmarks for testing VM fault performance - converted some of the tests from seL4test to benchmarks (can later be extended for other types of benchmarks) as well as added new round trip + mapping benchmark (AARCH64 only).

Three new benchmarks were added, with each of them measuring the following scenarios:

  • Performace of VM fault -> VM fault handler
  • Performance of VM fault handler -> VM faulter
  • VM fault round trip including cost of remapping page from read only to read write (Only available on AARCH64 - could extend to other architectures but my feeling is that this one is mostly unnecessary as a combination of the first two tell much the same story).

The first two of these are for measuring either direction and the third is scenario of memory management to see the improvement for a real use case.

Both of the first two benchmarks are adapted from seL4test. The way that they work is that the faulter does an invalid memory access which causes a VM fault to occur. The fault handler then updates the register set of the faulting thread such that upon reply, it will restart at a different instruction (as opposed to reattempting the same faulting instruction), and proceed normally.

These benchmarks are all created such that they satisfy the conditions that are required by the vm fault fastpath to pass, such as the fault handler being a passive thread, the faulter having a lower priority than the handler, and so on.

@lsf37
Copy link
Member

lsf37 commented Feb 9, 2022

Something seems funky with the commits, the PR contains some that are already in master. Can you try rebasing your branch over master?

@alwin-joshy alwin-joshy force-pushed the fault_fp_pr3 branch 2 times, most recently from 88cc188 to b7f13fc Compare February 9, 2022 09:50
@alwin-joshy
Copy link
Author

Something seems funky with the commits, the PR contains some that are already in master. Can you try rebasing your branch over master?

Think that's fixed now, something weird happened when I was trying to fix an overly long commit message.

Comment on lines 376 to 379
volatile int j;
for (j = 0; j < 10000; j++) {

}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be intended to pause for a bit. Is this something we do by this kind of loop or is there a wait/pause function this should be calling?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was first running these benchmarks, the variance was very high and it appeared to be a sort of bimodal distribution. I read Shane's thesis on the signal fastpath and he encountered the same issues and found that waiting between runs improved this issue. This is how he did it, but there may be a better standard way which I missed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with the waiting as such, just wanted to check on the way it's done. @kent-mcleod might know more about the standard way this is done in sel4bench/sel4test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe yielding here is a better option to give the other thread time to run?

apps/sel4bench/src/fault.c Outdated Show resolved Hide resolved
@lsf37
Copy link
Member

lsf37 commented Feb 9, 2022

Think that's fixed now, something weird happened when I was trying to fix an overly long commit message.

Thanks, that is better. The commits now unfortunately all have two sign-off lines (both from you, but with different addresses). Please fix to one. Also, the commits should have a body message that explains what is going on and gives background/reasoning for the change. The style fixups to your own changes should be squashed into the commit that makes the changes, similarly any other minor fixups. As far as I can see, this PR should probably be one or two commits in total.

It's fine for reviewing for now, but should be fixed up before we can merge.

I'm not the right person to review the content on this one, so will need to leave this to someone else, but more and better benchmarks sounds like a good idea to me. Commented on minor style things inline.

@lsf37
Copy link
Member

lsf37 commented Feb 9, 2022

The files with only empty lines added/removed should be a separate commit. Usually we don't do these unless we also touch the code, but I think these specific ones are not a problem. For those commits it's Ok to leave out a body message and call them trivial: formatting or something like that.

apps/fault/src/main.c Outdated Show resolved Hide resolved
@alwin-joshy alwin-joshy force-pushed the fault_fp_pr3 branch 2 times, most recently from e74166b to b8c65c8 Compare February 9, 2022 23:40
@axel-h
Copy link
Member

axel-h commented Feb 10, 2022

Could you add a description for the scenario that this implements?

apps/fault/src/main.c Outdated Show resolved Hide resolved
@gernotheiser
Copy link
Member

gernotheiser commented Feb 10, 2022 via email

@alwin-joshy
Copy link
Author

Could you add a description for the scenario that this implements?

Edited the PR description to add more details. Is there anything else you would like me to include?

@kent-mcleod
Copy link
Member

The methodology presently employed by sel4bench is unsound: Excessive writes leading to stalls that interfere with the operation to be benchmarked. Adding a pause between iterations allowing the writes to drain is a workaround that is ok for now, but eventually the overall approach needs to be fixed.

@gernotheiser can you be more specific? My understanding is that sel4bench has always used tight loops and lots of startup iterations to try and minimize cache misses during measurements. In this particular case it appears that most of the writes would be coming from writing TCB register context data into TCB objects by the kernel and not due to overhead writes? In any case, pausing for a certain number of cycles would allow things like store buffers to clear but the pause time should at least become a controlled variable of the benchmark that's maybe varied to show the effect on latency due to stalls?

@gernotheiser
Copy link
Member

gernotheiser commented Feb 10, 2022 via email

@kent-mcleod
Copy link
Member

Can’t look at the code right now (very poor connection) but from memory a lot of data is written by the benchmarking rig to a buffer to collect statistics, which, on some processors, exceeds the memory bandwidth leading to stalls.

For this fault benchmark the only statistic that gets written out for each iteration is the number of cycles taken for that iteration as a 64bit value. An additional 64bit variable is used to hold the start time during the operation. I don't see how this would be dominating the storage bandwidth when there's many more registers that get saved during a context switch.

@alwin-joshy
Copy link
Author

alwin-joshy commented Feb 10, 2022

Can’t look at the code right now (very poor connection) but from memory a lot of data is written by the benchmarking rig to a buffer to collect statistics, which, on some processors, exceeds the memory bandwidth leading to stalls.

For this fault benchmark the only statistic that gets written out for each iteration is the number of cycles taken for that iteration as a 64bit value. An additional 64bit variable is used to hold the start time during the operation. I don't see how this would be dominating the storage bandwidth when there's many more registers that get saved during a context switch.

I have run a the benchmarks again (odroidc2 mcs on + off - this was the platform where I observed the irregular results) with and without the pause, and there appears to be no significant difference between the two in results. Maybe the results I was observing earlier were a result of some mistake I had made in the benchmarks but resolved later on without double checking if the pause was necessary.

@alwin-joshy alwin-joshy force-pushed the fault_fp_pr3 branch 3 times, most recently from 5676be5 to 16540b0 Compare February 10, 2022 07:49
@lsf37 lsf37 added the hw-test sel4bench hardware runs label Feb 12, 2022
apps/fault/src/main.c Outdated Show resolved Hide resolved
@alwin-joshy alwin-joshy force-pushed the fault_fp_pr3 branch 2 times, most recently from e672773 to 9bec647 Compare February 28, 2023 04:34
@lsf37
Copy link
Member

lsf37 commented May 4, 2023

Hey @alwin-joshy where are we with this PR? Would you be able to rebase and resolve conflicts? Are there any open issues?

@alwin-joshy
Copy link
Author

I'll get it cleaned up by tomorrow and let you know

@alwin-joshy alwin-joshy force-pushed the fault_fp_pr3 branch 2 times, most recently from 562afc6 to 28e7ea6 Compare June 5, 2023 08:23
@alwin-joshy alwin-joshy force-pushed the fault_fp_pr3 branch 9 times, most recently from 7d03027 to 3435e43 Compare September 10, 2024 09:35
This commit adds a set of new benchmarks to test VM faults (primarily
for evaluating the performance of the VM-fault fastpath). They were
developed using existing code used for testing the correctness of VM
faults in seL4test, and modified slightly to make them more useful as
benchmarks.

Signed-off-by: Alwin Joshy <[email protected]>
Applies patch from seL4#20. This was
not merged previously as RISC-V did not implement the hardware and
fault benchmarks, but this is no longer the case.

Signed-off-by: Alwin Joshy <[email protected]>
@alwin-joshy
Copy link
Author

@lsf37 Finally had some time to look into this and have addressed the issues.

The PR adds the VM fault benchmark for aarch32/64, x86_64 and RISC-V. There is also a separate commit that properly closes #20, as the fault and hardware benchmarks seem to now be supported for RISC-V (can move this to a separate PR if more appropriate).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hw-test sel4bench hardware runs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants