-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VM fault fastpath new benchmarks #18
base: master
Are you sure you want to change the base?
Conversation
Something seems funky with the commits, the PR contains some that are already in master. Can you try rebasing your branch over master? |
88cc188
to
b7f13fc
Compare
Think that's fixed now, something weird happened when I was trying to fix an overly long commit message. |
apps/fault/src/main.c
Outdated
volatile int j; | ||
for (j = 0; j < 10000; j++) { | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be intended to pause for a bit. Is this something we do by this kind of loop or is there a wait/pause function this should be calling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I was first running these benchmarks, the variance was very high and it appeared to be a sort of bimodal distribution. I read Shane's thesis on the signal fastpath and he encountered the same issues and found that waiting between runs improved this issue. This is how he did it, but there may be a better standard way which I missed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy with the waiting as such, just wanted to check on the way it's done. @kent-mcleod might know more about the standard way this is done in sel4bench/sel4test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe yielding here is a better option to give the other thread time to run?
libsel4benchsupport/sel4_arch_include/aarch64/sel4_arch/fault.h
Outdated
Show resolved
Hide resolved
Thanks, that is better. The commits now unfortunately all have two sign-off lines (both from you, but with different addresses). Please fix to one. Also, the commits should have a body message that explains what is going on and gives background/reasoning for the change. The style fixups to your own changes should be squashed into the commit that makes the changes, similarly any other minor fixups. As far as I can see, this PR should probably be one or two commits in total. It's fine for reviewing for now, but should be fixed up before we can merge. I'm not the right person to review the content on this one, so will need to leave this to someone else, but more and better benchmarks sounds like a good idea to me. Commented on minor style things inline. |
The files with only empty lines added/removed should be a separate commit. Usually we don't do these unless we also touch the code, but I think these specific ones are not a problem. For those commits it's Ok to leave out a body message and call them |
e74166b
to
b8c65c8
Compare
Could you add a description for the scenario that this implements? |
On 10 Feb 2022, at 10:02, Gerwin Klein ***@***.***> wrote:
Happy with the waiting as such, just wanted to check on the way it's done. @kent-mcleod might know more about the standard way this is done in sel4bench/sel4test.
The methodology presently employed by sel4bench is unsound: Excessive writes leading to stalls that interfere with the operation to be benchmarked. Adding a pause between iterations allowing the writes to drain is a workaround that is ok for now, but eventually the overall approach needs to be fixed.
|
Edited the PR description to add more details. Is there anything else you would like me to include? |
@gernotheiser can you be more specific? My understanding is that sel4bench has always used tight loops and lots of startup iterations to try and minimize cache misses during measurements. In this particular case it appears that most of the writes would be coming from writing TCB register context data into TCB objects by the kernel and not due to overhead writes? In any case, pausing for a certain number of cycles would allow things like store buffers to clear but the pause time should at least become a controlled variable of the benchmark that's maybe varied to show the effect on latency due to stalls? |
Can’t look at the code right now (very poor connection) but from memory a lot of data is written by the benchmarking rig to a buffer to collect statistics, which, on some processors, exceeds the memory bandwidth leading to stalls. This results in large standard deviations due to a bimodal distribution of execution times, when the actual kernel operations measured should be highly deterministic and properly measured, standard deviations should be a few cycles.
I don’t see a need to vary the pause amount, as it has nothing to do with what’s supposed to be measured, the stalls are a pure artefact of the measurement approach. A more appropriate methodology would just accumulate data in a few variables to compute core statistics after the timed loops. I believe @malus-brandywine is working at properly fixing this issue.
|
For this fault benchmark the only statistic that gets written out for each iteration is the number of cycles taken for that iteration as a 64bit value. An additional 64bit variable is used to hold the start time during the operation. I don't see how this would be dominating the storage bandwidth when there's many more registers that get saved during a context switch. |
I have run a the benchmarks again (odroidc2 mcs on + off - this was the platform where I observed the irregular results) with and without the pause, and there appears to be no significant difference between the two in results. Maybe the results I was observing earlier were a result of some mistake I had made in the benchmarks but resolved later on without double checking if the pause was necessary. |
5676be5
to
16540b0
Compare
e672773
to
9bec647
Compare
Hey @alwin-joshy where are we with this PR? Would you be able to rebase and resolve conflicts? Are there any open issues? |
I'll get it cleaned up by tomorrow and let you know |
562afc6
to
28e7ea6
Compare
7d03027
to
3435e43
Compare
This commit adds a set of new benchmarks to test VM faults (primarily for evaluating the performance of the VM-fault fastpath). They were developed using existing code used for testing the correctness of VM faults in seL4test, and modified slightly to make them more useful as benchmarks. Signed-off-by: Alwin Joshy <[email protected]>
Signed-off-by: Alwin Joshy <[email protected]>
Applies patch from seL4#20. This was not merged previously as RISC-V did not implement the hardware and fault benchmarks, but this is no longer the case. Signed-off-by: Alwin Joshy <[email protected]>
3435e43
to
0411a09
Compare
@lsf37 Finally had some time to look into this and have addressed the issues. The PR adds the VM fault benchmark for aarch32/64, x86_64 and RISC-V. There is also a separate commit that properly closes #20, as the fault and hardware benchmarks seem to now be supported for RISC-V (can move this to a separate PR if more appropriate). |
New benchmarks for testing VM fault performance - converted some of the tests from seL4test to benchmarks (can later be extended for other types of benchmarks) as well as added new round trip + mapping benchmark (AARCH64 only).
Three new benchmarks were added, with each of them measuring the following scenarios:
The first two of these are for measuring either direction and the third is scenario of memory management to see the improvement for a real use case.
Both of the first two benchmarks are adapted from seL4test. The way that they work is that the faulter does an invalid memory access which causes a VM fault to occur. The fault handler then updates the register set of the faulting thread such that upon reply, it will restart at a different instruction (as opposed to reattempting the same faulting instruction), and proceed normally.
These benchmarks are all created such that they satisfy the conditions that are required by the vm fault fastpath to pass, such as the fault handler being a passive thread, the faulter having a lower priority than the handler, and so on.