-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Snapshots overhaul #176
Snapshots overhaul #176
Conversation
|
||
faabric::util::SnapshotData snap; | ||
snap.size = initialSnapSize; | ||
snap.data = faabric::util::allocateVirtualMemory(expandedSnapSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you introduce a std::unique_ptr-like (or even just std::unique_ptr with a custom deleter) wrapper for these allocations? I've been purging all uses of new/delete and using RAII memory management where possible, this feels like a step backwards. I know this could be impossible for the general case of applying snapshots onto faaslets, but at least in testing code the lifetime of these mappings is limited to a function call scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right re. test code, we can definitely do a better job there. As you say, we might not be able to avoid a raw pointer version of this everywhere, but I'll see what I can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two observations after a bit of fiddling:
-
To interface properly with Faasm, the snapshot API (
SnapshotRegistry
,SnapshotData
) still needs to support raw pointers (as both WAMR and WAVM still use them for wasm memory), hence we can't go full RAII on a lot of the underlying snapshot logic. We could introduce alternative versions of the API/ objects that support smart pointers alongside raw pointers, but this will result in a fair bit of extra code. -
I agree that using smart pointers in tests for memory provisioned on the fly is a good idea, and I can almost get something with a
unique_ptr
working, however, the custom deleter needs to callmunmap
and needs to know the size of the memory that was orinallymmap
ed, which makes defining the deleter a little more tricky. I got this far:
std::unique_ptr<uint8_t[]> doAlloc(size_t size, int prot, int flags)
{
auto deleter = [size](uint8_t* u) { munmap(u, size); };
std::unique_ptr<uint8_t[], decltype(deleter)> mem(
(uint8_t*)::mmap(nullptr, size, prot, flags, -1, 0), deleter);
return mem;
}
However, this won't compile as the return type of the function also needs to specify the type of the deleter, but I can't work out the syntax. Any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use std::function<void(uint8_t*)>
as the deleter type to store that lambda
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative is to use a shared_ptr
, which I think should be more flexible re. the custom deleter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a minimal working example with unique_ptr:
#include <memory>
#include <functional>
#include <sys/mman.h>
using OwnedMmapRegion = std::unique_ptr<uint8_t[], std::function<void(uint8_t*)>>;
OwnedMmapRegion doAlloc(size_t size, int prot, int flags)
{
auto deleter = [size](uint8_t* u) { munmap(u, size); };
OwnedMmapRegion mem(
(uint8_t*)::mmap(nullptr, size, prot, flags, -1, 0), deleter);
return mem;
}
https://godbolt.org/z/TdTzaz9rW
I'm not sure what you mean by shared_ptr being more flexible with the custom deleter, the only difference would be that you could use reference counting to keep track when the last user is gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, cheers @KubaSz. What I meant re. shared_ptr
being more flexible is that I don't think the deleter type needs to be included in the pointer type, unlike a unique_ptr
, so the return type of the function could still be std::shared_ptr<uint8_t[]>
even with the deleter (and I could dodge the original problem I was facing). Anywho, your version is great so I'll see how things turn out with that.
src/snapshot/SnapshotServer.cpp
Outdated
// TODO - somehow avoid this copy? | ||
std::memcpy(snap.data, r->contents()->Data(), snap.size); | ||
|
||
reg.takeSnapshot(r->key()->str(), data, true); | ||
// TODO - avoid a further copy in here when setting up fd | ||
reg.takeSnapshot(r->key()->str(), snap, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least one copy could be avoided here by adding an API to create a snapshot (including its memfd) of a given size without data, then making the caller responsible for filling in the data. The call would have to create the memfd, ftruncate it and mmap it, saving the mmap location in the data
field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that might work, I'll see what I can do.
.github/workflows/sanitisers.yml
Outdated
@@ -99,7 +99,7 @@ jobs: | |||
${{ runner.os }}- | |||
# --- Build with thread sanitising | |||
- name: "Build tests with address sanitising" | |||
run: inv dev.sanitise Address --noclean | |||
run: inv dev.sanitise Address faabric_tests --noclean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a target to the dev.sanitise
task rather than assuming it's always the tests.
bin/workon.sh
Outdated
# Check these match the CI build | ||
export ASAN_OPTIONS = "verbosity=1:halt_on_error=1" | ||
export TSAN_OPTIONS="halt_on_error=1:suppressions=${PROJ_ROOT}/thread-sanitizer-ignorelist.txt:history_size=7:second_deadlock_stack=1" | ||
export UBSAN_OPTIONS="print_stacktrace=1:halt_on_error=1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important for replicating CI problems locally (without halt_on_error=1
it's hard to tell if anything's breaking).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR looks very good now, just a few minor comments/questions left from me
Apologies, this PR has turned into a beast, but I think it's worth spending time on, as it's one of the more fiddly parts of the system.
The logic for creating, updating, diffing and merging snapshots is currently spread over several classes, quite difficult to understand because of the unclear ownership of data, and sprinkled with data races. In this PR I refactor all the snapshot-related code into something (hopefully) easier to understand, and safer to use.
The
snapshot.h
andsnapshot.cpp
files have gotten way too big and should be split up by class, however, that would make the diff for this PR even larger than it already is, so can be done in a separate PR.Functionality changes:
SnapshotData
now has a max size, and will allocate a contiguous portion of virtual memory into which it may expand in future (in the case of wasm this will be of size 4GB). If data is being copied into the snapshot that is larger than the current size, it will allocate new portions of memory.SnapshotData
andSnapshotDiff
take ownership of their own data, and a newMemoryView
class is used to represent views of memory (which was previously rolled into these two classes).Refactoring:
SnapshotServer
class, but is now in theSnapshotData
class.SnapshotData
uses proper RAII to manage its contents.SnapshotDiff
s are now queued and applied only at explicit synchronisation points. Without this, changes to the snapshot on the main host may be interleaved and mess up the diffing process on that host. This has an additional benefit of making the process simpler to reason about.std::shared_ptr<SnapshotData>
rather than references. This reduces risk of confusion over copying and shared ownership of data (and we can delete the copy and assignment constructors).SnapshotData
class itself rather than spread between it and theSnapshotRegistry
.mmap
andmprotect
calls inline.getDirtyPages
is nowgetDirtyRegions
to support a minimal set of diffs (rather than one per page even if the pages are contiguous).