Periodically Check For Migration Opportunities #214
We need the snapshot client to be a public member in faasm/faasm#565 before migrating a function. There, we chain from a function on a non-master host to a function on the master host. Faabric won't register, take, and push snapshots in this case, so we have to do it manually. Thus, we need access to the snapshot client from the scheduler instance.
We don't need the function call client to be a public member, but I was reluctant to split the declaration of the clients (happy to revert).
AFAICT this is the message that's sent to worker hosts to tell them to migrate a function across hosts. If that's right, doesn't that host already have the message, so we just need the message ID here?
All functions in the app will read the same `PendingMigrations` message. They:
- [...] `PendingMigration` repeated field (`migrations`).
- [...] `dstHost`.

Maybe the `srcHost` field is not strictly needed, but it can be asserted on, and it makes printing and logging easier.
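To make the lookup concrete, here is a minimal sketch of how a function could find its own entry in the shared message. It is not the code in this PR: the plain structs stand in for the generated protobuf classes, `msgId` and `getMigrationTarget` are hypothetical names, and only `migrations`, `srcHost`, and `dstHost` come from the discussion above.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Plain structs standing in for the protobuf messages (assumption: the real
// messages are generated code and may carry more fields).
struct PendingMigration
{
    int msgId;           // hypothetical identifier of the function to migrate
    std::string srcHost; // host the function currently runs on
    std::string dstHost; // host the function must move to
};

struct PendingMigrations
{
    std::vector<PendingMigration> migrations; // the repeated field
};

// Return the destination host for the given function, or std::nullopt if the
// function has no entry in the pending migrations.
std::optional<std::string> getMigrationTarget(const PendingMigrations& pending,
                                              int msgId,
                                              const std::string& thisHost)
{
    for (const auto& m : pending.migrations) {
        if (m.msgId == msgId) {
            // srcHost is redundant for the lookup, but lets us sanity-check
            // that the entry really refers to the host we are running on.
            assert(m.srcHost == thisHost);
            return m.dstHost;
        }
    }
    return std::nullopt;
}
```

In this sketch `srcHost` is only used for the assertion, which matches the point that the field is not strictly needed but is convenient for checks and logging.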
Can we do this `TODO` now?
At the moment, faabric does not know when the migration (app-wise) has finished. In fact, faabric does not even know when a migration is taking place. Faabric is only aware that some functions throw exceptions because they have been migrated.

The workaround (hack) I have implemented is the following:
- In `MpiWorld::prepareMigration`, the local leader sets a boolean flag in the MPI world (`hasBeenMigrated`).
- [...] `hasBeenMigrated` flag.
- If `true`, remove the pending migration from the map.

This uses the fact that faasm will call `prepareMigration` before migrating, and will call `MPI_Barrier` after.

Unfortunately, this can't really be tested from within faabric (I have tested it using the distributed test in faasm).

I can't think of a way of doing this that isn't ad hoc, given the lack of information in faabric.
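For readers following the thread, a rough sketch of the flag-based workaround is below. It is an illustration, not the faabric implementation: `World`, `PendingMigrationMap`, and `migratedFlag` are hypothetical names. Checking the flag inside `barrier` and resetting it afterwards are assumptions based on the call order described above.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the scheduler's map of pending migrations,
// keyed by app id (assumption: the real map stores richer objects).
using PendingMigrationMap = std::map<int, std::string>;

class World
{
  public:
    World(int appIdIn, PendingMigrationMap& pendingIn)
      : appId(appIdIn)
      , pending(pendingIn)
    {}

    // Called before migrating: the local leader records that a migration
    // is about to take place.
    void prepareMigration(bool isLocalLeader)
    {
        if (isLocalLeader) {
            hasBeenMigrated = true;
        }
    }

    // Called after migrating. Assumption: this is where the flag is checked.
    // The actual synchronisation between ranks is omitted from the sketch.
    void barrier(bool isLocalLeader)
    {
        if (isLocalLeader && hasBeenMigrated) {
            pending.erase(appId);    // the migration is considered finished
            hasBeenMigrated = false; // assumption: reset for later migrations
        }
    }

    bool migratedFlag() const { return hasBeenMigrated; }

  private:
    int appId;
    PendingMigrationMap& pending;
    bool hasBeenMigrated = false;
};
```

The sketch relies entirely on the caller invoking `prepareMigration` before and `barrier` after the migration, which is why the approach is ad hoc.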