
Periodically Check For Migration Opportunities #214

Merged

merged 27 commits into master from thread-check-migrations-new on Feb 7, 2022

Conversation

csegarragonz
Collaborator

@csegarragonz csegarragonz commented Jan 7, 2022

In this PR I introduce the ability to detect migration opportunities, and to migrate MPI functions.

Function migration in faabric is divided into two independent processes:

Detection of migration opportunities:

  • A function can opt in to be migrated by setting a positive value in a message field.
  • The first time the scheduler schedules a function with this parameter set, it starts a background thread that checks for migration opportunities (only the master host for an app checks for migration opportunities for it).
  • The type of migration opportunity to look for is specified with a MigrationStrategy, but currently only MigrationStrategy::BIN_PACK is supported: we try to pack functions from hosts running fewer of them onto hosts running more. In-flight apps that opt in to migration checks, and the opportunities found, are stored in shared data structures.
  • Lastly, when a migration opportunity is found, the scheduler notifies all hosts running functions for the app that can be migrated.
  • The combination of synchronous cross-host function calls and locks in the scheduler guarantees that, once the master host releases the write lock on the migrations data structure, all other hosts' schedulers are already aware of it.

Performing the actual migration:

  • The actual migration is handled by faasm (Migration PoC faasm#565).
  • The only thing we need to do from faabric is prepare for the migration. Preparation is specific to the type of app being migrated; for MPI we implement MpiWorld::prepareMigration.

Known issues:

  • At the moment, we only remove a pendingMigration when execution finishes, even if we have already migrated. It is not clear when it is safe to remove a migration after migrating.

@csegarragonz csegarragonz self-assigned this Jan 7, 2022
@csegarragonz csegarragonz force-pushed the thread-check-migrations-new branch 9 times, most recently from fc4c9d0 to 754ee92 Compare January 8, 2022 15:02
// during the migration? msg.set_groupid();

if (migrationStrategy ==
faabric::util::MigrationStrategy::BIN_PACK) {
Collaborator Author

Note that this is quite similar to the makeSchedulingDecision code, but with some differences that made it preferable to implement as a separate piece of code:

  • First, we are not actually scheduling new functions, we are just checking if a particular kind of re-arrangement is possible.
  • Second, we don't enforce the changes, so we don't actually have to persist them in the HostResources.

}

std::vector<std::shared_ptr<faabric::PendingMigrations>>
Scheduler::doCheckForMigrationOpportunities(
Collaborator Author

The need for a new method, rather than re-using makeSchedulingDecision, arises from the fact that:

  • In this method we just check for the ability to migrate, but do not actually change the underlying host resources.
  • The scheduling and migration policies can differ: the former can be more complex than the latter.

@csegarragonz csegarragonz changed the title [NEW] Periodically Check For Migration Opportunities Periodically Check For Migration Opportunities Jan 9, 2022
faabric::PendingMigrations msg;
msg.set_appid(originalDecision.appId);

if (migrationStrategy == faabric::util::MigrationStrategy::BIN_PACK) {
Collaborator Author

I haven't implemented the EMPTY_HOSTS migration strategy, so I haven't added support for an environment variable to set the migration strategy.

@Shillaker Shillaker marked this pull request as draft January 12, 2022 07:48
@@ -1554,12 +1568,10 @@ std::shared_ptr<InMemoryMpiQueue> MpiWorld::getLocalQueue(int sendRank,
// Note - the queues themselves perform concurrency control
void MpiWorld::initLocalQueues()
{
// Assert we only allocate queues once
assert(localQueues.size() == 0);
Collaborator Author

I remove this assertion because this method is now also called at the end of prepareMigration, to update the localQueues with the new ranks-to-hosts mapping.

if (msg.recordexecgraph()) {
world.setMsgForRank(msg);
}
world.setMsgForRank(msg);
Collaborator Author

We also need the message when migrating.

@@ -303,6 +309,9 @@ faabric::util::SchedulingDecision Scheduler::makeSchedulingDecision(

// Work out how many we can handle locally
int slots = thisHostResources.slots();
if (topologyHint == faabric::util::SchedulingTopologyHint::UNDERFULL) {
Collaborator Author

This is all the logic required to implement the UNDERFULL topology hint. I could move it to a separate PR, but it is literally just this and the tests.

// Remove the app from in-flight map if still there, and this host is the
// master host for the message
if (msg.masterhost() == thisHost) {
removePendingMigration(msg.appid());
Collaborator Author

@csegarragonz csegarragonz Feb 3, 2022

As mentioned in the PR description, this is the only time we remove a pending migration. In particular, we don't remove it after performing a migration; it is not clear when it is safe to do that.

The main reason this is hard to determine is that, at the moment, we don't hit a faabric-level barrier after migrating. The PoC migration app is designed so that functions hit an application-level barrier after migrating, and that is how we know we are good to continue executing.

faabric::scheduler::FunctionCallClient& getFunctionCallClient(
const std::string& otherHost);

faabric::snapshot::SnapshotClient& getSnapshotClient(
Collaborator Author

@csegarragonz csegarragonz Feb 3, 2022

We need the snapshot client to be a public member for faasm/faasm#565, before migrating a function. There, we chain from a function on a non-master host to a function on the master host. Faabric won't register, take, and push snapshots in this case; we have to do it ourselves manually. Thus, we need access to the snapshot client from the scheduler instance.

We don't need the function call client to be a public member, but I was reluctant to split the declaration of the clients (happy to revert).

@csegarragonz csegarragonz marked this pull request as ready for review February 3, 2022 11:24
src/scheduler/Scheduler.cpp
src/scheduler/Executor.cpp

// MPI migration
if (msg.ismpi()) {
// TODO - delete the pending migration
Collaborator

Can we do this TODO now?

Collaborator Author

@csegarragonz csegarragonz Feb 3, 2022

At the moment, faabric does not know when the migration has finished app-wise. In fact, faabric does not even know when a migration is taking place: faabric is only aware that some functions throw exceptions because they have been migrated.

The workaround (hack) I have implemented is the following:

  • When we call MpiWorld::prepareMigration, the local leader sets a boolean flag in the MPI world (hasBeenMigrated).
  • Every time we hit a barrier, after all ranks have arrived (and before they have left it), the local leader checks the hasBeenMigrated flag.
  • If the flag is set, it removes the pending migration from the map.

This relies on the fact that faasm will call prepareMigration before migrating, and will call MPI_Barrier afterwards.
Unfortunately, this can't really be tested from within faabric (I have tested it using the distributed tests in faasm).

I can't think of a non-ad-hoc way of doing this, given the lack of information available in faabric.

src/scheduler/MpiWorld.cpp
…heck for flag in barriers to remove used pendingMigrations in the scheduler
…tion, remove unnecessary field to callFunctions, and re-factor necessary calls to callFunctions
@Shillaker Shillaker merged commit 4f4628a into master Feb 7, 2022
@Shillaker Shillaker deleted the thread-check-migrations-new branch February 7, 2022 08:41