Remove experimental hot spare policy #948

Merged
merged 1 commit into from
Aug 26, 2024

@manav-a manav-a commented Aug 23, 2024

Differential Revision: D61746435

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 23, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D61746435

Summary:
Pull Request resolved: pytorch#948

This diff removes the experimental hot spare restart policy and uses role restart instead, with quorum hosts set to the min nodes requirement. This isn't ideal, since it leaves no good way to differentiate between elasticity-based and quorum-based restarts, but we can address that later by supporting quorum restarts differently.
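
For illustration only, here is a rough sketch of the idea described above: a role-level restart policy whose quorum is pinned to the role's minimum node count. Every name in it (`RestartPolicy`, `RoleSpec`, `build_restart_config`, `should_restart_role`, and the `quorum_hosts` key) is hypothetical and not taken from the diff, and the rule that a role restarts once healthy hosts fall below quorum is an assumption about the intended behavior.

```python
from dataclasses import dataclass
from enum import Enum


class RestartPolicy(Enum):
    """Hypothetical restart policies; the hot spare variant is intentionally absent."""

    REPLICA = "replica"  # restart only the replica that failed
    ROLE = "role"        # restart every replica in the role


@dataclass
class RoleSpec:
    """Hypothetical description of an elastic role."""

    name: str
    min_nodes: int
    max_nodes: int


def build_restart_config(role: RoleSpec) -> dict:
    """Use role restart and set the quorum to the role's min nodes requirement."""
    if not 1 <= role.min_nodes <= role.max_nodes:
        raise ValueError("expected 1 <= min_nodes <= max_nodes")
    return {
        "restart_policy": RestartPolicy.ROLE.value,
        "quorum_hosts": role.min_nodes,
    }


def should_restart_role(healthy_hosts: int, config: dict) -> bool:
    """Assumed rule: restart the whole role once healthy hosts drop below quorum."""
    return healthy_hosts < config["quorum_hosts"]


if __name__ == "__main__":
    cfg = build_restart_config(RoleSpec(name="trainer", min_nodes=6, max_nodes=8))
    print(cfg)
    print(should_restart_role(healthy_hosts=7, config=cfg))
    print(should_restart_role(healthy_hosts=5, config=cfg))
```

Because the quorum and the elastic minimum share one value in this sketch, a caller cannot tell from the config alone whether a restart was driven by elasticity or by quorum loss, which is the limitation the summary mentions.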

Differential Revision: D61746435

@manav-a manav-a changed the title from "Temporary Commit at 8/23/2024, 4:23:21 PM" to "Remove experimental hot spare policy" Aug 26, 2024
@manav-a manav-a requested a review from kunalb August 26, 2024 14:06
@facebook-github-bot facebook-github-bot merged commit 69129eb into pytorch:main Aug 26, 2024
20 of 24 checks passed