
Implement FedProx to improve robustness to heterogeneity #802

Open
JulienVig opened this issue Oct 8, 2024 · 3 comments
@JulienVig
Collaborator

Following @rabbanitw's comment, extending our federated averaging methods to support FedProx would greatly improve Disco's robustness to client heterogeneity (data and system).

In short, FedProx adds a regularizing term (called "proximal term") to the local objective function.
From the paper:

The proximal term is beneficial in two aspects:
(1) It addresses the issue of statistical heterogeneity by restricting the local updates to be closer to the initial (global) model without any need to manually set the number of local epochs.
(2) It allows for safely incorporating variable amounts of local work resulting from systems heterogeneity.

The algorithm is summarized here.
[Screenshot: summary of the FedProx algorithm from the paper]
Disco currently doesn't select a subset of users and relies either on all users or on the first ones that reply with local updates.
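
For reference, the FedProx local objective is h_k(w; w_t) = F_k(w) + (mu / 2) * ||w - w_t||^2: the usual local loss plus a proximal term that penalizes drifting away from the global model w_t received at the start of the round. Below is a minimal TypeScript/TF.js sketch of how the augmented loss could look; the names (proximalLoss, globalWeights, mu) are illustrative placeholders rather than Disco's actual API, and it assumes a classification task.

```ts
import * as tf from "@tensorflow/tfjs";

// Hypothetical helper, not Disco's actual API: FedProx-augmented local loss
// for one mini-batch. `globalWeights` are the trainable weights received from
// the server at the start of the round (same order as model.trainableWeights),
// and `mu` is the proximal coefficient.
function proximalLoss(
  model: tf.LayersModel,
  globalWeights: tf.Tensor[],
  xs: tf.Tensor,
  ys: tf.Tensor,
  mu: number,
): tf.Scalar {
  return tf.tidy(() => {
    // Usual task loss (classification assumed for illustration).
    const preds = model.apply(xs) as tf.Tensor;
    const taskLoss = tf.losses.softmaxCrossEntropy(ys, preds).asScalar();

    // Proximal term: (mu / 2) * sum over layers of ||w - w_global||^2,
    // restricting local updates to stay close to the global model.
    let proximal: tf.Tensor = tf.scalar(0);
    model.trainableWeights.forEach((w, i) => {
      proximal = proximal.add(w.read().sub(globalWeights[i]).square().sum());
    });

    return taskLoss.add(proximal.mul(mu / 2)).asScalar();
  });
}
```

During local training this could be minimized with the usual optimizer, e.g. `optimizer.minimize(() => proximalLoss(model, globalWeights, xs, ys, mu))`; setting mu = 0 recovers plain FedAvg local training.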

@JulienVig JulienVig added federated For the federated setting discojs Related to Disco.js server Related to the server labels Oct 8, 2024
@JulienVig JulienVig added this to the v4.0.0 milestone Oct 8, 2024
@martinjaggi
Member

OK, sounds fine. Though something even better, and slightly easier to do, is to use momentum (obtained as the difference vector between rounds), which is then added to every local step.
This works in both the federated and decentralized settings.

This is called the mime-light algorithm in:
Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning
https://arxiv.org/abs/2008.03606

One could start with whichever is easier to implement in Disco, and then run some experiments to compare with and without it.
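
To make the comparison concrete, here is a rough TypeScript/TF.js sketch of the momentum variant described above (following the description in this thread rather than the exact MimeLite algorithm from the paper): the server derives a momentum vector from the difference between consecutive global models, and every local step mixes it into the update. All names and the beta default are illustrative assumptions, not Disco's API.

```ts
import * as tf from "@tensorflow/tfjs";

// Server side (hypothetical helper): refresh the momentum after each round
// from the difference between the new and previous global model.
function updateServerMomentum(
  previousGlobal: tf.Tensor[],
  newGlobal: tf.Tensor[],
  momentum: tf.Tensor[] | undefined,
  beta = 0.9, // decay factor, an assumed default
): tf.Tensor[] {
  return newGlobal.map((w, i) => {
    const roundDiff = w.sub(previousGlobal[i]);
    return momentum === undefined
      ? roundDiff
      : momentum[i].mul(beta).add(roundDiff.mul(1 - beta));
  });
}

// Client side (hypothetical helper): one local SGD step where the fixed
// server momentum is mixed into every update,
// i.e. w <- w - lr * ((1 - beta) * g + beta * m).
function localStep(
  weights: tf.Variable[],
  gradients: tf.Tensor[], // gradients of the local loss for this mini-batch
  serverMomentum: tf.Tensor[],
  learningRate: number,
  beta = 0.9,
): void {
  weights.forEach((w, i) => {
    const update = gradients[i].mul(1 - beta).add(serverMomentum[i].mul(beta));
    w.assign(w.sub(update.mul(learningRate)));
  });
}
```

The momentum stays fixed during a client's local steps and is only refreshed between rounds, so the only extra communication is the momentum vector broadcast alongside the global model.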

@rabbanitw
Collaborator

Thanks @JulienVig. Echoing @martinjaggi, I'm fine with whichever implementation is easier. As long as we have some implementation that mitigates client drift that I can discuss at GDHF, I think it'll be fine!

@JulienVig
Collaborator Author

Thanks for the reference! Looking at both FedProx and MimeLite, they should both be relatively easy to implement. FedProx seems slightly easier (if I'm not mistaken, it only requires adding a regularizer to our objective function), while MimeLite requires additional communication (clients send their gradients to the server and the server sends the momentum back to the clients).
We should investigate both, especially since FedProx seems to underperform in the Mime experiments.
