Following @rabbanitw's comment, extending our federated averaging methods to support FedProx would greatly improve Disco's robustness to client heterogeneity (data and system).
In short, FedProx adds a regularizing term (called "proximal term") to the local objective function.
From the paper:
The proximal term is beneficial in two aspects:
(1) It addresses the issue of statistical heterogeneity by restricting the local updates to be closer to the initial (global) model without any need to manually set the number of local epochs.
(2) It allows for safely incorporating variable amounts of local work resulting from systems heterogeneity.
The algorithm is summarized here.
Disco currently doesn't select a subset of users and relies either on all users or on the first ones that reply with local updates.
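For concreteness, here is a minimal sketch of what the proximal term does to a local SGD step (Python/NumPy; the function names and toy objective are illustrative, not Disco's actual API):

```python
import numpy as np

# Hypothetical sketch of FedProx's local update (illustrative names, not
# Disco's actual API). The proximal term mu/2 * ||w - w_global||^2 adds
# mu * (w - w_global) to the local gradient, pulling local updates back
# toward the global model.
def fedprox_local_step(w, w_global, grad_fn, lr=0.1, mu=1.0):
    """One local SGD step on F_k(w) + mu/2 * ||w - w_global||^2."""
    grad = grad_fn(w) + mu * (w - w_global)
    return w - lr * grad

# Toy local objective F_k(w) = ||w - target||^2 / 2, so grad = w - target
target = np.array([1.0, -2.0])
grad_fn = lambda w: w - target

w_global = np.zeros(2)
w = w_global.copy()
for _ in range(200):
    w = fedprox_local_step(w, w_global, grad_fn)
# With mu = 1, the local iterate settles halfway between the global model
# and the local optimum instead of drifting all the way to it.
```

Note how `mu` directly trades off local fit against closeness to the global model, which is exactly the knob that makes the number of local epochs less critical.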
Ok, sounds fine. Though something even better, and slightly easier to do, is to use momentum (obtained as the difference vector between rounds), which is then added to every local step.
This works in both federated and decentralized settings.
It's called the mime-light algorithm in
Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning https://arxiv.org/abs/2008.03606
One could start with whichever is easier to implement in Disco, and then run some experiments to compare with/without it.
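A hedged sketch of that round-difference momentum idea, using a toy two-client simulation (names and the quadratic objectives are assumptions for illustration, not Disco's API):

```python
import numpy as np

# Sketch of mime-light-style momentum: the server tracks the difference
# between consecutive global models and each client adds a fraction of it
# to every local SGD step, keeping local updates aligned with the global
# direction. Illustrative only, not Disco's actual implementation.
def local_round(w_global, momentum, grad_fn, lr=0.1, beta=0.1, steps=5):
    """Local SGD where the global momentum is added to every step."""
    w = w_global.copy()
    for _ in range(steps):
        w = w - lr * grad_fn(w) + beta * momentum
    return w

# Two clients with heterogeneous objectives F_k(w) = ||w - t_k||^2 / 2
targets = [np.array([2.0, 0.0]), np.array([0.0, 2.0])]
w_prev = np.zeros(2)
w_global = np.zeros(2)
for _ in range(100):
    momentum = w_global - w_prev  # difference vector between rounds
    updates = [local_round(w_global, momentum, lambda w, t=t: w - t)
               for t in targets]
    w_prev, w_global = w_global, np.mean(updates, axis=0)
# The global model converges to the mean of the client optima, [1.0, 1.0]
```

Unlike FedProx, nothing here touches the local loss; the correction lives entirely in the update rule, which is why it drops into decentralized settings as well.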
Thanks @JulienVig. Echoing @martinjaggi, I'm fine with whatever is the easier implementation. As long as we have any type of implementation to mitigate client drift so I can discuss it at GDHF, I think it'll be fine!
Thanks for the reference! Looking at both FedProx and MimeLite, it seems they should both be relatively easy to implement. FedProx looks slightly easier (if I'm not mistaken, it's just adding a regularizer to our objective function), while MimeLite requires additional communication: clients send their gradients to the server, and the server sends the momentum back to the clients.
We should investigate both, especially since FedProx seems to underperform in the Mime experiments.