Following @rabbanitw's comment, extending our federated averaging methods to support FedProx would greatly improve Disco's robustness to client heterogeneity (data and system).
In short, FedProx adds a regularizing term (called "proximal term") to the local objective function.
From the paper:
The proximal term is beneficial in two aspects:
(1) It addresses the issue of statistical heterogeneity by restricting the local updates to be closer to the initial (global) model without any need to manually set the number of local epochs.
(2) It allows for safely incorporating variable amounts of local work resulting from systems heterogeneity.
The algorithm is summarized here.
Disco currently doesn't select a subset of users and relies either on all users or on the first ones that reply with local updates.
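For concreteness, here is a minimal sketch of what the proximal term does to a local SGD step (Python/NumPy; the function names and toy objective are illustrative, not Disco's actual API):

```python
import numpy as np

# Hypothetical sketch of FedProx's local update (illustrative names, not
# Disco's actual API). The proximal term mu/2 * ||w - w_global||^2 adds
# mu * (w - w_global) to the local gradient, pulling local updates back
# toward the global model.
def fedprox_local_step(w, w_global, grad_fn, lr=0.1, mu=1.0):
    """One local SGD step on F_k(w) + mu/2 * ||w - w_global||^2."""
    grad = grad_fn(w) + mu * (w - w_global)
    return w - lr * grad

# Toy local objective F_k(w) = ||w - target||^2 / 2, so grad = w - target
target = np.array([1.0, -2.0])
grad_fn = lambda w: w - target

w_global = np.zeros(2)
w = w_global.copy()
for _ in range(200):
    w = fedprox_local_step(w, w_global, grad_fn)
# With mu = 1, the local iterate settles halfway between the global model
# and the local optimum instead of drifting all the way to it.
```

Note how `mu` directly trades off local fit against closeness to the global model, which is exactly the knob that makes the number of local epochs less critical.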
Ok, sounds fine. Though something even better, and slightly easier to do, is to use momentum (obtained as the difference vector between rounds), which is then added to every local step.
This works in both federated and decentralized settings.
It's called the mime-light algorithm in
Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning https://arxiv.org/abs/2008.03606
One could start with whichever is easier to implement in Disco, and then run some experiments to compare with/without it.
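A hedged sketch of that round-difference momentum idea, using a toy two-client simulation (names and the quadratic objectives are assumptions for illustration, not Disco's API):

```python
import numpy as np

# Sketch of mime-light-style momentum: the server tracks the difference
# between consecutive global models and each client adds a fraction of it
# to every local SGD step, keeping local updates aligned with the global
# direction. Illustrative only, not Disco's actual implementation.
def local_round(w_global, momentum, grad_fn, lr=0.1, beta=0.1, steps=5):
    """Local SGD where the global momentum is added to every step."""
    w = w_global.copy()
    for _ in range(steps):
        w = w - lr * grad_fn(w) + beta * momentum
    return w

# Two clients with heterogeneous objectives F_k(w) = ||w - t_k||^2 / 2
targets = [np.array([2.0, 0.0]), np.array([0.0, 2.0])]
w_prev = np.zeros(2)
w_global = np.zeros(2)
for _ in range(100):
    momentum = w_global - w_prev  # difference vector between rounds
    updates = [local_round(w_global, momentum, lambda w, t=t: w - t)
               for t in targets]
    w_prev, w_global = w_global, np.mean(updates, axis=0)
# The global model converges to the mean of the client optima, [1.0, 1.0]
```

Unlike FedProx, nothing here touches the local loss; the correction lives entirely in the update rule, which is why it drops into decentralized settings as well.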
Thanks @JulienVig. Echoing @martinjaggi, I'm fine with whatever is the easier implementation. As long as we have any type of implementation to mitigate client drift so I can discuss it at GDHF, I think it'll be fine!
Thanks for the reference! Looking at both FedProx and MimeLite, it seems they should both be relatively easy to implement. FedProx looks slightly easier (if I'm not mistaken, it's just adding a regularizer to our objective function), while MimeLite requires additional communication: clients send their gradients to the server, and the server sends the momentum back to the clients.
We should investigate both, especially since FedProx seems to underperform in the Mime experiments.