
Add InnerDirichletPartitioner #2794

Merged 16 commits into main on Mar 5, 2024

Conversation

@adam-narozniak (Member) commented Jan 15, 2024

Proposal

Add an Inner Dirichlet Partitioner based on Federated Learning Based on Dynamic Regularization (https://arxiv.org/abs/2111.04263).

Explanation

This variation of Dirichlet-based partitioning differs from the one in #2795.

This implementation does the following:

  • In the original paper, the number of samples per partition (the partition sizes) is drawn from a lognormal distribution; here, the sizes are parameterized by partition_sizes and must be provided by the user.
  • The Dirichlet distribution is used to create the class fractions for each partition separately: it decides the division of classes within a partition (the Dirichlet probabilities sum to one for each partition, contrary to the original Dirichlet partitioning, where the fractions of class k sum to one across all partitions).

How does it differ from the original Dirichlet partitioning in #2795?
In #2795, the Dirichlet distribution is used to divide each class among the partitions: a draw splits class n among the p partitions, and this is repeated N (the number of unique classes) times. Therefore, there is no need to decide on the size of each partition in the original Dirichlet partitioning.

Also, what might help to understand the difference is the shape of the alpha (concentration) vector for the Dirichlet distribution:

  • Original: its size equals the number of partitions (and the draw is repeated once per unique class).
  • This implementation: its size equals the number of unique classes (and the draw is repeated once per partition).
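As an illustration (a minimal NumPy sketch, not Flower's actual code; P, K, and alpha are arbitrary values chosen for the example), the two alpha shapes and the resulting normalization axes can be contrasted like this:

```python
import numpy as np

rng = np.random.default_rng(0)
P, K = 5, 10  # number of partitions, number of unique classes
alpha = 0.5

# Original (#2795): alpha vector of length P, one draw per class -> shape (K, P).
# Each row holds the fractions of one class across all partitions (sums to 1).
original = rng.dirichlet(np.full(P, alpha), size=K)

# This implementation: alpha vector of length K, one draw per partition -> shape (P, K).
# Each row holds the class fractions within one partition (sums to 1).
inner = rng.dirichlet(np.full(K, alpha), size=P)

print(original.shape, inner.shape)  # (10, 5) (5, 10)
```

In both cases each row sums to one; what changes is whether a row represents a class (original) or a partition (this implementation).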

Note that this also means that, at a certain point, this implementation can run out of samples of class n while sampling for later partitions, and the code needs to account for that (by assigning samples from other classes in that case).
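Putting the pieces together, here is a simplified, hypothetical sketch of the whole procedure (the helper name and structure are illustrative, not the actual InnerDirichletPartitioner code): each partition draws its class fractions from a Dirichlet, and exhausted classes are masked out with the remaining probabilities renormalized before each sample is assigned:

```python
import numpy as np

def inner_dirichlet_assign(targets, partition_sizes, alpha, seed=0):
    """Sketch: assign sample indices to partitions via per-partition
    Dirichlet class fractions, falling back to other classes when one
    runs out. Hypothetical helper, not Flower's implementation."""
    rng = np.random.default_rng(seed)
    classes = np.unique(targets)
    # Pool of not-yet-assigned sample indices for each class.
    pools = {c: list(rng.permutation(np.where(targets == c)[0])) for c in classes}
    assignment = {}
    for pid, size in enumerate(partition_sizes):
        # One Dirichlet draw of length num_classes per partition.
        probs = rng.dirichlet(np.full(len(classes), alpha))
        chosen = []
        for _ in range(size):
            # Zero out exhausted classes and renormalize before sampling.
            avail = np.array([len(pools[c]) > 0 for c in classes], dtype=float)
            p = probs * avail
            p /= p.sum()
            c = rng.choice(classes, p=p)
            chosen.append(pools[c].pop())
        assignment[pid] = chosen
    return assignment

targets = np.repeat(np.arange(3), 20)  # 3 classes, 20 samples each
parts = inner_dirichlet_assign(targets, [25, 20, 15], alpha=0.5)
print([len(v) for v in parts.values()])  # [25, 20, 15]
```

The renormalization step mirrors the note above: once a class pool is empty, its probability mass is redistributed across the remaining classes, so later partitions still reach their requested sizes.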

@adam-narozniak adam-narozniak self-assigned this Jan 16, 2024
@adam-narozniak adam-narozniak marked this pull request as ready for review January 16, 2024 16:46
@adam-narozniak adam-narozniak changed the title Inner dirichlet partitioner Add InnerDirichletPartitioner Feb 28, 2024
@jafermarq (Contributor) left a comment:

Some initial comments

@danieljanes danieljanes enabled auto-merge (squash) March 5, 2024 11:03
@danieljanes danieljanes merged commit 3043d72 into main Mar 5, 2024
34 checks passed
@danieljanes danieljanes deleted the fds-add-inner-dirichlet-partitioner branch March 5, 2024 20:09
tanertopal added a commit that referenced this pull request Mar 6, 2024
* 'flwr_run' of github.com:adap/flower:
  Add `LegacyMessageType` (#3064)
  Add `MessageType` (#3005)
  Add Flower Client App connection error handling (#2969)
  Refactor `app_dir` arguments (#3061)
  Add InnerDirichletPartitioner (#2794)
  Add NumPy template to new command (#3059)
3 participants