Refactor Distributors to make it actually do its jobs #68
This pull request was exported from Phabricator. Differential Revision: D64708506
…ch#68)

Summary:
In the current Shampoo design, there are three major components: `DistributedShampoo`, which describes the high-level algorithm flow; `Distributor`, which manages the distribution of parameters for different computing paradigms (e.g., DDP, FSDP, HSDP); and `PreconditionerList`, which contains the detailed algorithm implementation.

The `Distributor` distributes the parameters and then passes them to `DistributedShampoo`, which invokes the corresponding algorithms implemented by each `PreconditionerList`.

Since the `Distributor` handles parameter distribution, downstream classes (i.e., `DistributedShampoo` and `PreconditionerList`) should ideally not need to use `Distributor.distributor_selector` to compress parameters themselves. However, in the current implementation, such usages appear in both [`DistributedShampoo`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/distributed_shampoo.py?lines=614-617) and [`PreconditionerList`](https://www.internalfb.com/code/fbsource/[20bcdb4983a16c2cd7f00995754851159da91ed3]/fbcode/hpc/optimizers/distributed_shampoo/dev/utils/shampoo_preconditioner_list.py?lines=145-146).

The main changes are as follows:
1. `Distributor` now only provides access to the local version of `params` and `block_info`, in line with its responsibilities.
2. `Distributor` no longer needs to store `global_block_info_list`, the global version of `block_info`; instead, it stores `local_block_info_list`.
3. `PreconditionerList` no longer requires `distributor_selector` as an input argument because `Distributor` already performs this task.

Differential Revision: D64708506
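To make the ownership change concrete, here is a minimal sketch of the idea, not the actual Shampoo code. `ToyDistributor`, `ToyPreconditionerList`, `BlockInfo`, and the string "blocks" are hypothetical stand-ins, and the sketch assumes the selector is a per-block boolean mask marking the blocks assigned to the current rank. The distributor applies the mask once and exposes only local lists, so the downstream class never receives a selector.

```python
from dataclasses import dataclass
from itertools import compress


@dataclass(frozen=True)
class BlockInfo:
    """Hypothetical stand-in for per-block metadata."""

    param_name: str
    block_index: int


class ToyDistributor:
    """Illustrative only: owns the selector and exposes local views."""

    def __init__(self, global_blocks, global_block_info, selector):
        # Assumption: selector[i] is True when block i belongs to this rank.
        if not (len(global_blocks) == len(global_block_info) == len(selector)):
            raise ValueError("blocks, block info, and selector must have equal length")
        # Filter once, here, so downstream code never needs the selector.
        self.local_blocks = tuple(compress(global_blocks, selector))
        self.local_block_info_list = tuple(compress(global_block_info, selector))


class ToyPreconditionerList:
    """Illustrative only: built from local block info, with no selector argument."""

    def __init__(self, block_info_list):
        self.block_info_list = tuple(block_info_list)

    def describe(self):
        return [f"{info.param_name}[{info.block_index}]" for info in self.block_info_list]


global_blocks = ["w0_block0", "w0_block1", "w1_block0", "w1_block1"]
global_block_info = [
    BlockInfo("w0", 0),
    BlockInfo("w0", 1),
    BlockInfo("w1", 0),
    BlockInfo("w1", 1),
]
selector = (True, False, False, True)

distributor = ToyDistributor(global_blocks, global_block_info, selector)
preconditioners = ToyPreconditionerList(distributor.local_block_info_list)

print(distributor.local_blocks)    # ('w0_block0', 'w1_block1')
print(preconditioners.describe())  # ['w0[0]', 'w1[1]']
```

In this toy layout the mask lives in exactly one class, which mirrors the intent of changes 1 to 3: downstream consumers are handed lists that are already local.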
30a1c45 to 9f5b13e
Reviewed By: chuanhaozhuge
9f5b13e to c78d9a9
c78d9a9 to 139a674
139a674 to e46f2e2
LGTM now (ignoring the internal failures).
Reviewed By: anana10c, chuanhaozhuge
e46f2e2 to e055ba5
e055ba5 to 57d5dc9
This pull request has been merged in 618d857.