A few fixes to our Examples #300
Conversation
…l servers, dropping some print statements
I moved these to our examples/assets folder instead of the top level assets folder.
@@ -6,10 +6,6 @@ The server has some custom metrics aggregation and uses Federated Averaging as i

As this is a warm-up training for consecutive runs with different Federated Learning (FL) algorithms, it is crucial to set a fixed seed for both clients and the server to ensure uniformity in random data points across these runs. Therefore, we make sure to set a fixed seed for these consecutive runs in both the `client.py` and `server.py` files. Additionally, it is important to establish a checkpointing strategy for the clients using their randomly generated unique client names. This allows us to load each client's warmed-up model from this example in further instances. In this particular scenario, we set the checkpointing strategy to save the latest model. This ensures that we can load the trained local model for each client from this example in subsequent runs as a warmed-up model.

### Weights and Biases Reporting
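The fixed-seed requirement described above can be sketched roughly as follows (a minimal sketch, not the repository's actual helper; in the real example `torch.manual_seed` would be set as well):

```python
import random

import numpy as np


def set_all_random_seeds(seed: int) -> None:
    """Fix the RNG seeds so consecutive FL runs see the same random data points."""
    # The real example would also call torch.manual_seed(seed); it is
    # omitted here to keep the sketch dependency-light.
    random.seed(seed)
    np.random.seed(seed)


# Both client.py and server.py would call this before any data loading,
# so the warm-up run and the follow-up run draw identical data points.
set_all_random_seeds(2023)
```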
Dropping this part, since it's not really necessary for this example
@@ -5,11 +5,3 @@ n_server_rounds: 2 # The number of rounds to run FL
n_clients: 3 # The number of clients in the FL experiment
local_epochs: 1 # The number of epochs to complete for client
batch_size: 128 # The batch size for client training
Dropping this part, since it's not really necessary for this example
@@ -6,10 +6,6 @@ The server has some custom metrics aggregation and uses FedProx as its server-si

After the warm-up training, clients can load their warmed-up models and continue training with the FedProx algorithm. To maintain consistency in the data loader between both runs, it is crucial to set a fixed seed for both clients and the server, ensuring uniformity in random data points across consecutive runs. Therefore, we ensure a fixed seed is set for these consecutive runs in both the `client.py` and `server.py` files. Additionally, to load the warmed-up models, it's important to provide the path to the pretrained models based on the client's unique name, ensuring that we can load the trained local model for each client from the previous example as a warmed-up model. Since the models in the two runs can differ, loading weights from the pretrained model requires a mapping between the pretrained model and the model used in FL training. This mapping is accomplished through the `weights_mapping.json` file, which contains the names of the pretrained model's layers and the corresponding names of the layers in the model used in FL training.

### Weights and Biases Reporting
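The layer-name remapping that `weights_mapping.json` enables can be sketched like this (the layer names and file contents below are hypothetical, purely for illustration; the real file maps the pretrained model's layer names to the FL model's layer names):

```python
import json


def remap_pretrained_weights(pretrained_state: dict, mapping: dict) -> dict:
    """Rename pretrained layers to the FL model's layer names before loading."""
    return {
        mapping[name]: value
        for name, value in pretrained_state.items()
        if name in mapping
    }


# Hypothetical weights_mapping.json contents: pretrained name -> FL model name.
mapping = json.loads('{"encoder.weight": "feature_extractor.weight"}')

pretrained_state = {"encoder.weight": [0.1, 0.2], "head.weight": [0.3]}
remapped = remap_pretrained_weights(pretrained_state, mapping)
# remapped == {"feature_extractor.weight": [0.1, 0.2]};
# layers without a mapping entry are skipped.
```

In the actual example the values would be tensors and the result would be passed to the FL model's `load_state_dict`.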
Dropping this part, since it's not really necessary for this example
@@ -14,11 +14,3 @@ proximal_weight_patience : 5 # The number of rounds to wait before increasing or
n_clients: 3 # The number of clients in the FL experiment
local_epochs: 1 # The number of epochs to complete for client
batch_size: 128 # The batch size for client training
Dropping this part, since it's not really necessary for this example
@@ -30,9 +30,11 @@ def __init__(
    metrics: Sequence[Metric],
    device: torch.device,
    checkpoint_dir: str,
    client_name: str,
Making this an argument you pass is easier to manage than leaving it up to the generated hash and trying to match things.
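The predictability this buys can be sketched roughly as follows (the file-name pattern and helper here are hypothetical, not the repository's actual checkpointing code):

```python
from pathlib import Path


def checkpoint_path(checkpoint_dir: str, client_name: str) -> Path:
    """Build a checkpoint path from an explicitly passed client name, so the
    warm-up run and later runs agree on the file without matching generated
    hashes or UUIDs."""
    return Path(checkpoint_dir) / f"client_{client_name}_latest.pkl"


# With an explicit name, the path is stable across runs.
path = checkpoint_path("examples/assets/checkpoints", "client_0")
```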
@@ -29,19 +28,17 @@ def __init__(
    data_path: Path,
    metrics: Sequence[Metric],
    device: torch.device,
    pretrained_model_dir: Path,
    pretrained_model_path: Path,
Moving to just specifying the model path rather than trying to infer it from the client name, which can no longer be fixed by the random seed (they use UUIDs under the hood).
@@ -31,7 +30,7 @@ def __init__(
    data_path: Path,
    metrics: Sequence[Metric],
    device: torch.device,
    pretrained_model_dir: Path,
    pretrained_model_path: Path,
Moving to just specifying the model path rather than trying to infer it from the client name, which can no longer be fixed by the random seed (they use UUIDs under the hood).
@@ -132,6 +132,8 @@ def configure_fit(
    if self.on_fit_config_fn is not None:
        # Custom fit config function provided
        config = self.on_fit_config_fn(server_round)
    else:
In a few examples, we don't specify a config, but we assume that `current_server_round` is always present. So this (and those below) ensure that it is in there.
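A standalone sketch of the patched control flow (the real code lives in a Flower strategy's `configure_fit`; this just mirrors the logic):

```python
def build_fit_config(server_round: int, on_fit_config_fn=None) -> dict:
    """Build the per-round fit config, guaranteeing current_server_round is set."""
    if on_fit_config_fn is not None:
        # Custom fit config function provided.
        config = on_fit_config_fn(server_round)
    else:
        # No config function: start from an empty config instead of skipping it,
        # so the key below is always present for the clients.
        config = {}
    config["current_server_round"] = server_round
    return config
```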
@@ -72,7 +72,7 @@ def __init__(
    transform: Callable | None = None,
    target_transform: Callable | None = None,
) -> None:
    assert targets is not None, "SslTensorDataset targets must be None"
    assert targets is None, "SslTensorDataset targets must be None"
This was a bug. See the error message.
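To see why the original condition was a bug: it contradicted its own error message, raising on exactly the input it was meant to allow. A minimal illustration (the helper name is hypothetical):

```python
def validate_ssl_targets(targets) -> None:
    # Corrected check: a self-supervised dataset derives targets from the data
    # itself, so callers must not pass explicit targets.
    assert targets is None, "SslTensorDataset targets must be None"


validate_ssl_targets(None)  # valid input passes
# With the old `targets is not None` condition, this valid call raised instead.
```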
For some reason this was in the smoke tests folder, but we don't run the example in the smokes.
I'll let other people who are more knowledgeable on the examples approve this. I'm familiar mainly with the nnunet example, and it seems like no major changes were made there.
Looks good to me!
PR Type
Fix
Short Description
Clickup Ticket(s): N/A
I went through all of our examples to make sure they run correctly. Generally everything still works great. There were just a few small bugs that I patched here.
Most of the changes in this PR are just me moving our example servers to not accept failures. That way we don't have zombie processes for anyone running the examples if something weird happens.
I also dropped a few print statements and moved a few files.
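The "not accept failures" change mirrors the behavior of Flower's `accept_failures` flag on server strategies; a rough standalone sketch of that behavior (function name and message are hypothetical):

```python
def aggregate_round(results: list, failures: list, accept_failures: bool = False) -> list:
    """With accept_failures=False, any client failure aborts the round
    immediately, so the server process exits instead of lingering as a
    zombie while waiting on dead clients."""
    if failures and not accept_failures:
        raise RuntimeError(f"{len(failures)} client(s) failed; shutting down the server")
    return results
```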
Tests Added
N/A