diff --git a/federated_learning/nvflare/README.md b/federated_learning/nvflare/README.md
deleted file mode 100644
index d8fbfe869c..0000000000
--- a/federated_learning/nvflare/README.md
+++ /dev/null
@@ -1,7 +0,0 @@
-**Federated learning with [NVFlare](./federated_learning/nvflare)**
-
-The examples here show how to train federated learning models with [NVFlare](https://pypi.org/project/nvflare/) and MONAI-based trainers.
-
-1. [nvflare_example](./nvflare_example/README.md) shows how to run NVFlare with MONAI on a local machine to simulate an FL setting (server and client communicate over localhost). It also shows how to run a simulated FL experiment completely automated using the admin API. To streamline the experimentation, we have already prepared startup kits for up to 8 clients in this tutorial.
-
-2. [nvflare_example_docker](./nvflare_example_docker/README.md) provides further details on running FL with MONAI and NVFlare using docker containers for the server and each client for easier real-world deployment.
diff --git a/federated_learning/nvflare/nvflare_example/.gitignore b/federated_learning/nvflare/nvflare_example/.gitignore
deleted file mode 100644
index 8447e9a17e..0000000000
--- a/federated_learning/nvflare/nvflare_example/.gitignore
+++ /dev/null
@@ -1,8 +0,0 @@
-# virtual environments
-nvflare_monai
-# workspace
-fl_workspace
-# data
-*.nii.gz
-# pycharm
-.idea
diff --git a/federated_learning/nvflare/nvflare_example/README.md b/federated_learning/nvflare/nvflare_example/README.md
deleted file mode 100644
index 28ad39fe3c..0000000000
--- a/federated_learning/nvflare/nvflare_example/README.md
+++ /dev/null
@@ -1,186 +0,0 @@
-# Federated Learning with MONAI using NVFlare (without docker)
-The purpose of this tutorial is to show how to run [NVFlare](https://pypi.org/project/nvflare) with MONAI on a local machine to simulate an FL setting (server and clients communicate over localhost).
-It is based on the [tutorial](../nvflare_example_docker) showing how to run FL with MONAI and NVFlare using a docker container for the server and each client.
-
-## Environment setup
-(If needed) install pip and virtualenv (on macOS and Linux):
-```
-python3 -m pip install --user --upgrade pip
-python3 -m pip install --user virtualenv
-```
-(If needed) make all shell scripts executable:
-```
-find . -name "*.sh" -exec chmod +x {} \;
-```
-Initialize the virtual environment and set the project folder (see `projectpath` in `set_env.sh`):
-```
-source ./virtualenv/set_env.sh
-```
-Install the required packages:
-```
-pip install --upgrade pip
-pip install -r ${projectpath}/virtualenv/requirements.txt
-```
-
-## FL workspace preparation for NVFlare
-NVFlare has a "provision" mechanism to automatically generate the FL workspace; see [here](https://docs.nvidia.com/clara/clara-train-sdk/federated-learning/fl_provisioning_tool.html) for details.
-
-In this example, for convenience, we include a pregenerated workspace supporting up to 8 clients, which needs to be extracted:
-```
-unzip ${projectpath}/fl_workspace_pregenerated.zip
-```
-*Note: (Optional)* If you need to modify the FL workspace (changing the maximum number of clients, client names, etc.), please follow the instructions [here](https://docs.nvidia.com/clara/clara-train-sdk/federated-learning/fl_provisioning_tool.html). We include the sample project.yml and authz_config.json files used for generating the 8-client workspace under `${projectpath}/fl_utils/workspace_gen`. After modification, the provisioning tool can be run as: `provision -p project.yml -a authz_config.json`
-
-## Example task - spleen segmentation with MONAI
-In this example, we use the spleen segmentation task with a MONAI-based client trainer under `${projectpath}/spleen_example`.
-### Download the data
-Download the Spleen segmentation task dataset from http://medicaldecathlon.com.
-```
-${projectpath}/spleen_example/data/download_dataset.sh
-```
-This will create a `${projectpath}/data` folder containing the dataset and pre-assigned 8-client datalists.
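The pre-assigned datalists are JSON files that map each client to its subset of the data. As an illustration only (the exact keys and entries come from the generated `dataset_*.json` files, not from this sketch), such a datalist typically looks like:

```json
{
    "training": [
        {"image": "imagesTr/spleen_10.nii.gz", "label": "labelsTr/spleen_10.nii.gz"}
    ],
    "validation": [
        {"image": "imagesTr/spleen_13.nii.gz", "label": "labelsTr/spleen_13.nii.gz"}
    ]
}
```

Each client trains only on the entries listed in its own file, which is what simulates data being partitioned across institutions.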
-
-## Run federated learning
-Running federated learning with NVFlare and MONAI involves two steps:
-1. start the server, clients, and admin under NVFlare workspace
-2. start the actual training process with MONAI implementation
-### Start server and clients
-To start the server and clients, run the following script (example with 2 clients).
-```
-export n_clients=2
-${projectpath}/fl_utils/fl_run/start_fl.sh ${n_clients}
-```
-*Note:* Currently, `start_fl.sh` runs the clients on all available GPUs. For finer control, modify the `export CUDA_VISIBLE_DEVICES` command in `start_fl.sh` to set which GPU each client should run on. Note that multiple clients can run on a single GPU as long as its memory is sufficient.
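The round-robin GPU assignment performed by `start_fl.sh` can be sketched in Python (a minimal illustration only; `assign_gpus` is a hypothetical helper, not part of the tutorial scripts):

```python
def assign_gpus(n_clients, n_gpus):
    """Map client names to GPU indices round-robin, mirroring the
    gpu_idx=$((i % n_gpus)) arithmetic in start_fl.sh."""
    return {f"client{i}": i % n_gpus for i in range(1, n_clients + 1)}

# With 4 clients and 2 GPUs, odd-numbered clients land on GPU 1, even-numbered on GPU 0.
print(assign_gpus(4, 2))
```

This is why several clients may share one GPU whenever `n_clients` exceeds `n_gpus`.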
-
-### Start admin client
-In a new terminal, activate the environment again:
-```
-source ./virtualenv/set_env.sh
-```
-Then, start the admin client:
-```
-${projectpath}/fl_workspace/admin/startup/fl_admin.sh
-```
-*Note:* The user name is `admin@nvidia.com`.
-
-Use the admin client to control the FL process:
-
-(Optional) Check the server status
-```
-> check_status server
-```
-Expected output:
-```
-FL run number has not been set.
-FL server status: training not started
-Registered clients: 2
--------------------------------------------------------------------------------------------------
-| CLIENT NAME | TOKEN | LAST ACCEPTED ROUND | CONTRIBUTION COUNT |
--------------------------------------------------------------------------------------------------
-| client2 | 9d2c2d14-cefb-497d-bf13-042dd3e7965f | | 0 |
-| client1 | fd9a8872-2dba-4e25-829e-db0a524a66d6 | | 0 |
--------------------------------------------------------------------------------------------------
-
-```
-(Optional) Check the client status
-```
-> check_status client
-```
-Expected output:
-```
-instance:client2 : client name: client2 token: 9d2c2d14-cefb-497d-bf13-042dd3e7965f status: training not started
-instance:client1 : client name: client1 token: fd9a8872-2dba-4e25-829e-db0a524a66d6 status: training not started
-```
-*Note:* For more details about the admin client and its commands, see [here](https://docs.nvidia.com/clara/clara-train-sdk/federated-learning/fl_admin_commands.html).
-
-### Start FL training with spleen_example
-Upload and deploy the training configurations.
-Then, in the admin terminal:
-```
-> set_run_number 1
-> upload_folder ../../../spleen_example
-> deploy spleen_example server
-> deploy spleen_example client
-```
-*Note:* the `upload_folder` command expects the config directory to be given either as an absolute path or relative to the `fl_workspace/admin/transfer` folder, as shown in the command above.
-
-Inside the server/client terminal, deploy the training configurations that specify the data JSON for each client:
-```
-${projectpath}/fl_utils/fl_run/deploy_train_configs.sh ${n_clients}
-```
-Next, start the server and clients from the admin terminal to begin training:
-```
-> start server
-> start client
-```
-(Optional) Monitor the training progress.
-
-Server status:
-```
-> check_status server
-FL run number:1
-FL server status: training started
-run number:1 start round:0 max round:200 current round:1
-min_num_clients:2 max_num_clients:100
-Registered clients: 2
-Total number of clients submitted models for current round: 0
--------------------------------------------------------------------------------------------------
-| CLIENT NAME | TOKEN | LAST ACCEPTED ROUND | CONTRIBUTION COUNT |
--------------------------------------------------------------------------------------------------
-| client2 | 9d2c2d14-cefb-497d-bf13-042dd3e7965f | 0 | 1 |
-| client1 | fd9a8872-2dba-4e25-829e-db0a524a66d6 | 0 | 1 |
--------------------------------------------------------------------------------------------------
-```
-Client status:
-```
-> check_status client
-instance:client2 : client name: client2 token: 9d2c2d14-cefb-497d-bf13-042dd3e7965f status: training started
-instance:client1 : client name: client1 token: fd9a8872-2dba-4e25-829e-db0a524a66d6 status: training started
-
-```
-(Optional) Shut down the FL system (confirm each command with the admin user name):
-```
-> shutdown client
-admin@nvidia.com
-> shutdown server
-admin@nvidia.com
-```
-(Optional) Clean up previous runs:
-```
-${projectpath}/fl_utils/fl_run/clean_up.sh ${n_clients}
-```
-
-## Automate running FL
-Alternatively, the following commands automate the steps described above using NVFlare's AdminAPI. The script automatically starts the server and clients, uploads the configuration folder, and deploys it with each client's specific data list. It also sets the minimum number of clients required for each global model update based on the given argument.
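The automation decides when the experiment has finished by counting client states in the `check_status client` reply text, roughly as sketched below (a simplified illustration of the logic in `api_utils.py`; the sample reply text is made up for demonstration):

```python
import re

def count_state(reply_text, state):
    """Count how many clients report the given status in an admin reply."""
    return len(re.findall(f"status: {state}", reply_text))

reply_text = (
    "instance:client2 : client name: client2 status: training started\n"
    "instance:client1 : client name: client1 status: cross site validation"
)
# The experiment is considered finished once no client reports a running state.
running = sum(count_state(reply_text, s) for s in
              ("training starting", "training started", "cross site validation"))
print(running)
```

Polling this count (together with the server status) at a fixed interval is what lets the script block until training and cross-site validation complete.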
-
-*Note:* make sure no server or clients are running before starting. You can check for running NVFlare processes via `ps -as | grep nvflare`, and shut them down using the `shutdown` admin commands described above if needed.
-
-First, activate the environment again (if not already active):
-```
-source ./virtualenv/set_env.sh
-```
-Then, run the FL experiment automatically:
-```
-export n_clients=2
-${projectpath}/fl_utils/fl_run_auto/run_fl.sh ${n_clients}
-```
-*Note:* This script automatically shuts down the server and clients in case of an error or misconfiguration. You can check whether an NVFlare process is still running before starting a new experiment via `ps -as | grep nvflare`. It is best not to keep old processes running while starting a new experiment.
-
-Here, you can also use the admin client as shown above to monitor the automatically started FL experiment. Just open a new terminal and execute:
-```
-source ./virtualenv/set_env.sh
-${projectpath}/fl_workspace/admin/startup/fl_admin.sh
-```
-(username: `admin@nvidia.com`)
-
-## Visualize the training progress
-To visualize the training progress, run TensorBoard in the server/client terminal:
-```
-tensorboard --logdir="./" &
-```
-and point your browser to `http://localhost:6006/#scalars`. You should see that the global model's performance is the same at the beginning of each round, since all clients in this example share the same validation set.
-
-
-## Further reading
-For more details visit the [NVFlare documentation](https://pypi.org/project/nvflare).
-For more examples using NVFlare, see [here](https://github.com/NVIDIA/clara-train-examples/tree/master/PyTorch/NoteBooks/FL).
diff --git a/federated_learning/nvflare/nvflare_example/fl_utils/fl_run/clean_up.sh b/federated_learning/nvflare/nvflare_example/fl_utils/fl_run/clean_up.sh
deleted file mode 100755
index 335c12da4c..0000000000
--- a/federated_learning/nvflare/nvflare_example/fl_utils/fl_run/clean_up.sh
+++ /dev/null
@@ -1,19 +0,0 @@
-#!/usr/bin/env bash
-
-n_clients=$1
-
-if test -z "$n_clients"
-then
- echo "Please provide the number of clients, e.g. ./clean_up.sh 2"
- exit 1
-fi
-
-rm -rf ${projectpath}/fl_workspace/server/run_*
-rm -rf ${projectpath}/fl_workspace/server/transfer/*
-rm ${projectpath}/fl_workspace/server/*.*
-for id in $(eval echo "{1..$n_clients}")
-do
- rm -rf ${projectpath}/fl_workspace/client${id}/run_*
- rm -rf ${projectpath}/fl_workspace/client${id}/transfer/*
- rm ${projectpath}/fl_workspace/client${id}/*.*
-done
diff --git a/federated_learning/nvflare/nvflare_example/fl_utils/fl_run/deploy_train_configs.sh b/federated_learning/nvflare/nvflare_example/fl_utils/fl_run/deploy_train_configs.sh
deleted file mode 100755
index 1d27e1712f..0000000000
--- a/federated_learning/nvflare/nvflare_example/fl_utils/fl_run/deploy_train_configs.sh
+++ /dev/null
@@ -1,18 +0,0 @@
-#!/usr/bin/env bash
-
-n_clients=$1
-
-if test -z "$n_clients"
-then
- echo "Please provide the number of clients, e.g. ./deploy_train_configs.sh 2"
- exit 1
-fi
-
-run_number=1
-
-for i in $(eval echo "{1..$n_clients}")
-do
- echo "Deploying train config for client${i}"
- cp ${projectpath}/spleen_example/config/config_train_${i}.json \
- ${projectpath}/fl_workspace/client${i}/run_${run_number}/mmar_client${i}/config/config_train.json
-done
diff --git a/federated_learning/nvflare/nvflare_example/fl_utils/fl_run/start_fl.sh b/federated_learning/nvflare/nvflare_example/fl_utils/fl_run/start_fl.sh
deleted file mode 100755
index 5b8071281e..0000000000
--- a/federated_learning/nvflare/nvflare_example/fl_utils/fl_run/start_fl.sh
+++ /dev/null
@@ -1,26 +0,0 @@
-#!/usr/bin/env bash
-
-n_clients=$1
-
-if test -z "$n_clients"
-then
- echo "Please provide the number of clients, e.g. ./start_fl.sh 2"
- exit 1
-fi
-
-n_gpus=$(nvidia-smi --list-gpus | wc -l)
-echo "There are ${n_gpus} GPUs."
-
-# Start server
-echo "Starting server and ${n_clients} clients"
-${projectpath}/fl_workspace/server/startup/start.sh
-sleep 10s
-
-# Start clients
-for i in $(eval echo "{1..$n_clients}")
-do
- gpu_idx=$((${i} % ${n_gpus}))
- echo "Starting client${i} on GPU ${gpu_idx}"
- export CUDA_VISIBLE_DEVICES=${gpu_idx}
- ${projectpath}/fl_workspace/client${i}/startup/start.sh
-done
diff --git a/federated_learning/nvflare/nvflare_example/fl_utils/fl_run_auto/api_utils.py b/federated_learning/nvflare/nvflare_example/fl_utils/fl_run_auto/api_utils.py
deleted file mode 100644
index 33d6568c56..0000000000
--- a/federated_learning/nvflare/nvflare_example/fl_utils/fl_run_auto/api_utils.py
+++ /dev/null
@@ -1,55 +0,0 @@
-import time
-import re
-
-
-def api_command_wrapper(api, command):
- print("\nISSUING COMMAND: {}".format(command))
- reply = api.do_command(command)
- print(reply)
- assert reply['status'] == 'SUCCESS', "command was not successful!"
-
- # check for other errors
- for r in reply['data']:
- if r['type'] == 'string':
- # print(r['data'])
- assert 'error' not in r['data'].lower(), f"there was an error in reply executing: {command}"
- if r['type'] == 'error':
- raise ValueError(f"there was an error executing: {command}")
- return reply
-
-
-def wait_to_complete(api, interval=60):
- fl_is_training = True
- while fl_is_training:
- time.sleep(interval)
- reply = api_command_wrapper(api, "check_status client")
- nr_clients_starting = len([m.start() for m in re.finditer('status: training starting', reply['data'][0]['data'])])
- nr_clients_started = len([m.start() for m in re.finditer('status: training started', reply['data'][0]['data'])])
- nr_clients_crosssiteval = len([m.start() for m in re.finditer('status: cross site validation', reply['data'][0]['data'])])
- print(f'{nr_clients_starting} clients starting training.')
- print(f'{nr_clients_started} clients in training.')
- print(f'{nr_clients_crosssiteval} clients in cross-site validation.')
-
- reply = api_command_wrapper(api, "check_status server")
- if 'status: training stopped' in reply['data'][0]['data']:
- server_is_training = False
- print('Server stopped.')
- else:
- print('Server is training.')
- server_is_training = True
-
- if (not server_is_training) and \
- (nr_clients_started == 0) and \
- (nr_clients_starting == 0) and \
- (nr_clients_crosssiteval == 0):
- fl_is_training = False
- print('FL training & cross-site validation stopped/completed.')
-
- return True
-
-
-def fl_shutdown(api):
- print('Shutting down FL system...')
- api_command_wrapper(api, "shutdown client")
- time.sleep(10)
- api_command_wrapper(api, "shutdown server")
diff --git a/federated_learning/nvflare/nvflare_example/fl_utils/fl_run_auto/run_fl.py b/federated_learning/nvflare/nvflare_example/fl_utils/fl_run_auto/run_fl.py
deleted file mode 100755
index 2d53c57555..0000000000
--- a/federated_learning/nvflare/nvflare_example/fl_utils/fl_run_auto/run_fl.py
+++ /dev/null
@@ -1,134 +0,0 @@
-#!/usr/bin/env python3
-import os
-import argparse
-import time
-import re
-import sys
-import json
-import shutil
-import uuid
-# use source
-#src_root = "/workspace/Code/clara4.0"
-#python_paths = [f"{src_root}/common/dlmed/src",
-# f"{src_root}/train/automl/src",
-# f"{src_root}/flare"]
-#[sys.path.insert(0, item) for item in python_paths]
-
-from dlmed.hci.client.api import AdminAPI
-from api_utils import api_command_wrapper, wait_to_complete, fl_shutdown
-
-
-def create_tmp_config_dir(upload_dir, config):
- tmp_config = str(uuid.uuid4())
- print(f"Creating temporary config from {config} -> {tmp_config}")
- tmp_config_dir = os.path.join(upload_dir, tmp_config)  # create a temporary config for this run
- if os.path.isdir(tmp_config_dir):
- shutil.rmtree(tmp_config_dir)
- shutil.copytree(os.path.join(upload_dir, config), tmp_config_dir)
-
- return tmp_config, tmp_config_dir
-
-
-def main():
- parser = argparse.ArgumentParser()
- parser.add_argument('--nr_clients', type=int, default=2, help="Minimum number of clients.")
- parser.add_argument('--run_number', type=int, default=1, help="FL run number.")
- parser.add_argument('--config', type=str, default='spleen_example', help="directory name with training configs.")
- parser.add_argument('--admin_dir', type=str, default='./admin', help="Path to admin directory.")
-
- args = parser.parse_args()
-
- host = 'localhost'
- port = 8003
-
- # Set up certificate names and admin folders
- ca_cert = os.path.join(args.admin_dir, 'startup', 'rootCA.pem')
- client_cert = os.path.join(args.admin_dir, 'startup', 'client.crt')
- client_key = os.path.join(args.admin_dir, 'startup', 'client.key')
- upload_dir = os.path.join(args.admin_dir, 'transfer')
- download_dir = os.path.join(args.admin_dir, 'download')
- if not os.path.isdir(download_dir):
- os.makedirs(download_dir)
-
- assert os.path.isdir(args.admin_dir), f"admin directory does not exist at {args.admin_dir}"
- assert os.path.isfile(ca_cert), f"rootCA.pem does not exist at {ca_cert}"
- assert os.path.isfile(client_cert), f"client.crt does not exist at {client_cert}"
- assert os.path.isfile(client_key), f"client.key does not exist at {client_key}"
-
- # Connect with admin client
- api = AdminAPI(
- host=host,
- port=port,
- ca_cert=ca_cert,
- client_cert=client_cert,
- client_key=client_key,
- upload_dir=upload_dir,
- download_dir=download_dir,
- debug=False
- )
- reply = api.login(username="admin@nvidia.com")
- for k in reply.keys():
- assert "error" not in reply[k].lower(), f"Login not successful with {reply}"
-
- # Execute commands
- api_command_wrapper(api, "set_timeout 30")
-
- # create a temporary config for editing
- try:
- tmp_config, tmp_config_dir = create_tmp_config_dir(upload_dir, args.config)
- except BaseException as e:
- print(f"There was an exception {e}. Shutting down clients and server.")
- fl_shutdown(api)
- sys.exit(1)
-
- # update server config to set min_num_clients:
- server_config_file = os.path.join(tmp_config_dir, 'config', 'config_fed_server.json')
- with open(server_config_file, 'r') as f:
- server_config = json.load(f)
- server_config['servers'][0]['min_num_clients'] = args.nr_clients
- with open(server_config_file, 'w') as f:
- json.dump(server_config, f, indent=4)
-
- api_command_wrapper(api, "check_status server")
-
- api_command_wrapper(api, f"set_run_number {args.run_number}")
-
- api_command_wrapper(api, f"upload_folder {tmp_config}")
-
- api_command_wrapper(api, f"deploy {tmp_config} server")
-
- api_command_wrapper(api, "start server")
-
- time.sleep(10)
- # deploy clients
- for client_id in range(1, args.nr_clients+1):
- # update client's train config to set seed:
- ref_train_config_file = os.path.join(upload_dir, args.config, 'config', f'config_train_{client_id}.json')
- train_config_file = os.path.join(tmp_config_dir, 'config', f'config_train.json')
-
- print(f"Deploying train config for client{client_id}")
- shutil.copyfile(ref_train_config_file, train_config_file)
-
- # upload & deploy on client
- api_command_wrapper(api, f"upload_folder {tmp_config}")
- api_command_wrapper(api, f"deploy {tmp_config} client client{client_id}")
-
- api_command_wrapper(api, "start client")
-
- # delete temporary config
- if os.path.isdir(tmp_config_dir):
- shutil.rmtree(tmp_config_dir)
-
- # Keep checking the server and client statuses until FL training is complete.
- wait_to_complete(api, interval=30)
-
- # shutdown
- fl_shutdown(api)
-
- # log out
- print('Admin logging out.')
- api.logout()
-
-
-if __name__ == "__main__":
- main()
diff --git a/federated_learning/nvflare/nvflare_example/fl_utils/fl_run_auto/run_fl.sh b/federated_learning/nvflare/nvflare_example/fl_utils/fl_run_auto/run_fl.sh
deleted file mode 100755
index 3cc9f1af04..0000000000
--- a/federated_learning/nvflare/nvflare_example/fl_utils/fl_run_auto/run_fl.sh
+++ /dev/null
@@ -1,20 +0,0 @@
-#!/usr/bin/env bash
-
-n_clients=$1
-
-if test -z "$n_clients"
-then
- echo "Please provide the number of clients, e.g. ./run_fl.sh 2"
- exit 1
-fi
-
-## Start server and clients ##
-${projectpath}/fl_utils/fl_run/start_fl.sh ${n_clients}
-echo "Waiting for server and clients to start..."
-sleep 30s
-
-## Run FL training ##
-echo "Run FL training"
-python3 ${projectpath}/fl_utils/fl_run_auto/run_fl.py --nr_clients ${n_clients} --run_number 1 \
- --config "../../../spleen_example" \
- --admin_dir "${projectpath}/fl_workspace/admin" &
diff --git a/federated_learning/nvflare/nvflare_example/fl_utils/workspace_gen/authz_config.json b/federated_learning/nvflare/nvflare_example/fl_utils/workspace_gen/authz_config.json
deleted file mode 100644
index 12e8d8408d..0000000000
--- a/federated_learning/nvflare/nvflare_example/fl_utils/workspace_gen/authz_config.json
+++ /dev/null
@@ -1,105 +0,0 @@
-{
- "version": "1.0",
-
- "roles": {
- "super": "super user of system",
- "lead_researcher": "lead researcher of the study",
- "site_researcher": "site researcher of the study",
- "site_it": "site IT of the study",
- "lead_it": "lead IT of the study"
- },
- "groups": {
- "relaxed": {
- "desc": "the org group with relaxed policies",
- "rules": {
- "allow_byoc": true,
- "allow_custom_datalist": true
- }
- },
- "strict": {
- "desc": "the org group with strict policies",
- "rules": {
- "allow_byoc": false,
- "allow_custom_datalist": false
- }
- },
- "general": {
- "desc": "general group user rights",
- "role_rights": {
- "super": {
- "operate_all": true,
- "view_all": true,
- "train_all": true
- },
- "lead_researcher": {
- "train_all": true,
- "view_all": true
- },
- "site_researcher": {
- "train_self": true,
- "view_self": true
- },
- "lead_it": {
- "operate_all": true,
- "view_all": true
- },
- "site_it": {
- "operate_self": true,
- "view_self": true
- }
- }
- }
- },
- "users": {
- "admin@nvidia.com": {
- "org": "nvidia",
- "roles": ["super"]
- },
- "client1": {
- "org": "nvidia",
- "roles": ["lead_it", "site_researcher"]
- },
- "client2": {
- "org": "nvidia",
- "roles": ["lead_it", "site_researcher"]
- },
- "client3": {
- "org": "nvidia",
- "roles": ["lead_it", "site_researcher"]
- },
- "client4": {
- "org": "nvidia",
- "roles": ["lead_it", "site_researcher"]
- },
- "client5": {
- "org": "nvidia",
- "roles": ["lead_it", "site_researcher"]
- },
- "client6": {
- "org": "nvidia",
- "roles": ["lead_it", "site_researcher"]
- },
- "client7": {
- "org": "nvidia",
- "roles": ["lead_it", "site_researcher"]
- },
- "client8": {
- "org": "nvidia",
- "roles": ["lead_it", "site_researcher"]
- }
- },
- "orgs": {
- "nvidia": ["general", "relaxed"]
- },
- "sites": {
- "client1": "nvidia",
- "client2": "nvidia",
- "client3": "nvidia",
- "client4": "nvidia",
- "client5": "nvidia",
- "client6": "nvidia",
- "client7": "nvidia",
- "client8": "nvidia",
- "server": "nvidia"
- }
-}
diff --git a/federated_learning/nvflare/nvflare_example/fl_utils/workspace_gen/project.yml b/federated_learning/nvflare/nvflare_example/fl_utils/workspace_gen/project.yml
deleted file mode 100644
index d2896f6106..0000000000
--- a/federated_learning/nvflare/nvflare_example/fl_utils/workspace_gen/project.yml
+++ /dev/null
@@ -1,58 +0,0 @@
-# org is to describe each participant's organization and is optional
-
-# the name of this project
-name: fl
-
-config_folder: config
-
-server:
- auth: true
-
- org: nvidia
-
- # set cn to the server's fully qualified domain name
- # never set it to example.com
- cn: localhost
-
- # replace the number with a port that all clients can reach and that the server can open to listen on
- fed_learn_port: 8002
-
- # again, replace the number with a port that all clients can reach and that the server can open to listen on
- # the value must be different from fed_learn_port
- admin_port: 8003
-
- # admin_storage is the mmar upload folder name on the server
- admin_storage: transfer
-
- min_num_clients: 1
- max_num_clients: 100
-
- # don't use a config_validator
- config_validator:
-
-# The following values under fl_clients and admin_clients are for demo purpose only.
-# Please change them according to the information of actual project.
-fl_clients:
- # client_name must be unique
- # email is optional
- - org: nvidia
- client_name: client1
- - org: nvidia
- client_name: client2
- - org: nvidia
- client_name: client3
- - org: nvidia
- client_name: client4
- - org: nvidia
- client_name: client5
- - org: nvidia
- client_name: client6
- - org: nvidia
- client_name: client7
- - org: nvidia
- client_name: client8
-
-admin_clients:
- # email is the user name for admin authentication. Hence, it must be unique within the project
- - org: nvidia
- email: admin@nvidia.com
diff --git a/federated_learning/nvflare/nvflare_example/fl_workspace_pregenerated.zip b/federated_learning/nvflare/nvflare_example/fl_workspace_pregenerated.zip
deleted file mode 100644
index e3207599e2..0000000000
Binary files a/federated_learning/nvflare/nvflare_example/fl_workspace_pregenerated.zip and /dev/null differ
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_fed_client.json b/federated_learning/nvflare/nvflare_example/spleen_example/config/config_fed_client.json
deleted file mode 100644
index ed09a110a3..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_fed_client.json
+++ /dev/null
@@ -1,17 +0,0 @@
-{
- "format_version": 1,
-
- "client": {
- "outbound_filters": [
- ],
- "inbound_filters": [
- ]
- },
- "client_trainer": {
- "path": "monai_trainer.MONAITrainer",
- "args": {
- "aggregation_epochs": 5,
- "aggregation_iters": 0
- }
- }
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_fed_server.json b/federated_learning/nvflare/nvflare_example/spleen_example/config/config_fed_server.json
deleted file mode 100644
index 28387ae818..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_fed_server.json
+++ /dev/null
@@ -1,61 +0,0 @@
-{
- "format_version": 1,
-
- "servers": [
- {
- "min_num_clients": 2,
- "max_num_clients": 100,
- "wait_after_min_clients": 10,
- "heart_beat_timeout": 600,
- "start_round": 0,
- "num_rounds": 200
- }
- ],
- "aggregator":
- {
- "name": "AccumulateWeightedAggregator",
- "args": {
- "exclude_vars": "dummy",
- "aggregation_weights":
- {
- "client1": 1,
- "client2": 1,
- "client3": 1,
- "client4": 1
- }
- }
- },
- "outbound_filters": [
- ],
- "inbound_filters": [
- ],
- "model_persistor":
- {
- "name": "PTFileModelPersistor",
- "args": {
- "exclude_vars": "dummy",
- "model": {
- "path": "monai.networks.nets.unet.UNet",
- "args": {
- "dimensions": 3,
- "in_channels": 1,
- "out_channels": 2,
- "channels": [16, 32, 64, 128, 256],
- "strides": [2, 2, 2, 2],
- "num_res_units": 2,
- "norm": "batch"
- }
- }
- }
- },
- "shareable_generator": {
- "name": "FullModelShareableGenerator"
- },
- "handlers":
- [
- {
- "name": "IntimeModelSelectionHandler",
- "args": {}
- }
- ]
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_1.json b/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_1.json
deleted file mode 100644
index 7b915f6614..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_1.json
+++ /dev/null
@@ -1,11 +0,0 @@
-{
- "max_epochs": 100,
- "learning_rate": 2e-4,
- "amp": true,
- "use_gpu": true,
- "multi_gpu": false,
- "val_interval": 5,
- "data_list_base_dir": "../../data/Task09_Spleen/",
- "data_list_json_file": "dataset_1.json",
- "ckpt_dir": "models"
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_2.json b/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_2.json
deleted file mode 100644
index 57109fd528..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_2.json
+++ /dev/null
@@ -1,11 +0,0 @@
-{
- "max_epochs": 100,
- "learning_rate": 2e-4,
- "amp": true,
- "use_gpu": true,
- "multi_gpu": false,
- "val_interval": 5,
- "data_list_base_dir": "../../data/Task09_Spleen/",
- "data_list_json_file": "dataset_2.json",
- "ckpt_dir": "models"
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_3.json b/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_3.json
deleted file mode 100644
index a92223d1db..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_3.json
+++ /dev/null
@@ -1,11 +0,0 @@
-{
- "max_epochs": 100,
- "learning_rate": 2e-4,
- "amp": true,
- "use_gpu": true,
- "multi_gpu": false,
- "val_interval": 5,
- "data_list_base_dir": "../../data/Task09_Spleen/",
- "data_list_json_file": "dataset_3.json",
- "ckpt_dir": "models"
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_4.json b/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_4.json
deleted file mode 100644
index b72f5704b7..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_4.json
+++ /dev/null
@@ -1,11 +0,0 @@
-{
- "max_epochs": 100,
- "learning_rate": 2e-4,
- "amp": true,
- "use_gpu": true,
- "multi_gpu": false,
- "val_interval": 5,
- "data_list_base_dir": "../../data/Task09_Spleen/",
- "data_list_json_file": "dataset_4.json",
- "ckpt_dir": "models"
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_5.json b/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_5.json
deleted file mode 100644
index 3721acb219..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_5.json
+++ /dev/null
@@ -1,11 +0,0 @@
-{
- "max_epochs": 100,
- "learning_rate": 2e-4,
- "amp": true,
- "use_gpu": true,
- "multi_gpu": false,
- "val_interval": 5,
- "data_list_base_dir": "../../data/Task09_Spleen/",
- "data_list_json_file": "dataset_5.json",
- "ckpt_dir": "models"
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_6.json b/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_6.json
deleted file mode 100644
index 9d8b00a20d..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_6.json
+++ /dev/null
@@ -1,11 +0,0 @@
-{
- "max_epochs": 100,
- "learning_rate": 2e-4,
- "amp": true,
- "use_gpu": true,
- "multi_gpu": false,
- "val_interval": 5,
- "data_list_base_dir": "../../data/Task09_Spleen/",
- "data_list_json_file": "dataset_6.json",
- "ckpt_dir": "models"
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_7.json b/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_7.json
deleted file mode 100644
index 5fa0ec0ebb..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_7.json
+++ /dev/null
@@ -1,11 +0,0 @@
-{
- "max_epochs": 100,
- "learning_rate": 2e-4,
- "amp": true,
- "use_gpu": true,
- "multi_gpu": false,
- "val_interval": 5,
- "data_list_base_dir": "../../data/Task09_Spleen/",
- "data_list_json_file": "dataset_7.json",
- "ckpt_dir": "models"
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_8.json b/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_8.json
deleted file mode 100644
index 95428d80b1..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/config/config_train_8.json
+++ /dev/null
@@ -1,11 +0,0 @@
-{
- "max_epochs": 100,
- "learning_rate": 2e-4,
- "amp": true,
- "use_gpu": true,
- "multi_gpu": false,
- "val_interval": 5,
- "data_list_base_dir": "../../data/Task09_Spleen/",
- "data_list_json_file": "dataset_8.json",
- "ckpt_dir": "models"
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/custom/monai_trainer.py b/federated_learning/nvflare/nvflare_example/spleen_example/custom/monai_trainer.py
deleted file mode 100644
index 3149c9ecf0..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/custom/monai_trainer.py
+++ /dev/null
@@ -1,183 +0,0 @@
-# Copyright 2020 MONAI Consortium
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-# http://www.apache.org/licenses/LICENSE-2.0
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import logging
-
-import torch.distributed as dist
-from nvflare.apis.event_type import EventType
-from nvflare.apis.fl_constant import FLConstants, ShareableKey
-from nvflare.apis.fl_context import FLContext
-from nvflare.apis.shareable import Shareable
-from nvflare.apis.trainer import Trainer
-from nvflare.common.signal import Signal
-from nvflare.utils.fed_utils import generate_failure
-
-from train_configer import TrainConfiger
-from utils import (
- IterAggregateHandler,
- MONAIModelManager,
- TrainContext,
- get_lr_values,
- set_engine_state,
-)
-
-
-class MONAITrainer(Trainer):
- """
-    This class implements a MONAI-based trainer that can be used for Federated Learning.
-
- Args:
- aggregation_epochs: the number of training epochs for a round.
- This parameter only works when `aggregation_iters` is 0. Defaults to 1.
- aggregation_iters: the number of training iterations for a round.
- If the value is larger than 0, the trainer will use iteration based aggregation
- rather than epoch based aggregation. Defaults to 0.
-
- """
-
- def __init__(self, aggregation_epochs: int = 1, aggregation_iters: int = 0):
- super().__init__()
- self.aggregation_epochs = aggregation_epochs
- self.aggregation_iters = aggregation_iters
- self.model_manager = MONAIModelManager()
- self.logger = logging.getLogger(self.__class__.__name__)
-
- def _initialize_trainer(self, fl_ctx: FLContext):
- """
-        The trainer's initialization function. At the beginning of an FL experiment,
- the train and evaluate engines, as well as train context and FL context
- should be initialized.
- """
- # Initialize train and evaluation engines.
- config_root = fl_ctx.get_prop(FLConstants.TRAIN_ROOT)
- fl_args = fl_ctx.get_prop(FLConstants.ARGS)
-
- conf = TrainConfiger(
- config_root=config_root,
- wf_config_file_name=fl_args.train_config,
- local_rank=fl_args.local_rank,
- )
- conf.configure()
-
- self.train_engine = conf.train_engine
- self.eval_engine = conf.eval_engine
- self.multi_gpu = conf.multi_gpu
-
-        # for iteration-based aggregation, the train engine should attach
- # the following handler.
- if self.aggregation_iters > 0:
- IterAggregateHandler(interval=self.aggregation_iters).attach(
- self.train_engine
- )
-
- # Instantiate a train context class. This instance is used to
-        # save training-related information such as the current epoch, iteration
-        # and the learning rate.
- self.train_ctx = TrainContext()
- self.train_ctx.initial_learning_rate = get_lr_values(
- self.train_engine.optimizer
- )
-
- # Initialize the FL context.
- fl_ctx.set_prop(FLConstants.MY_RANK, self.train_engine.state.rank)
- fl_ctx.set_prop(FLConstants.MODEL_NETWORK, self.train_engine.network)
- fl_ctx.set_prop(FLConstants.MULTI_GPU, self.multi_gpu)
- fl_ctx.set_prop(FLConstants.DEVICE, self.train_engine.state.device)
-
- def handle_event(self, event_type: str, fl_ctx: FLContext):
- """
- This function is an extended function from the super class.
-        It is used to handle events based on the
-        event_type. At the start of an FL experiment, necessary
- components should be initialized. At the end of the experiment,
- the running engines should be terminated.
-
- Args:
- event_type: the type of event that will be fired. In MONAITrainer,
- only `START_RUN` and `END_RUN` need to be handled.
- fl_ctx: an `FLContext` object.
-
- """
- if event_type == EventType.START_RUN:
- self._initialize_trainer(fl_ctx)
- elif event_type == EventType.END_RUN:
- try:
- self.train_engine.terminate()
- self.eval_engine.terminate()
- except BaseException as e:
- self.logger.info(f"exception in closing engines {e}")
-
- def train(
- self, shareable: Shareable, fl_ctx: FLContext, abort_signal: Signal
- ) -> Shareable:
- """
- This function is an extended function from the super class.
-        As a supervised learning based trainer, the train function will run
-        the evaluation and train engines based on model weights from `shareable`.
-        After finishing training, a new `Shareable` object will be submitted
-        to the server for aggregation.
-
- Args:
-            shareable: the `Shareable` object received from the server.
-            fl_ctx: the `FLContext` object received from the server.
- abort_signal: if triggered, the training will be aborted.
-
- Returns:
- a new `Shareable` object to be submitted to server for aggregation.
- """
- # check abort signal
- self.logger.info(f"MonaiTrainer abort signal: {abort_signal.triggered}")
- if abort_signal.triggered:
- self.finalize(fl_ctx)
- shareable = generate_failure(fl_ctx=fl_ctx, reason="abort signal triggered")
- return shareable
-        # retrieve model weights
- if self.train_engine.state.rank == 0:
- model_weights = shareable[ShareableKey.MODEL_WEIGHTS]
-            # load the received model weights into the network (saved in fl_ctx)
- self.model_manager.assign_current_model(model_weights, fl_ctx)
-        # for multi-gpu training, only the rank 0 process receives the model weights.
-        # Thus, they should be broadcast to all processes.
- if self.multi_gpu:
- net = fl_ctx.get_prop(FLConstants.MODEL_NETWORK)
- for _, v in net.state_dict().items():
- dist.broadcast(v, src=0)
-
- # set engine state parameters, like number of training epochs/iterations.
- self.train_engine = set_engine_state(
- self.train_engine, self.aggregation_epochs, self.aggregation_iters
- )
- # get current epoch and iteration when a round starts
- self.train_ctx.epoch_of_start_time = self.train_engine.state.epoch
- self.train_ctx.iter_of_start_time = self.train_engine.state.iteration
- # execute validation at the beginning of every round
- self.eval_engine.run(self.train_engine.state.epoch + 1)
- self.train_ctx.fl_init_validation_metric = self.eval_engine.state.metrics.get(
- self.eval_engine.state.key_metric_name, -1
- )
- self.train_engine.run()
- # calculate current iteration and epoch data after training
- self.train_ctx.current_iters = (
- self.train_engine.state.iteration - self.train_ctx.iter_of_start_time
- )
- self.train_ctx.current_executed_epochs = (
- self.train_engine.state.epoch - self.train_ctx.epoch_of_start_time
- )
- # create a new `Shareable` object
- if self.train_engine.state.rank == 0:
- self.train_ctx.set_context(self.train_engine, self.eval_engine)
- shareable = self.model_manager.generate_shareable(
- self.train_ctx,
- fl_ctx,
- )
- # update train context into FL context.
- fl_ctx.set_prop(FLConstants.TRAIN_CONTEXT, self.train_ctx)
- return shareable
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/custom/utils.py b/federated_learning/nvflare/nvflare_example/spleen_example/custom/utils.py
deleted file mode 100644
index d0c1252a19..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/custom/utils.py
+++ /dev/null
@@ -1,187 +0,0 @@
-# Copyright 2020 MONAI Consortium
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-# http://www.apache.org/licenses/LICENSE-2.0
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import logging
-import math
-from typing import Dict
-
-import numpy as np
-import torch
-from ignite.engine import Engine, Events
-from nvflare.apis.fl_constant import FLConstants, ShareableKey, ShareableValue
-from nvflare.apis.fl_context import FLContext
-from nvflare.apis.shareable import Shareable
-from torch.optim import Optimizer
-
-
-class TrainContext:
- """
-    The train context class contains training-related parameters/variables,
-    such as the learning rate, number of GPUs, and the current training iteration.
- """
-
- def __init__(self):
- self.initial_learning_rate = 0
- self.current_learning_rate = 0
- self.current_iters = 0
- self.current_executed_epochs = 0
- self.fl_init_validation_metric = 0
- self.epoch_of_start_time = 0
- self.iter_of_start_time = 0
-
- def set_context(self, train_engine: Engine, eval_engine: Engine):
- """
-        This function is usually called after the train engine has finished running.
-        The variables updated here will be added to the shareable object and then
-        submitted to the server. You can add other variables in this function if
-        they need to be shared.
- """
- self.current_learning_rate = get_lr_values(train_engine.optimizer)
-
-
-class MONAIModelManager:
- def __init__(self):
- self.logger = logging.getLogger("ModelShareableManager")
-
- def assign_current_model(
- self, model_weights: Dict[str, np.ndarray], fl_ctx: FLContext
- ):
- """
- This function is used to load provided weights for the network saved
- in FL context.
- Before loading weights, tensors might need to be reshaped to support HE for secure aggregation.
- More info of HE:
- https://github.com/NVIDIA/clara-train-examples/blob/master/PyTorch/NoteBooks/FL/Homomorphic_Encryption.ipynb
-
- """
- net = fl_ctx.get_prop(FLConstants.MODEL_NETWORK)
- if fl_ctx.get_prop(FLConstants.MULTI_GPU):
- net = net.module
-
- local_var_dict = net.state_dict()
- model_keys = model_weights.keys()
- for var_name in local_var_dict:
- if var_name in model_keys:
- weights = model_weights[var_name]
- try:
- local_var_dict[var_name] = torch.as_tensor(np.reshape(weights, local_var_dict[var_name].shape))
- except Exception as e:
- raise ValueError(
- "Convert weight from {} failed with error: {}".format(
- var_name, str(e)
- )
- )
-
- net.load_state_dict(local_var_dict)
-
- def extract_model(self, fl_ctx: FLContext) -> Dict[str, np.ndarray]:
- """
- This function is used to extract weights of the network saved in FL
- context.
- The extracted weights will be converted into a numpy array based dict.
- """
- net = fl_ctx.get_prop(FLConstants.MODEL_NETWORK)
- if fl_ctx.get_prop(FLConstants.MULTI_GPU):
- net = net.module
- local_state_dict = net.state_dict()
- local_model_dict = {}
- for var_name in local_state_dict:
- try:
- local_model_dict[var_name] = local_state_dict[var_name].cpu().numpy()
- except Exception as e:
- raise ValueError(
- "Convert weight from {} failed with error: {}".format(
- var_name, str(e)
- )
- )
-
- return local_model_dict
-
- def generate_shareable(self, train_ctx: TrainContext, fl_ctx: FLContext):
- """
- This function is used to generate a shareable instance according to
- the train context and FL context.
- A Shareable instance can not only contain model weights, but also
-        some additional information that clients want to share. This information
-        should be added under ShareableKey.META.
- """
-
-        # put the initial metric into the metadata. You can also add other parameters.
- meta_data = {}
- meta_data[FLConstants.INITIAL_METRICS] = train_ctx.fl_init_validation_metric
- meta_data[FLConstants.CURRENT_LEARNING_RATE] = train_ctx.current_learning_rate
- meta_data[FLConstants.NUM_STEPS_CURRENT_ROUND] = train_ctx.current_iters
-
- shareable = Shareable()
- shareable[ShareableKey.TYPE] = ShareableValue.TYPE_WEIGHT_DIFF
- shareable[ShareableKey.DATA_TYPE] = ShareableValue.DATA_TYPE_UNENCRYPTED
- shareable[ShareableKey.MODEL_WEIGHTS] = self.extract_model(fl_ctx)
- shareable[ShareableKey.META] = meta_data
-
- return shareable
-
-
-class IterAggregateHandler:
- """
- This class implements an event handler for iteration based aggregation.
- """
-
- def __init__(self, interval: int):
- self.interval = interval
-
- def attach(self, engine: Engine):
- engine.add_event_handler(Events.ITERATION_COMPLETED(every=self.interval), self)
-
- def __call__(self, engine: Engine):
- engine.terminate()
- # save current iteration for next round
- engine.state.dataloader_iter = engine._dataloader_iter
- if engine.state.iteration % engine.state.epoch_length == 0:
-            # if the current iteration ends an epoch, manually fire the epoch-completed event
- engine._fire_event(Events.EPOCH_COMPLETED)
-
-
-def get_lr_values(optimizer: Optimizer):
- """
- This function is used to get the learning rates of the optimizer.
- """
- return [group["lr"] for group in optimizer.state_dict()["param_groups"]]
-
-
-def set_engine_state(engine: Engine, aggregation_epochs: int, aggregation_iters: int):
- """
- This function is used to set the engine's state parameters according to
-    the aggregation mode (iteration-based or epoch-based).
-
- Args:
-        engine: the engine to be processed.
- aggregation_epochs: the number of epochs before aggregation.
- This parameter only works when `aggregation_iters` is 0.
- aggregation_iters: the number of iterations before aggregation.
- If the value is larger than 0, the engine will use iteration based aggregation
- rather than epoch based aggregation.
-
- """
- if aggregation_iters > 0:
- next_aggr_iter = engine.state.iteration + aggregation_iters
- engine.state.max_epochs = math.ceil(next_aggr_iter / engine.state.epoch_length)
- previous_iter = engine.state.iteration % engine.state.epoch_length
- if engine.state.iteration > 0 and previous_iter != 0:
- # init to continue from previous epoch
- engine.state.epoch -= 1
- if hasattr(engine.state, "dataloader_iter"):
- # initialize to continue from previous iteration
- engine._init_iter.append(previous_iter)
- engine._dataloader_iter = engine.state.dataloader_iter
- else:
- engine.state.max_epochs = engine.state.epoch + aggregation_epochs
-
- return engine
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_1.json b/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_1.json
deleted file mode 100644
index 3aecb2ff62..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_1.json
+++ /dev/null
@@ -1,46 +0,0 @@
-{
- "training": [{
- "image": "./imagesTr/spleen_40.nii.gz",
- "label": "./labelsTr/spleen_40.nii.gz"
- }, {
- "image": "./imagesTr/spleen_44.nii.gz",
- "label": "./labelsTr/spleen_44.nii.gz"
- }, {
- "image": "./imagesTr/spleen_38.nii.gz",
- "label": "./labelsTr/spleen_38.nii.gz"
- }, {
- "image": "./imagesTr/spleen_25.nii.gz",
- "label": "./labelsTr/spleen_25.nii.gz"
- }, {
- "image": "./imagesTr/spleen_13.nii.gz",
- "label": "./labelsTr/spleen_13.nii.gz"
- }, {
- "image": "./imagesTr/spleen_6.nii.gz",
- "label": "./labelsTr/spleen_6.nii.gz"
- }, {
- "image": "./imagesTr/spleen_19.nii.gz",
- "label": "./labelsTr/spleen_19.nii.gz"
- }, {
- "image": "./imagesTr/spleen_24.nii.gz",
- "label": "./labelsTr/spleen_24.nii.gz"
- },{
- "image": "./imagesTr/spleen_31.nii.gz",
- "label": "./labelsTr/spleen_31.nii.gz"
- }],
- "validation": [{
- "image": "./imagesTr/spleen_22.nii.gz",
- "label": "./labelsTr/spleen_22.nii.gz"
- }, {
- "image": "./imagesTr/spleen_2.nii.gz",
- "label": "./labelsTr/spleen_2.nii.gz"
- }, {
- "image": "./imagesTr/spleen_3.nii.gz",
- "label": "./labelsTr/spleen_3.nii.gz"
- }, {
- "image": "./imagesTr/spleen_45.nii.gz",
- "label": "./labelsTr/spleen_45.nii.gz"
- }, {
- "image": "./imagesTr/spleen_32.nii.gz",
- "label": "./labelsTr/spleen_32.nii.gz"
- }]
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_2.json b/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_2.json
deleted file mode 100644
index 626f6a51dd..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_2.json
+++ /dev/null
@@ -1,46 +0,0 @@
-{
- "training": [{
- "image": "./imagesTr/spleen_12.nii.gz",
- "label": "./labelsTr/spleen_12.nii.gz"
- }, {
- "image": "./imagesTr/spleen_47.nii.gz",
- "label": "./labelsTr/spleen_47.nii.gz"
- }, {
- "image": "./imagesTr/spleen_28.nii.gz",
- "label": "./labelsTr/spleen_28.nii.gz"
- }, {
- "image": "./imagesTr/spleen_61.nii.gz",
- "label": "./labelsTr/spleen_61.nii.gz"
- }, {
- "image": "./imagesTr/spleen_29.nii.gz",
- "label": "./labelsTr/spleen_29.nii.gz"
- }, {
- "image": "./imagesTr/spleen_14.nii.gz",
- "label": "./labelsTr/spleen_14.nii.gz"
- }, {
- "image": "./imagesTr/spleen_63.nii.gz",
- "label": "./labelsTr/spleen_63.nii.gz"
- }, {
- "image": "./imagesTr/spleen_59.nii.gz",
- "label": "./labelsTr/spleen_59.nii.gz"
- }, {
- "image": "./imagesTr/spleen_33.nii.gz",
- "label": "./labelsTr/spleen_33.nii.gz"
- }],
- "validation": [{
- "image": "./imagesTr/spleen_22.nii.gz",
- "label": "./labelsTr/spleen_22.nii.gz"
- }, {
- "image": "./imagesTr/spleen_2.nii.gz",
- "label": "./labelsTr/spleen_2.nii.gz"
- }, {
- "image": "./imagesTr/spleen_3.nii.gz",
- "label": "./labelsTr/spleen_3.nii.gz"
- }, {
- "image": "./imagesTr/spleen_45.nii.gz",
- "label": "./labelsTr/spleen_45.nii.gz"
- }, {
- "image": "./imagesTr/spleen_32.nii.gz",
- "label": "./labelsTr/spleen_32.nii.gz"
- }]
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_3.json b/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_3.json
deleted file mode 100644
index c786fb4bc9..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_3.json
+++ /dev/null
@@ -1,46 +0,0 @@
-{
- "training": [{
- "image": "./imagesTr/spleen_52.nii.gz",
- "label": "./labelsTr/spleen_52.nii.gz"
- }, {
- "image": "./imagesTr/spleen_9.nii.gz",
- "label": "./labelsTr/spleen_9.nii.gz"
- }, {
- "image": "./imagesTr/spleen_10.nii.gz",
- "label": "./labelsTr/spleen_10.nii.gz"
- }, {
- "image": "./imagesTr/spleen_41.nii.gz",
- "label": "./labelsTr/spleen_41.nii.gz"
- }, {
- "image": "./imagesTr/spleen_60.nii.gz",
- "label": "./labelsTr/spleen_60.nii.gz"
- }, {
- "image": "./imagesTr/spleen_56.nii.gz",
- "label": "./labelsTr/spleen_56.nii.gz"
- }, {
- "image": "./imagesTr/spleen_26.nii.gz",
- "label": "./labelsTr/spleen_26.nii.gz"
- }, {
- "image": "./imagesTr/spleen_17.nii.gz",
- "label": "./labelsTr/spleen_17.nii.gz"
- }, {
- "image": "./imagesTr/spleen_8.nii.gz",
- "label": "./labelsTr/spleen_8.nii.gz"
- }],
- "validation": [{
- "image": "./imagesTr/spleen_22.nii.gz",
- "label": "./labelsTr/spleen_22.nii.gz"
- }, {
- "image": "./imagesTr/spleen_2.nii.gz",
- "label": "./labelsTr/spleen_2.nii.gz"
- }, {
- "image": "./imagesTr/spleen_3.nii.gz",
- "label": "./labelsTr/spleen_3.nii.gz"
- }, {
- "image": "./imagesTr/spleen_45.nii.gz",
- "label": "./labelsTr/spleen_45.nii.gz"
- }, {
- "image": "./imagesTr/spleen_32.nii.gz",
- "label": "./labelsTr/spleen_32.nii.gz"
- }]
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_4.json b/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_4.json
deleted file mode 100644
index 4dc6e4c0c8..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_4.json
+++ /dev/null
@@ -1,46 +0,0 @@
-{
- "training": [{
- "image": "./imagesTr/spleen_16.nii.gz",
- "label": "./labelsTr/spleen_16.nii.gz"
- }, {
- "image": "./imagesTr/spleen_20.nii.gz",
- "label": "./labelsTr/spleen_20.nii.gz"
- }, {
- "image": "./imagesTr/spleen_18.nii.gz",
- "label": "./labelsTr/spleen_18.nii.gz"
- }, {
- "image": "./imagesTr/spleen_46.nii.gz",
- "label": "./labelsTr/spleen_46.nii.gz"
- }, {
- "image": "./imagesTr/spleen_27.nii.gz",
- "label": "./labelsTr/spleen_27.nii.gz"
- }, {
- "image": "./imagesTr/spleen_49.nii.gz",
- "label": "./labelsTr/spleen_49.nii.gz"
- }, {
- "image": "./imagesTr/spleen_62.nii.gz",
- "label": "./labelsTr/spleen_62.nii.gz"
- }, {
- "image": "./imagesTr/spleen_53.nii.gz",
- "label": "./labelsTr/spleen_53.nii.gz"
- }, {
- "image": "./imagesTr/spleen_21.nii.gz",
- "label": "./labelsTr/spleen_21.nii.gz"
- }],
- "validation": [{
- "image": "./imagesTr/spleen_22.nii.gz",
- "label": "./labelsTr/spleen_22.nii.gz"
- }, {
- "image": "./imagesTr/spleen_2.nii.gz",
- "label": "./labelsTr/spleen_2.nii.gz"
- }, {
- "image": "./imagesTr/spleen_3.nii.gz",
- "label": "./labelsTr/spleen_3.nii.gz"
- }, {
- "image": "./imagesTr/spleen_45.nii.gz",
- "label": "./labelsTr/spleen_45.nii.gz"
- }, {
- "image": "./imagesTr/spleen_32.nii.gz",
- "label": "./labelsTr/spleen_32.nii.gz"
- }]
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_5.json b/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_5.json
deleted file mode 100644
index 3aecb2ff62..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_5.json
+++ /dev/null
@@ -1,46 +0,0 @@
-{
- "training": [{
- "image": "./imagesTr/spleen_40.nii.gz",
- "label": "./labelsTr/spleen_40.nii.gz"
- }, {
- "image": "./imagesTr/spleen_44.nii.gz",
- "label": "./labelsTr/spleen_44.nii.gz"
- }, {
- "image": "./imagesTr/spleen_38.nii.gz",
- "label": "./labelsTr/spleen_38.nii.gz"
- }, {
- "image": "./imagesTr/spleen_25.nii.gz",
- "label": "./labelsTr/spleen_25.nii.gz"
- }, {
- "image": "./imagesTr/spleen_13.nii.gz",
- "label": "./labelsTr/spleen_13.nii.gz"
- }, {
- "image": "./imagesTr/spleen_6.nii.gz",
- "label": "./labelsTr/spleen_6.nii.gz"
- }, {
- "image": "./imagesTr/spleen_19.nii.gz",
- "label": "./labelsTr/spleen_19.nii.gz"
- }, {
- "image": "./imagesTr/spleen_24.nii.gz",
- "label": "./labelsTr/spleen_24.nii.gz"
- },{
- "image": "./imagesTr/spleen_31.nii.gz",
- "label": "./labelsTr/spleen_31.nii.gz"
- }],
- "validation": [{
- "image": "./imagesTr/spleen_22.nii.gz",
- "label": "./labelsTr/spleen_22.nii.gz"
- }, {
- "image": "./imagesTr/spleen_2.nii.gz",
- "label": "./labelsTr/spleen_2.nii.gz"
- }, {
- "image": "./imagesTr/spleen_3.nii.gz",
- "label": "./labelsTr/spleen_3.nii.gz"
- }, {
- "image": "./imagesTr/spleen_45.nii.gz",
- "label": "./labelsTr/spleen_45.nii.gz"
- }, {
- "image": "./imagesTr/spleen_32.nii.gz",
- "label": "./labelsTr/spleen_32.nii.gz"
- }]
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_6.json b/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_6.json
deleted file mode 100644
index 626f6a51dd..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_6.json
+++ /dev/null
@@ -1,46 +0,0 @@
-{
- "training": [{
- "image": "./imagesTr/spleen_12.nii.gz",
- "label": "./labelsTr/spleen_12.nii.gz"
- }, {
- "image": "./imagesTr/spleen_47.nii.gz",
- "label": "./labelsTr/spleen_47.nii.gz"
- }, {
- "image": "./imagesTr/spleen_28.nii.gz",
- "label": "./labelsTr/spleen_28.nii.gz"
- }, {
- "image": "./imagesTr/spleen_61.nii.gz",
- "label": "./labelsTr/spleen_61.nii.gz"
- }, {
- "image": "./imagesTr/spleen_29.nii.gz",
- "label": "./labelsTr/spleen_29.nii.gz"
- }, {
- "image": "./imagesTr/spleen_14.nii.gz",
- "label": "./labelsTr/spleen_14.nii.gz"
- }, {
- "image": "./imagesTr/spleen_63.nii.gz",
- "label": "./labelsTr/spleen_63.nii.gz"
- }, {
- "image": "./imagesTr/spleen_59.nii.gz",
- "label": "./labelsTr/spleen_59.nii.gz"
- }, {
- "image": "./imagesTr/spleen_33.nii.gz",
- "label": "./labelsTr/spleen_33.nii.gz"
- }],
- "validation": [{
- "image": "./imagesTr/spleen_22.nii.gz",
- "label": "./labelsTr/spleen_22.nii.gz"
- }, {
- "image": "./imagesTr/spleen_2.nii.gz",
- "label": "./labelsTr/spleen_2.nii.gz"
- }, {
- "image": "./imagesTr/spleen_3.nii.gz",
- "label": "./labelsTr/spleen_3.nii.gz"
- }, {
- "image": "./imagesTr/spleen_45.nii.gz",
- "label": "./labelsTr/spleen_45.nii.gz"
- }, {
- "image": "./imagesTr/spleen_32.nii.gz",
- "label": "./labelsTr/spleen_32.nii.gz"
- }]
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_7.json b/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_7.json
deleted file mode 100644
index c786fb4bc9..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_7.json
+++ /dev/null
@@ -1,46 +0,0 @@
-{
- "training": [{
- "image": "./imagesTr/spleen_52.nii.gz",
- "label": "./labelsTr/spleen_52.nii.gz"
- }, {
- "image": "./imagesTr/spleen_9.nii.gz",
- "label": "./labelsTr/spleen_9.nii.gz"
- }, {
- "image": "./imagesTr/spleen_10.nii.gz",
- "label": "./labelsTr/spleen_10.nii.gz"
- }, {
- "image": "./imagesTr/spleen_41.nii.gz",
- "label": "./labelsTr/spleen_41.nii.gz"
- }, {
- "image": "./imagesTr/spleen_60.nii.gz",
- "label": "./labelsTr/spleen_60.nii.gz"
- }, {
- "image": "./imagesTr/spleen_56.nii.gz",
- "label": "./labelsTr/spleen_56.nii.gz"
- }, {
- "image": "./imagesTr/spleen_26.nii.gz",
- "label": "./labelsTr/spleen_26.nii.gz"
- }, {
- "image": "./imagesTr/spleen_17.nii.gz",
- "label": "./labelsTr/spleen_17.nii.gz"
- }, {
- "image": "./imagesTr/spleen_8.nii.gz",
- "label": "./labelsTr/spleen_8.nii.gz"
- }],
- "validation": [{
- "image": "./imagesTr/spleen_22.nii.gz",
- "label": "./labelsTr/spleen_22.nii.gz"
- }, {
- "image": "./imagesTr/spleen_2.nii.gz",
- "label": "./labelsTr/spleen_2.nii.gz"
- }, {
- "image": "./imagesTr/spleen_3.nii.gz",
- "label": "./labelsTr/spleen_3.nii.gz"
- }, {
- "image": "./imagesTr/spleen_45.nii.gz",
- "label": "./labelsTr/spleen_45.nii.gz"
- }, {
- "image": "./imagesTr/spleen_32.nii.gz",
- "label": "./labelsTr/spleen_32.nii.gz"
- }]
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_8.json b/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_8.json
deleted file mode 100644
index 4dc6e4c0c8..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/data/dataset_8.json
+++ /dev/null
@@ -1,46 +0,0 @@
-{
- "training": [{
- "image": "./imagesTr/spleen_16.nii.gz",
- "label": "./labelsTr/spleen_16.nii.gz"
- }, {
- "image": "./imagesTr/spleen_20.nii.gz",
- "label": "./labelsTr/spleen_20.nii.gz"
- }, {
- "image": "./imagesTr/spleen_18.nii.gz",
- "label": "./labelsTr/spleen_18.nii.gz"
- }, {
- "image": "./imagesTr/spleen_46.nii.gz",
- "label": "./labelsTr/spleen_46.nii.gz"
- }, {
- "image": "./imagesTr/spleen_27.nii.gz",
- "label": "./labelsTr/spleen_27.nii.gz"
- }, {
- "image": "./imagesTr/spleen_49.nii.gz",
- "label": "./labelsTr/spleen_49.nii.gz"
- }, {
- "image": "./imagesTr/spleen_62.nii.gz",
- "label": "./labelsTr/spleen_62.nii.gz"
- }, {
- "image": "./imagesTr/spleen_53.nii.gz",
- "label": "./labelsTr/spleen_53.nii.gz"
- }, {
- "image": "./imagesTr/spleen_21.nii.gz",
- "label": "./labelsTr/spleen_21.nii.gz"
- }],
- "validation": [{
- "image": "./imagesTr/spleen_22.nii.gz",
- "label": "./labelsTr/spleen_22.nii.gz"
- }, {
- "image": "./imagesTr/spleen_2.nii.gz",
- "label": "./labelsTr/spleen_2.nii.gz"
- }, {
- "image": "./imagesTr/spleen_3.nii.gz",
- "label": "./labelsTr/spleen_3.nii.gz"
- }, {
- "image": "./imagesTr/spleen_45.nii.gz",
- "label": "./labelsTr/spleen_45.nii.gz"
- }, {
- "image": "./imagesTr/spleen_32.nii.gz",
- "label": "./labelsTr/spleen_32.nii.gz"
- }]
-}
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/data/download_dataset.py b/federated_learning/nvflare/nvflare_example/spleen_example/data/download_dataset.py
deleted file mode 100644
index 26e34c9ee7..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/data/download_dataset.py
+++ /dev/null
@@ -1,29 +0,0 @@
-import os
-from argparse import ArgumentDefaultsHelpFormatter, ArgumentParser
-
-from monai.apps.utils import download_and_extract
-
-
-def download_spleen_dataset(root_dir: str):
- """
-    This function is used to download the Spleen dataset for this example.
- If you'd like to download other Decathlon datasets, please check
- ``monai.apps.datasets.DecathlonDataset`` for more details.
- """
- url = "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task09_Spleen.tar"
- md5 = "410d4a301da4e5b2f6f86ec3ddba524e"
- task = "Task09_Spleen"
- dataset_dir = os.path.join(root_dir, task)
- tarfile_name = f"{dataset_dir}.tar"
- download_and_extract(
- url=url, filepath=tarfile_name, output_dir=root_dir, hash_val=md5
- )
-
-
-if __name__ == "__main__":
- parser = ArgumentParser(formatter_class=ArgumentDefaultsHelpFormatter)
- parser.add_argument(
- "-root_dir", type=str, help="the root path to put downloaded file."
- )
- args = parser.parse_args()
- download_spleen_dataset(root_dir=args.root_dir)
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/data/download_dataset.sh b/federated_learning/nvflare/nvflare_example/spleen_example/data/download_dataset.sh
deleted file mode 100755
index 16c0ac61a2..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/data/download_dataset.sh
+++ /dev/null
@@ -1,6 +0,0 @@
-#!/usr/bin/env bash
-DATASET_DOWNLOAD_PATH="${projectpath}/data"
-mkdir ${DATASET_DOWNLOAD_PATH}
-python3 ${projectpath}/spleen_example/data/download_dataset.py -root_dir ${DATASET_DOWNLOAD_PATH}
-echo "copy datalist files to ${DATASET_DOWNLOAD_PATH}/Task09_Spleen/."
-cp ${projectpath}/spleen_example/data/dataset_*.json ${DATASET_DOWNLOAD_PATH}/Task09_Spleen/.
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/resources/log.config b/federated_learning/nvflare/nvflare_example/spleen_example/resources/log.config
deleted file mode 100644
index 6b9761569b..0000000000
--- a/federated_learning/nvflare/nvflare_example/spleen_example/resources/log.config
+++ /dev/null
@@ -1,27 +0,0 @@
-[loggers]
-keys=root,modelLogger
-
-[handlers]
-keys=consoleHandler
-
-[formatters]
-keys=fullFormatter
-
-[logger_root]
-level=INFO
-handlers=consoleHandler
-
-[logger_modelLogger]
-level=DEBUG
-handlers=consoleHandler
-qualname=modelLogger
-propagate=0
-
-[handler_consoleHandler]
-class=StreamHandler
-level=DEBUG
-formatter=fullFormatter
-args=(sys.stdout,)
-
-[formatter_fullFormatter]
-format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
diff --git a/federated_learning/nvflare/nvflare_example/tensorboard.png b/federated_learning/nvflare/nvflare_example/tensorboard.png
deleted file mode 100644
index d6407352d0..0000000000
Binary files a/federated_learning/nvflare/nvflare_example/tensorboard.png and /dev/null differ
diff --git a/federated_learning/nvflare/nvflare_example/virtualenv/requirements.txt b/federated_learning/nvflare/nvflare_example/virtualenv/requirements.txt
deleted file mode 100644
index b22c2a971d..0000000000
--- a/federated_learning/nvflare/nvflare_example/virtualenv/requirements.txt
+++ /dev/null
@@ -1,6 +0,0 @@
-monai==0.7.0
-nvflare==1.0.2
-pytorch-ignite==0.4.5
-tqdm==4.61.2
-nibabel==3.2.1
-tensorboard==2.5.0
diff --git a/federated_learning/nvflare/nvflare_example/virtualenv/set_env.sh b/federated_learning/nvflare/nvflare_example/virtualenv/set_env.sh
deleted file mode 100755
index 4ef5610490..0000000000
--- a/federated_learning/nvflare/nvflare_example/virtualenv/set_env.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-#!/usr/bin/env bash
-
-export projectname='nvflare_monai'
-export projectpath="."
-
-python3 -m venv ${projectname}
-source ${projectname}/bin/activate
diff --git a/federated_learning/nvflare/nvflare_example_docker/.gitignore b/federated_learning/nvflare/nvflare_example_docker/.gitignore
deleted file mode 100644
index 81ac0b529b..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/.gitignore
+++ /dev/null
@@ -1,12 +0,0 @@
-# data
-Task09_Spleen
-*.tar
-*.nii
-*.nii.gz
-
-# virtual environments
-nvflare_monai
-
-# demo artifacts
-demo_workspace
-expr_files
diff --git a/federated_learning/nvflare/nvflare_example_docker/1-Startup.ipynb b/federated_learning/nvflare/nvflare_example_docker/1-Startup.ipynb
deleted file mode 100644
index c4896ecbed..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/1-Startup.ipynb
+++ /dev/null
@@ -1,217 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Introduction"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "In the `Provision Package Preparation` step of the README, we created `audit.pkl` and `zip` files for all the provisioned parties (server, clients, and admins) in `expr_files/`. The zip files are encrypted and the passwords are saved in `audit.pkl`.\n",
- "\n",
- "In a real experiment, you would send the decrypted folder to each site so that the site can run it on its own system. Therefore, in this notebook, we decrypt the packages and prepare the folders for all the provisioned parties."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [],
- "source": [
- "import shutil\n",
- "from zipfile import ZipFile\n",
- "import pickle\n",
- "import os"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "['project.yml',\n",
- " 'researcher@nvidia.com.zip',\n",
- " 'download_dataset.py',\n",
- " 'org1-b.zip',\n",
- " 'researcher@org2.com.zip',\n",
- " 'admin@nvidia.com.zip',\n",
- " 'org1-a.zip',\n",
- " 'org3.zip',\n",
- " 'server.zip',\n",
- " 'researcher@org1.com.zip',\n",
- " 'org2.zip',\n",
- " 'it@org2.com.zip',\n",
- " 'prepare_expr_files.sh']"
- ]
- },
- "execution_count": 2,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "os.listdir(\"expr_files/\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "In this example, `server.zip` will be used to create the server, `org1-a.zip` and `org1-b.zip` will be used to create two clients, and `admin@nvidia.com.zip` will be used to create an admin client to operate the FL experiment.\n",
- "\n",
- "First, unzip all the packages with the following code:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {
- "lines_to_next_cell": 2
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "demo_workspace created!\n",
- "unzip: server finished.\n",
- "unzip: org1-a finished.\n",
- "unzip: org1-b finished.\n",
- "unzip: admin@nvidia.com finished.\n"
- ]
- }
- ],
- "source": [
- "startup_path = \"expr_files\" # this is the path that contains `audit.pkl` and zip files\n",
- "workspace = \"demo_workspace\" # this is the folder that will be created to contain all experiment related files\n",
- "\n",
- "if not os.path.exists(workspace):\n",
- " os.makedirs(workspace)\n",
- " print(workspace, \" created!\")\n",
- "\n",
- "used_file_names = [\"server\", \"org1-a\", \"org1-b\", \"admin@nvidia.com\"]\n",
- "\n",
- "for name in used_file_names:\n",
- " zip_file_path = os.path.join(startup_path, name + \".zip\")\n",
- " dst_file_path = os.path.join(workspace, name)\n",
- " if not os.path.exists(dst_file_path):\n",
- " os.makedirs(dst_file_path)\n",
- " with ZipFile(zip_file_path, 'r') as zip_ref:\n",
- " zip_ref.extractall(path=dst_file_path)\n",
- " # change permissions\n",
- " if \".com\" in name:\n",
- " sub_file_list = [\"docker.sh\", \"fl_admin.sh\"]\n",
- " else:\n",
- " sub_file_list = [\"start.sh\", \"sub_start.sh\", \"docker.sh\"]\n",
- " for file in sub_file_list:\n",
- " os.chmod(os.path.join(dst_file_path, \"startup\", file), 0o755)\n",
- " print(\"unzip: {} finished.\".format(name))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "['org1-b', 'server', 'admin@nvidia.com', 'org1-a']"
- ]
- },
- "execution_count": 4,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "# check the created workspace\n",
- "os.listdir(workspace)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "With the default settings, the experiment-related config folder `spleen_example` should be copied into the `transfer` folder within the admin package:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "copied spleen_example into demo_workspace/admin@nvidia.com/transfer/.\n"
- ]
- }
- ],
- "source": [
- "config_folder = \"spleen_example\"\n",
- "\n",
- "transfer_path = os.path.join(workspace, \"admin@nvidia.com\", \"transfer/\")\n",
- "if not os.path.exists(transfer_path):\n",
- " os.makedirs(transfer_path)\n",
- "shutil.copytree(config_folder, os.path.join(transfer_path, config_folder))\n",
- "print(\"copied {} into {}.\".format(config_folder, transfer_path))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "So far, all required files have been created in the workspace. Before starting the docker containers, we update the ownership of these files:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "metadata": {},
- "outputs": [],
- "source": [
- "!chown -R 1000:1000 demo_workspace/*"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Next Steps\n",
- "\n",
- "You have now finished unzipping the provisioning files and copying the experiment folder to the admin's transfer folder.\n",
- "In the next notebook, [Server Startup Notebook](2-Server.ipynb), you will start the server container."
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.8.5"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
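The Startup notebook above mentions that the provisioning zips are encrypted, with the passwords stored in `audit.pkl`, yet extracts them with a plain `extractall`. As a minimal sketch (the helper name `extract_package` and the idea of passing a per-party password are illustrative assumptions, not the notebook's actual code), an extraction helper that also accepts a password could look like:

```python
import os
from zipfile import ZipFile


def extract_package(startup_path, workspace, name, password=None):
    """Extract one provisioning package zip into workspace/<name>.

    If the archive is password-protected, pass the password (e.g. the one
    recorded in audit.pkl for this party); otherwise leave it as None.
    """
    zip_file_path = os.path.join(startup_path, name + ".zip")
    dst_file_path = os.path.join(workspace, name)
    os.makedirs(dst_file_path, exist_ok=True)
    with ZipFile(zip_file_path, "r") as zip_ref:
        # zipfile can only decrypt the legacy ZipCrypto scheme
        zip_ref.extractall(
            path=dst_file_path,
            pwd=password.encode() if password else None,
        )
    return dst_file_path
```

Note that Python's `zipfile` module only supports the legacy ZipCrypto decryption scheme; AES-encrypted archives would need a third-party package such as `pyzipper`.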
diff --git a/federated_learning/nvflare/nvflare_example_docker/2-Server.ipynb b/federated_learning/nvflare/nvflare_example_docker/2-Server.ipynb
deleted file mode 100644
index 4ab15722b8..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/2-Server.ipynb
+++ /dev/null
@@ -1,241 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {
- "pycharm": {
- "metadata": false
- }
- },
- "source": [
- "# FL Server Joining the FL Experiment\n",
- "\n",
- "The purpose of this notebook is to show how to start a server to participate in an FL experiment."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "pycharm": {
- "metadata": false
- }
- },
- "source": [
- "## Prerequisites\n",
- "- The [Startup notebook](1-Startup.ipynb) has been run successfully."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "pycharm": {
- "metadata": false
- }
- },
- "source": [
- "### Start Server Docker"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [],
- "source": [
- "import os\n",
- "from IPython.display import HTML\n",
- "from multiprocessing import Process"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "running cmd demo_workspace/server/startup/docker.sh\n",
- "Starting docker with monai_nvflare:latest\n",
- "\n",
- "=============\n",
- "== PyTorch ==\n",
- "=============\n",
- "\n",
- "NVIDIA Release 21.08 (build 26011915)\n",
- "PyTorch Version 1.10.0a0+3fd9dcf\n",
- "\n",
- "Container image Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.\n",
- "\n",
- "Copyright (c) 2014-2021 Facebook Inc.\n",
- "Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)\n",
- "Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)\n",
- "Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)\n",
- "Copyright (c) 2011-2013 NYU (Clement Farabet)\n",
- "Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)\n",
- "Copyright (c) 2006 Idiap Research Institute (Samy Bengio)\n",
- "Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)\n",
- "Copyright (c) 2015 Google Inc.\n",
- "Copyright (c) 2015 Yangqing Jia\n",
- "Copyright (c) 2013-2016 The Caffe contributors\n",
- "All rights reserved.\n",
- "\n",
- "NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.\n",
- "\n",
- "Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.\n",
- "\n",
- "This container image and its contents are governed by the NVIDIA Deep Learning Container License.\n",
- "By pulling and using the container, you accept the terms and conditions of this license:\n",
- "https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license\n",
- "\n",
- "NOTE: MOFED driver for multi-node communication was not detected.\n",
- " Multi-node communication performance may be reduced.\n",
- "\n",
- "\u001b]0;root@sys: /workspace\u0007root@sys:/workspace# "
- ]
- }
- ],
- "source": [
- "server_name = \"server\"\n",
- "workspace = \"demo_workspace\"\n",
- "\n",
- "server_startup_path = os.path.join(workspace, server_name, \"startup\")\n",
- "cmd = server_startup_path + \"/docker.sh\"\n",
- "\n",
- "\n",
- "def run_server():\n",
- " cmd = server_startup_path + \"/docker.sh\"\n",
- " print(\"running cmd \" + cmd)\n",
- " !$cmd\n",
- "\n",
- "\n",
- "p1 = Process(target=run_server)\n",
- "\n",
- "p1.start()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Check Started Containers"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\n",
- "08492335d003 monai_nvflare:latest \"/usr/local/bin/nvid…\" 2 seconds ago Up 1 second flserver\n"
- ]
- }
- ],
- "source": [
- "!docker ps -a"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Start Server"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "To start a server, you should:\n",
- "\n",
- "- open a terminal and enter the container named `flserver`.\n",
- "- run `start.sh` under `startup/`."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- " Open a new terminal"
- ],
- "text/plain": [
- ""
- ]
- },
- "execution_count": 4,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "# You can click the following link, or manually open a new terminal.\n",
- "HTML(' Open a new terminal')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The commands can be:\n",
- "\n",
- "```\n",
- "docker exec -it flserver bash\n",
- "cd startup/\n",
- "sh start.sh\n",
- "```\n",
- "\n",
- "A successfully started server will print logs as follows:\n",
- "(server startup log screenshot)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Next Steps\n",
- "\n",
- "You have now started the server container.\n",
- "In the next notebook, [Client Startup Notebook](3-Client.ipynb), you'll start two clients participating in the FL experiment."
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.8.5"
- },
- "stem_cell": {
- "cell_type": "raw",
- "metadata": {
- "pycharm": {
- "metadata": false
- }
- },
- "source": ""
- }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
diff --git a/federated_learning/nvflare/nvflare_example_docker/3-Client.ipynb b/federated_learning/nvflare/nvflare_example_docker/3-Client.ipynb
deleted file mode 100644
index 88928ad542..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/3-Client.ipynb
+++ /dev/null
@@ -1,417 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {
- "pycharm": {
- "metadata": false
- }
- },
- "source": [
- "# FL Client Joining the FL Experiment\n",
- "\n",
- "The purpose of this notebook is to show how to start clients to participate in an FL experiment."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "pycharm": {
- "metadata": false
- }
- },
- "source": [
- "## Prerequisites\n",
- "- The [Startup notebook](1-Startup.ipynb) has been run successfully.\n",
- "- A server has been started."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Edit Docker Script\n",
- "\n",
- "Before running the docker script, you need to edit it so that the environment settings (such as the dataset path, GPUs, and memory) meet your requirements."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [],
- "source": [
- "import os\n",
- "from IPython.display import HTML\n",
- "from multiprocessing import Process"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [],
- "source": [
- "client_name_1 = \"org1-a\"\n",
- "client_name_2 = \"org1-b\"\n",
- "workspace = \"demo_workspace\"\n",
- "\n",
- "client1_startup_path = os.path.join(workspace, client_name_1, 'startup')\n",
- "client2_startup_path = os.path.join(workspace, client_name_2, 'startup')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "[Please click here to check the org1-a script](demo_workspace/org1-a/startup/docker.sh)\n",
- "\n",
- "[Please click here to check the org1-b script](demo_workspace/org1-b/startup/docker.sh)\n",
- "\n",
- "As we can see, the default data directory is `MY_DATA_DIR=/home/flclient/data/msd-data/Task09_Spleen`.\n",
- "\n",
- "Please modify it so that it points to the actual path containing `Task09_Spleen`. If you downloaded the dataset using `prepare_expr_files.sh`, `MY_DATA_DIR` can be set to `$(pwd)`. In addition, you may also need to change `GPU2USE` and add a `--shm-size` argument to the `docker run` command.\n",
- "\n",
- "The following is an example of a modified script for `org1-a`:\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "```\n",
- "#!/usr/bin/env bash\n",
- "DIR=\"$( cd \"$( dirname \"${BASH_SOURCE[0]}\" )\" >/dev/null 2>&1 && pwd )\"\n",
- "\n",
- "# docker run script for FL client\n",
- "# local data directory\n",
- "MY_DATA_DIR=$(pwd)\n",
- "# for all gpus use line below \n",
- "#GPU2USE=all \n",
- "# for 2 gpus use line below\n",
- "#GPU2USE=2 \n",
- "# for specific gpus as gpu#0 and gpu#2 use line below\n",
- "GPU2USE='\"device=0\"'\n",
- "# to use host network, use line below\n",
- "NETARG=\"--net=host\"\n",
- "# FL clients do not need to open ports, so the following line is not needed.\n",
- "#NETARG=\"-p 443:443 -p 8003:8003\"\n",
- "DOCKER_IMAGE=monai_nvflare:latest\n",
- "echo \"Starting docker with $DOCKER_IMAGE\"\n",
- "docker run --rm -it --shm-size 16G --name=org1-a --gpus=$GPU2USE -u $(id -u):$(id -g) -v /etc/passwd:/etc/passwd -v /etc/group:/etc/group -v $DIR/..:/workspace/ -v $MY_DATA_DIR:/data/:ro -w /workspace/ --ipc=host $NETARG $DOCKER_IMAGE /bin/bash\n",
- "```"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Check the Connection to Server\n",
- "\n",
- "Run `telnet` against the server port (8002 in this example). A successful connection should produce output like the following:\n",
- "\n",
- "```\n",
- "Trying 127.0.0.1...\n",
- "Connected to localhost.\n",
- "Escape character is '^]'.\n",
- "Connection closed by foreign host.\n",
- "```"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {
- "pycharm": {
- "metadata": false,
- "name": "#%%\n"
- }
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Trying 127.0.0.1...\n",
- "Connected to localhost.\n",
- "Escape character is '^]'.\n",
- "^C\n",
- "Connection closed by foreign host.\n"
- ]
- }
- ],
- "source": [
- "!telnet localhost 8002"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "pycharm": {
- "metadata": false
- }
- },
- "source": [
- "### Start Clients Docker\n",
- "\n",
- "After modifying `docker.sh` for `org1-a` and `org1-b`, we are able to start:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {
- "pycharm": {
- "metadata": false,
- "name": "#%%\n"
- }
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "running cmd demo_workspace/org1-b/startup/docker.sh\n",
- "running cmd demo_workspace/org1-a/startup/docker.sh\n",
- "Starting docker with monai_nvflare:latest\n",
- "Starting docker with monai_nvflare:latest\n",
- "\n",
- "=============\n",
- "== PyTorch ==\n",
- "=============\n",
- "\n",
- "NVIDIA Release 21.08 (build 26011915)\n",
- "PyTorch Version 1.10.0a0+3fd9dcf\n",
- "\n",
- "Container image Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.\n",
- "\n",
- "Copyright (c) 2014-2021 Facebook Inc.\n",
- "Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)\n",
- "Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)\n",
- "Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)\n",
- "Copyright (c) 2011-2013 NYU (Clement Farabet)\n",
- "Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)\n",
- "Copyright (c) 2006 Idiap Research Institute (Samy Bengio)\n",
- "Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)\n",
- "Copyright (c) 2015 Google Inc.\n",
- "Copyright (c) 2015 Yangqing Jia\n",
- "Copyright (c) 2013-2016 The Caffe contributors\n",
- "All rights reserved.\n",
- "\n",
- "NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.\n",
- "\n",
- "Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.\n",
- "\n",
- "This container image and its contents are governed by the NVIDIA Deep Learning Container License.\n",
- "By pulling and using the container, you accept the terms and conditions of this license:\n",
- "https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license\n",
- "\n",
- "=============\n",
- "== PyTorch ==\n",
- "=============\n",
- "\n",
- "NVIDIA Release 21.08 (build 26011915)\n",
- "PyTorch Version 1.10.0a0+3fd9dcf\n",
- "\n",
- "Container image Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.\n",
- "\n",
- "Copyright (c) 2014-2021 Facebook Inc.\n",
- "Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)\n",
- "Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)\n",
- "Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)\n",
- "Copyright (c) 2011-2013 NYU (Clement Farabet)\n",
- "Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)\n",
- "Copyright (c) 2006 Idiap Research Institute (Samy Bengio)\n",
- "Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)\n",
- "Copyright (c) 2015 Google Inc.\n",
- "Copyright (c) 2015 Yangqing Jia\n",
- "Copyright (c) 2013-2016 The Caffe contributors\n",
- "All rights reserved.\n",
- "\n",
- "NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.\n",
- "\n",
- "Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.\n",
- "\n",
- "This container image and its contents are governed by the NVIDIA Deep Learning Container License.\n",
- "By pulling and using the container, you accept the terms and conditions of this license:\n",
- "https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license\n",
- "\n",
- "NOTE: MOFED driver for multi-node communication was not detected.\n",
- " Multi-node communication performance may be reduced.\n",
- "\n",
- "venn@sys:/workspace$ \n",
- "NOTE: MOFED driver for multi-node communication was not detected.\n",
- " Multi-node communication performance may be reduced.\n",
- "\n",
- "venn@sys:/workspace$ "
- ]
- }
- ],
- "source": [
- "def run_client1():\n",
- " cmd = client1_startup_path + \"/docker.sh\"\n",
- " print(\"running cmd \" + cmd)\n",
- " !$cmd\n",
- "\n",
- "\n",
- "def run_client2():\n",
- " cmd = client2_startup_path + \"/docker.sh\"\n",
- " print(\"running cmd \" + cmd)\n",
- " !$cmd\n",
- "\n",
- "\n",
- "p1 = Process(target=run_client1)\n",
- "p2 = Process(target=run_client2)\n",
- "\n",
- "p1.start()\n",
- "p2.start()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Check Started Containers"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\n",
- "96c8f9406303 monai_nvflare:latest \"/usr/local/bin/nvid…\" 2 seconds ago Up 1 second org1-a\n",
- "6b5d9eefc699 monai_nvflare:latest \"/usr/local/bin/nvid…\" 2 seconds ago Up 1 second org1-b\n",
- "08492335d003 monai_nvflare:latest \"/usr/local/bin/nvid…\" 40 seconds ago Up 39 seconds flserver\n"
- ]
- }
- ],
- "source": [
- "!docker ps -a"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Start Clients"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "To start a client, you should:\n",
- "\n",
- "- open a terminal and enter the container named `org1-a`.\n",
- "- run `start.sh` under `startup/`."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- " Open a new terminal"
- ],
- "text/plain": [
- ""
- ]
- },
- "execution_count": 5,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "# You can click the following link, or manually open a new terminal.\n",
- "HTML(' Open a new terminal')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The commands can be:\n",
- "\n",
- "```\n",
- "docker exec -it org1-a bash\n",
- "cd startup/\n",
- "sh start.sh\n",
- "```\n",
- "\n",
- "A successfully started client will print logs as follows:\n",
- "(client startup log screenshot)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "To start the second client, please open a new terminal again, and just use the same commands but for the `org1-b` Docker container:\n",
- "```\n",
- "docker exec -it org1-b bash\n",
- "cd startup/\n",
- "sh start.sh\n",
- "```"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "After a client has been successfully started, the server side will show the following information:\n",
- "(server-side log screenshot)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Next Steps\n",
- "\n",
- "You have now started two client containers.\n",
- "In the next notebook, [Admin Startup Notebook](4-Admin.ipynb), you'll start an admin participating in the FL experiment."
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.8.5"
- },
- "stem_cell": {
- "cell_type": "raw",
- "metadata": {
- "pycharm": {
- "metadata": false
- }
- },
- "source": ""
- }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
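The client notebook above checks connectivity to the FL server with `telnet localhost 8002`. The same check can be sketched in Python with the standard `socket` module (the helper name and the `localhost:8002` defaults mirror this example's setup; adjust them for a real deployment):

```python
import socket


def server_reachable(host="localhost", port=8002, timeout=3.0):
    """Return True if a TCP connection to the FL server port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and unreachable hosts
        return False
```

Unlike `telnet`, this returns a boolean instead of printing a banner, so it can gate the client startup step in a script.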
diff --git a/federated_learning/nvflare/nvflare_example_docker/4-Admin.ipynb b/federated_learning/nvflare/nvflare_example_docker/4-Admin.ipynb
deleted file mode 100644
index 749be69c26..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/4-Admin.ipynb
+++ /dev/null
@@ -1,439 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {
- "pycharm": {
- "metadata": false
- }
- },
- "source": [
- "# Admin Startup\n",
- "\n",
- "The purpose of this notebook is to show how to start an admin client to operate an FL experiment with a server and at least one client started."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "pycharm": {
- "metadata": false
- }
- },
- "source": [
- "## Prerequisites\n",
- "- The [Startup notebook](1-Startup.ipynb) has been run successfully.\n",
- "- A server and at least one client have been started."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Check the Config Folder\n",
- "\n",
- "The config folder should be in the `transfer/` directory:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "['spleen_example']"
- ]
- },
- "execution_count": 1,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "import os\n",
- "from IPython.display import HTML\n",
- "from multiprocessing import Process\n",
- "\n",
- "workspace = \"demo_workspace/\"\n",
- "admin_name = \"admin@nvidia.com\"\n",
- "\n",
- "transfer_path = os.path.join(workspace, admin_name, \"transfer\")\n",
- "os.listdir(transfer_path)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Edit Docker Script\n",
- "\n",
- "Before running the docker script, you need to edit it so that the environment settings (such as networking and memory) meet your requirements."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "[Please click here to check the admin@nvidia.com script](demo_workspace/admin@nvidia.com/startup/docker.sh)\n",
- "\n",
- "For this experiment, please modify the script in order to use the host network:\n",
- "\n",
- "```\n",
- "#!/usr/bin/env bash\n",
- "DIR=\"$( cd \"$( dirname \"${BASH_SOURCE[0]}\" )\" >/dev/null 2>&1 && pwd )\"\n",
- "# docker run script for FL admin\n",
- "# to use host network, use line below\n",
- "NETARG=\"--net=host\"\n",
- "# Admin clients do not need to open ports, so the following line is not needed.\n",
- "#NETARG=\"-p 8003:8003\"\n",
- "DOCKER_IMAGE=monai_nvflare:latest\n",
- "echo \"Starting docker with $DOCKER_IMAGE\"\n",
- "docker run --rm -it --name=fladmin -v $DIR/..:/workspace/ -w /workspace/ --ipc=host $NETARG $DOCKER_IMAGE /bin/bash\n",
- "```"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "tags": []
- },
- "source": [
- "### Start Admin Docker"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "running cmd demo_workspace/admin@nvidia.com/startup/docker.sh\n",
- "Starting docker with monai_nvflare:latest\n",
- "\n",
- "=============\n",
- "== PyTorch ==\n",
- "=============\n",
- "\n",
- "NVIDIA Release 21.08 (build 26011915)\n",
- "PyTorch Version 1.10.0a0+3fd9dcf\n",
- "\n",
- "Container image Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.\n",
- "\n",
- "Copyright (c) 2014-2021 Facebook Inc.\n",
- "Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)\n",
- "Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)\n",
- "Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)\n",
- "Copyright (c) 2011-2013 NYU (Clement Farabet)\n",
- "Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)\n",
- "Copyright (c) 2006 Idiap Research Institute (Samy Bengio)\n",
- "Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)\n",
- "Copyright (c) 2015 Google Inc.\n",
- "Copyright (c) 2015 Yangqing Jia\n",
- "Copyright (c) 2013-2016 The Caffe contributors\n",
- "All rights reserved.\n",
- "\n",
- "NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.\n",
- "\n",
- "Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.\n",
- "\n",
- "This container image and its contents are governed by the NVIDIA Deep Learning Container License.\n",
- "By pulling and using the container, you accept the terms and conditions of this license:\n",
- "https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license\n",
- "\n",
- "NOTE: MOFED driver for multi-node communication was not detected.\n",
- " Multi-node communication performance may be reduced.\n",
- "\n",
- "\u001b]0;root@sys: /workspace\u0007root@sys:/workspace# "
- ]
- }
- ],
- "source": [
- "admin_startup_path = os.path.join(workspace, admin_name, \"startup\")\n",
- "cmd = admin_startup_path + \"/docker.sh\"\n",
- "\n",
- "\n",
- "def run_admin():\n",
- " cmd = admin_startup_path + \"/docker.sh\"\n",
- " print(\"running cmd \" + cmd)\n",
- " !$cmd\n",
- "\n",
- "\n",
- "p1 = Process(target=run_admin)\n",
- "\n",
- "p1.start()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Check Started Containers"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\n",
- "3de0f6690d8e monai_nvflare:latest \"/usr/local/bin/nvid…\" 6 seconds ago Up 6 seconds fladmin\n",
- "96c8f9406303 monai_nvflare:latest \"/usr/local/bin/nvid…\" 52 seconds ago Up 52 seconds org1-a\n",
- "6b5d9eefc699 monai_nvflare:latest \"/usr/local/bin/nvid…\" 52 seconds ago Up 52 seconds org1-b\n",
- "08492335d003 monai_nvflare:latest \"/usr/local/bin/nvid…\" About a minute ago Up About a minute flserver\n"
- ]
- }
- ],
- "source": [
- "!docker ps -a"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Start Admin"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "To start an admin, you should:\n",
- "\n",
- "- open a terminal and enter the container named `fladmin`.\n",
- "- run `fl_admin.sh` under `startup/`.\n",
- "- input admin name `admin@nvidia.com`."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- " Open a new terminal"
- ],
- "text/plain": [
- ""
- ]
- },
- "execution_count": 4,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "HTML(' Open a new terminal')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "You can accomplish these steps by running:\n",
- "\n",
- "```\n",
- "docker exec -it fladmin bash\n",
- "cd startup/\n",
- "bash fl_admin.sh\n",
- "admin@nvidia.com\n",
- "```"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Prepare for the experiment\n",
- "\n",
- "You need to execute the following steps to prepare for the experiment:\n",
- "\n",
- "- upload the pipeline config folder\n",
- "- set the FL run number\n",
- "- deploy the folder to the server and client(s)\n",
- "\n",
- "The commands can be:\n",
- "```\n",
- "upload_app spleen_example\n",
- "set_run_number 1\n",
- "deploy_app spleen_example server\n",
- "deploy_app spleen_example client\n",
- "```\n",
- "\n",
- "Now, let's check whether the folder has been distributed to the server and all client(s):"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "config files on server: ['mmar_server', 'mmar_org1-a', 'mmar_org1-b']\n",
- " \n",
- "config files on org1-a: ['mmar_org1-a']\n",
- " \n",
- "config files on org1-b: ['mmar_org1-b']\n",
- " \n"
- ]
- }
- ],
- "source": [
- "run_file = 'run_1'\n",
- "\n",
- "for name in ['server', 'org1-a', 'org1-b']:\n",
- " path = os.path.join(workspace, name, run_file)\n",
- " if os.path.exists(path):\n",
- " print(\"config files on {}: {}\".\n",
- " format(name, os.listdir(path)))\n",
- " print(\" \")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "This example prepares two different data list files: `dataset_part1.json` and `dataset_part2.json`; they share the same validation set but have completely different training sets. The default file used in `spleen_example/config_train.json` is `dataset_part1.json`. Therefore, if you want the two clients to train on different data, you can switch `org1-b` to `dataset_part2.json`.\n",
- "\n",
- "[Link to org1-a config](demo_workspace/org1-a/run_1/mmar_org1-a/config/config_train.json)\n",
- "\n",
- "[Link to org1-b config](demo_workspace/org1-b/run_1/mmar_org1-b/config/config_train.json)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Start Training"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Now, you can start training with:\n",
- "```\n",
- "start_app server\n",
- "start_app client\n",
- "```\n",
- "You can check the status by running:\n",
- "```\n",
- "check_status server\n",
- "check_status client\n",
- "```"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Stop Training\n",
- "\n",
- "You can stop training for the server and/or client(s) by running:\n",
- "```\n",
- "abort client\n",
- "abort server\n",
- "```\n",
- "If you only want to stop org1-b, you can use:\n",
- "```\n",
- "abort client org1-b\n",
- "```"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Start Training (multi-gpus)\n",
- "\n",
- "If you would like to use multiple GPUs, you can start training with `n` GPUs (where `n` is the number of GPUs) by running:\n",
- "```\n",
- "start_mgpu client n\n",
- "```\n",
- "\n",
- "The default `multi_gpu` flag is `False` in `spleen_example/config/config_train.json`. If you want to use multiple GPUs, you have to change it before the `Startup` step."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Shutdown\n",
- "\n",
- "You can shut down the server and/or client(s) by running:\n",
- "\n",
- "`shutdown client` and/or `shutdown server`\n",
- "\n",
- "If you only want to shut down a specific client, specify it in the command as follows:\n",
- "```\n",
- "shutdown client org1-b\n",
- "```\n",
- "This command terminates the client/server connection and asks for the admin name as confirmation. If you need to shut down the server, all active clients must be shut down first.\n",
- "\n",
- "After the shutdown commands, it is safe to shut down `Startup.ipynb` to stop all containers."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Other Commands\n",
- "\n",
- "Please type `?` to learn more about all commands."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Next Steps\n",
- "\n",
- "You have now started the admin client and learnt the commands to control your FL experiment. You're now ready to create your own FL experiment!"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.8.5"
- },
- "stem_cell": {
- "cell_type": "raw",
- "metadata": {
- "pycharm": {
- "metadata": false
- }
- },
- "source": ""
- }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
diff --git a/federated_learning/nvflare/nvflare_example_docker/README.md b/federated_learning/nvflare/nvflare_example_docker/README.md
deleted file mode 100644
index 69a43c619c..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/README.md
+++ /dev/null
@@ -1,53 +0,0 @@
-# Federated Learning with MONAI using NVFlare
-
-## Brief Introduction
-
-This repository contains an end-to-end federated training example based on MONAI trainers and [NVFlare](https://pypi.org/project/nvflare/).
-
-Inside this folder:
-- All Jupyter notebooks are used to build an FL experiment step-by-step.
-- `demo_figs` is a folder containing the example figures referenced in the notebooks.
-- `spleen_example` is the experiment config folder. Some of the experiment-related hyperparameters are set in `spleen_example/config/config_train.json`. You may need to modify `multi_gpu` and some other parameters; please check the docstrings in `spleen_example/custom/train_configer.py` for more details.
-- `build_docker_provision.sh` is the script to build the docker image and run the provisioning.
-- `docker_files` is a folder containing all files needed to build the docker image.
-- `expr_files` is a folder containing all files used for the experiment.
-
-Inside `expr_files`:
-
-`project.yml` describes the FL project: it defines the project name, participants, server name, and other settings. You can keep the default settings, but you may need to change the `cn` value to the server name as:
-```
-server:
- cn:
-```
-`authz_config.json` is the authorization configuration JSON file; it defines groups, roles, and rights for all users, organizations, and sites. If you modify `project.yml`, please update this file to keep them consistent.
-`prepare_expr_files.sh` is the script that runs provisioning and (optionally) downloads the Decathlon Spleen dataset for this experiment.
-
-
-## Provision Package Preparation
-
-We need to build a docker image for the experiment. It is based on the `projectmonai/monai:0.7.0` image from Docker Hub and the `nvflare` library from PyPI.
-
-Please ensure that you have installed Docker (https://docs.docker.com/engine/install/).
-
-Please run `bash build_docker_provision.sh`.
-
-
-## Build Experiment
-
-Please ensure that you have installed JupyterLab (https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html). For example, to install JupyterLab in a Python virtual environment:
-```
-python3 -m venv venv/fl_startup
-source venv/fl_startup/bin/activate
-pip install --upgrade pip
-pip install wheel
-pip install jupyterlab
-```
-
-Please run the following command:
-
-`jupyter lab --ip 0.0.0.0 --port 8888 --allow-root --no-browser --NotebookApp.token=MONAIFLExample`
-
-The link is: `http://localhost:8888/?token=MONAIFLExample`
-
-Then run `1-Startup.ipynb`. Follow the steps in the notebook, which will guide you through building an FL experiment with 2 clients and 1 server.
diff --git a/federated_learning/nvflare/nvflare_example_docker/build_docker_provision.sh b/federated_learning/nvflare/nvflare_example_docker/build_docker_provision.sh
deleted file mode 100644
index 1c9302a687..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/build_docker_provision.sh
+++ /dev/null
@@ -1,26 +0,0 @@
-#!/bin/bash
-
-DOCKER_IMAGE=monai_nvflare
-
-# check if the image exists; if not, build it
-docker images | grep ${DOCKER_IMAGE}
-dockerNameExist=$?
-if ((${dockerNameExist}==0)); then
- echo --- docker image ${DOCKER_IMAGE} exists
-else
- echo --- docker image ${DOCKER_IMAGE} does not exist, building it
- docker build -f docker_files/Dockerfile --tag ${DOCKER_IMAGE} .
- echo ----------- docker image ${DOCKER_IMAGE} built
-fi
-# using the built docker image to provision
-cmd2run="bash expr_files/prepare_expr_files.sh"
-docker run \
- --rm \
- --shm-size 1G \
- -v ${PWD}/:/fl_workspace/ \
- -w /fl_workspace/ \
- -it \
- ${DOCKER_IMAGE} \
- ${cmd2run}
-
-echo ------------------ exited from docker image
diff --git a/federated_learning/nvflare/nvflare_example_docker/demo_figs/enter_client_success.png b/federated_learning/nvflare/nvflare_example_docker/demo_figs/enter_client_success.png
deleted file mode 100644
index 8994afbff2..0000000000
Binary files a/federated_learning/nvflare/nvflare_example_docker/demo_figs/enter_client_success.png and /dev/null differ
diff --git a/federated_learning/nvflare/nvflare_example_docker/demo_figs/enter_server_success.png b/federated_learning/nvflare/nvflare_example_docker/demo_figs/enter_server_success.png
deleted file mode 100644
index 664d26df3f..0000000000
Binary files a/federated_learning/nvflare/nvflare_example_docker/demo_figs/enter_server_success.png and /dev/null differ
diff --git a/federated_learning/nvflare/nvflare_example_docker/demo_figs/successful_regist_clients.png b/federated_learning/nvflare/nvflare_example_docker/demo_figs/successful_regist_clients.png
deleted file mode 100644
index 70333d4272..0000000000
Binary files a/federated_learning/nvflare/nvflare_example_docker/demo_figs/successful_regist_clients.png and /dev/null differ
diff --git a/federated_learning/nvflare/nvflare_example_docker/docker_files/Dockerfile b/federated_learning/nvflare/nvflare_example_docker/docker_files/Dockerfile
deleted file mode 100644
index 61d3f78ed7..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/docker_files/Dockerfile
+++ /dev/null
@@ -1,9 +0,0 @@
-FROM projectmonai/monai:0.7.0
-
-ENV DEBIAN_FRONTEND noninteractive
-
-RUN apt-get -qq update
-RUN apt-get install -qq -y zip
-
-RUN python -m pip install --upgrade pip
-RUN python -m pip install nvflare==1.1
diff --git a/federated_learning/nvflare/nvflare_example_docker/expr_files/download_dataset.py b/federated_learning/nvflare/nvflare_example_docker/expr_files/download_dataset.py
deleted file mode 100644
index 26e34c9ee7..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/expr_files/download_dataset.py
+++ /dev/null
@@ -1,29 +0,0 @@
-import os
-from argparse import ArgumentDefaultsHelpFormatter, ArgumentParser
-
-from monai.apps.utils import download_and_extract
-
-
-def download_spleen_dataset(root_dir: str):
- """
- This function is used to download the Spleen dataset for this example.
- If you'd like to download other Decathlon datasets, please check
- ``monai.apps.datasets.DecathlonDataset`` for more details.
- """
- url = "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task09_Spleen.tar"
- md5 = "410d4a301da4e5b2f6f86ec3ddba524e"
- task = "Task09_Spleen"
- dataset_dir = os.path.join(root_dir, task)
- tarfile_name = f"{dataset_dir}.tar"
- download_and_extract(
- url=url, filepath=tarfile_name, output_dir=root_dir, hash_val=md5
- )
-
-
-if __name__ == "__main__":
- parser = ArgumentParser(formatter_class=ArgumentDefaultsHelpFormatter)
- parser.add_argument(
- "-root_dir", type=str, help="the root path to store the downloaded file."
- )
- args = parser.parse_args()
- download_spleen_dataset(root_dir=args.root_dir)
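`download_and_extract` verifies the downloaded tarball against the MD5 digest above. If the archive is already on disk, you can also check it yourself before re-running the download; a minimal sketch (the `verify_md5` helper is illustrative, not part of MONAI):

```python
import hashlib


def verify_md5(filepath: str, expected: str, chunk_size: int = 1 << 20) -> bool:
    """Stream the file through MD5 and compare against the expected digest."""
    md5 = hashlib.md5()
    with open(filepath, "rb") as f:
        # read in chunks so large tarballs do not need to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest() == expected
```

For an intact download, `verify_md5("Task09_Spleen.tar", "410d4a301da4e5b2f6f86ec3ddba524e")` should return `True`.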
diff --git a/federated_learning/nvflare/nvflare_example_docker/expr_files/prepare_expr_files.sh b/federated_learning/nvflare/nvflare_example_docker/expr_files/prepare_expr_files.sh
deleted file mode 100644
index e8bba8b496..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/expr_files/prepare_expr_files.sh
+++ /dev/null
@@ -1,17 +0,0 @@
-# prepare provision files
-# refer to: https://nvidia.github.io/NVFlare/user_guide/provisioning_tool.html
-DEMO_PROVISION_PATH="expr_files"
-NVFL_DOCKER_IMAGE=monai_nvflare:latest provision -n -p $DEMO_PROVISION_PATH/project.yml -o $DEMO_PROVISION_PATH
-cd /fl_workspace/; chown -R 1000:1000 *
-
-# if you do not need to download the spleen dataset, please comment out the following lines.
-
-# The docker run command in `build_docker_provision.sh` mounts the path of the
-# current folder into `/fl_workspace`, thus the downloaded Spleen dataset will be
-# in the current folder.
-DATASET_DOWNLOAD_PATH="/fl_workspace/"
-python expr_files/download_dataset.py -root_dir $DATASET_DOWNLOAD_PATH
-
-# prepare the modified data list files. If your Spleen dataset path is different, please
-# modify the following line.
-cp spleen_example/config/dataset_part*.json $DATASET_DOWNLOAD_PATH/Task09_Spleen/
diff --git a/federated_learning/nvflare/nvflare_example_docker/expr_files/project.yml b/federated_learning/nvflare/nvflare_example_docker/expr_files/project.yml
deleted file mode 100644
index fb29d5e985..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/expr_files/project.yml
+++ /dev/null
@@ -1,141 +0,0 @@
-api_version: 1
-
-# org is to describe each participant's organization
-# schema change: org is now mandatory
-
-# the name of this project
-name: example_project
-
-# homomorphic encryption
-he_config:
- poly_modulus_degree: 8192
- coeff_mod_bit_sizes: [60, 40, 40]
- scale_bits: 40
- scheme: CKKS
-
-config_folder: config
-
-# Server enforcing role-based rights on admin users. true means all admin users are in role super
-disable_authz: false
-
-server:
- org: nvidia
-
- # set cn to the server's fully qualified domain name
- # never set it to example.com
- cn: localhost
-
- # replace the number with a port that all clients can reach and that the server can open to listen on
- fed_learn_port: 8002
-
- # again, replace the number with a port that all clients can reach and that the server can open to listen on
- # the value must be different from fed_learn_port
- admin_port: 8003
-
- # admin_storage is the mmar upload folder name on the server
- admin_storage: transfer
-
- min_num_clients: 1
- max_num_clients: 100
-
- # The configuration validator class path.
- # This line must have ONE indentation. That is, it must be
- # inside server section.
- #
- # Server does not load configuration validator when it's commented out.
- #
- # Users can specify their own validator. For example:
- # config_validator:
- # hello.world.BestValidator
- #
- # User can also provide args for their own validator. For example:
- # config_validator:
- # hello.world.BestValidator:
- # arg1: abc
- # arg2: 123
-
- # config_validator:
-
-# The following values under fl_clients and admin_clients are for demo purposes only.
-# Please change them according to your actual project.
-fl_clients:
- # client_name must be unique
- # email is optional
- - org: org1
- site: org1-a
- - org: org1
- site: org1-b
- - org: org2
- site: org2
- - org: org3
- site: org3
-
-admin_clients:
- # email is the user name for admin authentication. Hence it must be unique within the project
- - org: nvidia
- email: admin@nvidia.com
- roles:
- - super
- - org: nvidia
- email: researcher@nvidia.com
- roles:
- - lead_it
- - site_researcher
- - org: org1
- email: researcher@org1.com
- roles:
- - site_researcher
- - org: org2
- email: researcher@org2.com
- roles:
- - lead_researcher
- - org: org2
- email: it@org2.com
- roles:
- - lead_it
-
-authz_policy:
- orgs:
- org1:
- - strict
- - general
- org2:
- - relaxed
- - general
- nvidia:
- - general
- - relaxed
- org3:
- - general
- roles:
- super: super user of system
- lead_researcher: lead researcher of the study
- site_researcher: site researcher of the study
- site_it: site IT of the study
- lead_it: lead IT of the study
- groups:
- relaxed:
- desc: org group with relaxed policies
- rules:
- byoc: true
- custom_datalist: true
- strict:
- desc: org group with strict policies
- rules:
- byoc: false
- custom_datalist: false
- general:
- desc: general group user rights
- role_rights:
- lead_researcher:
- train_all: true
- view_all: true
- site_researcher:
- train_self: true
- view_self: true
- lead_it:
- operate_all: true
- view_all: true
- site_it:
- operate_self: true
- view_self: true
diff --git a/federated_learning/nvflare/nvflare_example_docker/spleen_example/config/config_fed_client.json b/federated_learning/nvflare/nvflare_example_docker/spleen_example/config/config_fed_client.json
deleted file mode 100644
index ed09a110a3..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/spleen_example/config/config_fed_client.json
+++ /dev/null
@@ -1,17 +0,0 @@
-{
- "format_version": 1,
-
- "client": {
- "outbound_filters": [
- ],
- "inbound_filters": [
- ]
- },
- "client_trainer": {
- "path": "monai_trainer.MONAITrainer",
- "args": {
- "aggregation_epochs": 5,
- "aggregation_iters": 0
- }
- }
-}
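As the `MONAITrainer` docstring later explains, a positive `aggregation_iters` switches the round length from epoch-based to iteration-based aggregation. A hedged sketch of that precedence (the helper name is illustrative, not an NVFlare API):

```python
def round_length(aggregation_epochs: int = 1, aggregation_iters: int = 0) -> tuple:
    """Return the unit and amount of local training done per FL round.

    Mirrors the precedence in config_fed_client.json: a positive
    `aggregation_iters` takes priority over `aggregation_epochs`.
    """
    if aggregation_iters > 0:
        return ("iterations", aggregation_iters)
    return ("epochs", aggregation_epochs)
```

With the config above (`"aggregation_epochs": 5, "aggregation_iters": 0`), each round runs 5 local epochs.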
diff --git a/federated_learning/nvflare/nvflare_example_docker/spleen_example/config/config_fed_server.json b/federated_learning/nvflare/nvflare_example_docker/spleen_example/config/config_fed_server.json
deleted file mode 100644
index 3853030b77..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/spleen_example/config/config_fed_server.json
+++ /dev/null
@@ -1,53 +0,0 @@
-{
- "format_version": 1,
-
- "servers": [
- {
- "min_num_clients": 1,
- "max_num_clients": 100,
- "wait_after_min_clients": 10,
- "heart_beat_timeout": 600,
- "start_round": 0,
- "num_rounds": 200
- }
- ],
- "aggregator":
- {
- "name": "AccumulateWeightedAggregator",
- "args": {
- "exclude_vars": "dummy",
- "aggregation_weights":
- {
- "client0": 1,
- "client1": 1.5,
- "client2": 0.8
- }
- }
- },
- "outbound_filters": [
- ],
- "inbound_filters": [
- ],
- "model_persistor":
- {
- "name": "PTFileModelPersistor",
- "args": {
- "exclude_vars": "dummy",
- "model": {
- "path": "monai.networks.nets.unet.UNet",
- "args": {
- "dimensions": 3,
- "in_channels": 1,
- "out_channels": 2,
- "channels": [16, 32, 64, 128, 256],
- "strides": [2, 2, 2, 2],
- "num_res_units": 2,
- "norm": "batch"
- }
- }
- }
- },
- "shareable_generator": {
- "name": "FullModelShareableGenerator"
- }
-}
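The `aggregation_weights` above bias the server-side average per client. A minimal sketch of such a weighted average over client updates, assuming each client reports a flat dict of scalar parameters (this helper is illustrative, not NVFlare's actual `AccumulateWeightedAggregator`):

```python
def weighted_average(updates: dict, weights: dict) -> dict:
    """Aggregate per-client parameter dicts with per-client weights.

    updates: {client_name: {param_name: value}}
    weights: {client_name: weight}, e.g. {"client0": 1, "client1": 1.5, "client2": 0.8}
    Clients missing from `weights` default to 1.0.
    """
    total = sum(weights.get(c, 1.0) for c in updates)
    agg = {}
    for client, params in updates.items():
        w = weights.get(client, 1.0) / total
        for name, value in params.items():
            # accumulate each client's normalized contribution
            agg[name] = agg.get(name, 0.0) + w * value
    return agg
```

For example, with weights `{"client0": 1, "client1": 3}` a parameter equal to 2.0 on client0 and 4.0 on client1 aggregates to (1*2.0 + 3*4.0) / 4 = 3.5.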
diff --git a/federated_learning/nvflare/nvflare_example_docker/spleen_example/custom/monai_trainer.py b/federated_learning/nvflare/nvflare_example_docker/spleen_example/custom/monai_trainer.py
deleted file mode 100644
index 3149c9ecf0..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/spleen_example/custom/monai_trainer.py
+++ /dev/null
@@ -1,183 +0,0 @@
-# Copyright 2020 MONAI Consortium
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-# http://www.apache.org/licenses/LICENSE-2.0
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import logging
-
-import torch.distributed as dist
-from nvflare.apis.event_type import EventType
-from nvflare.apis.fl_constant import FLConstants, ShareableKey
-from nvflare.apis.fl_context import FLContext
-from nvflare.apis.shareable import Shareable
-from nvflare.apis.trainer import Trainer
-from nvflare.common.signal import Signal
-from nvflare.utils.fed_utils import generate_failure
-
-from train_configer import TrainConfiger
-from utils import (
- IterAggregateHandler,
- MONAIModelManager,
- TrainContext,
- get_lr_values,
- set_engine_state,
-)
-
-
-class MONAITrainer(Trainer):
- """
- This class implements a MONAI based trainer that can be used for Federated Learning.
-
- Args:
- aggregation_epochs: the number of training epochs for a round.
- This parameter only works when `aggregation_iters` is 0. Defaults to 1.
- aggregation_iters: the number of training iterations for a round.
- If the value is larger than 0, the trainer will use iteration based aggregation
- rather than epoch based aggregation. Defaults to 0.
-
- """
-
- def __init__(self, aggregation_epochs: int = 1, aggregation_iters: int = 0):
- super().__init__()
- self.aggregation_epochs = aggregation_epochs
- self.aggregation_iters = aggregation_iters
- self.model_manager = MONAIModelManager()
- self.logger = logging.getLogger(self.__class__.__name__)
-
- def _initialize_trainer(self, fl_ctx: FLContext):
- """
- The trainer's initialization function. At the beginning of an FL experiment,
- the train and evaluate engines, as well as the train context and FL context,
- should be initialized.
- """
- # Initialize train and evaluation engines.
- config_root = fl_ctx.get_prop(FLConstants.TRAIN_ROOT)
- fl_args = fl_ctx.get_prop(FLConstants.ARGS)
-
- conf = TrainConfiger(
- config_root=config_root,
- wf_config_file_name=fl_args.train_config,
- local_rank=fl_args.local_rank,
- )
- conf.configure()
-
- self.train_engine = conf.train_engine
- self.eval_engine = conf.eval_engine
- self.multi_gpu = conf.multi_gpu
-
- # for iterations based aggregation, the train engine should attach
- # the following handler.
- if self.aggregation_iters > 0:
- IterAggregateHandler(interval=self.aggregation_iters).attach(
- self.train_engine
- )
-
- # Instantiate a train context class. This instance is used to
- # save training related information such as current epochs, iterations
- # and the learning rate.
- self.train_ctx = TrainContext()
- self.train_ctx.initial_learning_rate = get_lr_values(
- self.train_engine.optimizer
- )
-
- # Initialize the FL context.
- fl_ctx.set_prop(FLConstants.MY_RANK, self.train_engine.state.rank)
- fl_ctx.set_prop(FLConstants.MODEL_NETWORK, self.train_engine.network)
- fl_ctx.set_prop(FLConstants.MULTI_GPU, self.multi_gpu)
- fl_ctx.set_prop(FLConstants.DEVICE, self.train_engine.state.device)
-
- def handle_event(self, event_type: str, fl_ctx: FLContext):
- """
- This function overrides the one from the super class.
- It performs the handling process based on the
- event_type. At the start of an FL experiment, the necessary
- components should be initialized. At the end of the experiment,
- the running engines should be terminated.
-
- Args:
- event_type: the type of event that will be fired. In MONAITrainer,
- only `START_RUN` and `END_RUN` need to be handled.
- fl_ctx: an `FLContext` object.
-
- """
- if event_type == EventType.START_RUN:
- self._initialize_trainer(fl_ctx)
- elif event_type == EventType.END_RUN:
- try:
- self.train_engine.terminate()
- self.eval_engine.terminate()
- except BaseException as e:
- self.logger.info(f"exception in closing engines {e}")
-
- def train(
- self, shareable: Shareable, fl_ctx: FLContext, abort_signal: Signal
- ) -> Shareable:
- """
- This function overrides the one from the super class.
- As a supervised learning based trainer, the train function runs the
- evaluate and train engines based on the model weights from `shareable`.
- After training finishes, a new `Shareable` object is submitted
- to the server for aggregation.
-
- Args:
- shareable: the `Shareable` object received from the server.
- fl_ctx: the `FLContext` object received from the server.
- abort_signal: if triggered, the training will be aborted.
-
- Returns:
- a new `Shareable` object to be submitted to server for aggregation.
- """
- # check abort signal
- self.logger.info(f"MonaiTrainer abort signal: {abort_signal.triggered}")
- if abort_signal.triggered:
- self.finalize(fl_ctx)
- shareable = generate_failure(fl_ctx=fl_ctx, reason="abort signal triggered")
- return shareable
- # retrieve the model weights from the shareable
- if self.train_engine.state.rank == 0:
- model_weights = shareable[ShareableKey.MODEL_WEIGHTS]
- # load the received model weights into the network (saved in fl_ctx)
- self.model_manager.assign_current_model(model_weights, fl_ctx)
- # for multi-gpu training, only the rank 0 process receives the model weights,
- # so they must be broadcast to all processes.
- if self.multi_gpu:
- net = fl_ctx.get_prop(FLConstants.MODEL_NETWORK)
- for _, v in net.state_dict().items():
- dist.broadcast(v, src=0)
-
- # set engine state parameters, like number of training epochs/iterations.
- self.train_engine = set_engine_state(
- self.train_engine, self.aggregation_epochs, self.aggregation_iters
- )
- # get current epoch and iteration when a round starts
- self.train_ctx.epoch_of_start_time = self.train_engine.state.epoch
- self.train_ctx.iter_of_start_time = self.train_engine.state.iteration
- # execute validation at the beginning of every round
- self.eval_engine.run(self.train_engine.state.epoch + 1)
- self.train_ctx.fl_init_validation_metric = self.eval_engine.state.metrics.get(
- self.eval_engine.state.key_metric_name, -1
- )
- self.train_engine.run()
- # calculate current iteration and epoch data after training
- self.train_ctx.current_iters = (
- self.train_engine.state.iteration - self.train_ctx.iter_of_start_time
- )
- self.train_ctx.current_executed_epochs = (
- self.train_engine.state.epoch - self.train_ctx.epoch_of_start_time
- )
- # create a new `Shareable` object
- if self.train_engine.state.rank == 0:
- self.train_ctx.set_context(self.train_engine, self.eval_engine)
- shareable = self.model_manager.generate_shareable(
- self.train_ctx,
- fl_ctx,
- )
- # update train context into FL context.
- fl_ctx.set_prop(FLConstants.TRAIN_CONTEXT, self.train_ctx)
- return shareable
diff --git a/federated_learning/nvflare/nvflare_example_docker/spleen_example/custom/train_configer.py b/federated_learning/nvflare/nvflare_example_docker/spleen_example/custom/train_configer.py
deleted file mode 100644
index 1a50ec9af6..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/spleen_example/custom/train_configer.py
+++ /dev/null
@@ -1,301 +0,0 @@
-# Copyright 2020 MONAI Consortium
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-# http://www.apache.org/licenses/LICENSE-2.0
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import json
-import logging
-import os
-
-import torch
-import torch.distributed as dist
-from monai.data import (
- CacheDataset,
- DataLoader,
- load_decathlon_datalist,
- partition_dataset,
-)
-from monai.engines import SupervisedEvaluator, SupervisedTrainer
-from monai.handlers import (
- CheckpointSaver,
- LrScheduleHandler,
- MeanDice,
- StatsHandler,
- TensorBoardStatsHandler,
- ValidationHandler,
-)
-from monai.inferers import SimpleInferer, SlidingWindowInferer
-from monai.losses import DiceLoss
-from monai.networks.layers import Norm
-from monai.networks.nets import UNet
-from monai.transforms import (
- Activationsd,
- AsDiscreted,
- Compose,
- CropForegroundd,
- EnsureChannelFirstd,
- LoadImaged,
- Orientationd,
- RandCropByPosNegLabeld,
- ScaleIntensityRanged,
- Spacingd,
- ToTensord,
-)
-from torch.nn.parallel import DistributedDataParallel
-from monai.handlers import from_engine
-
-
-class TrainConfiger:
- """
- This class is used to configure the necessary components of the train and evaluate
- engines for the MONAI trainer.
- Please check the implementation of `SupervisedEvaluator` and `SupervisedTrainer`
- from `monai.engines` and determine which components can be used.
- Args:
- config_root: root folder path of config files.
- wf_config_file_name: json file name of the workflow config file.
- """
-
- def __init__(
- self,
- config_root: str,
- wf_config_file_name: str,
- local_rank: int = 0,
- ):
- with open(os.path.join(config_root, wf_config_file_name)) as file:
- wf_config = json.load(file)
-
- self.wf_config = wf_config
- """
- config Args:
- max_epochs: the total epoch number for trainer to run.
- learning_rate: the learning rate for optimizer.
- data_list_base_dir: the directory containing the data list json file.
- data_list_json_file: the data list json file.
- val_interval: the interval (number of epochs) to do validation.
- ckpt_dir: the directory to save the checkpoint.
- amp: whether to enable auto-mixed-precision training.
- use_gpu: whether to use GPU in training.
- multi_gpu: whether to use multiple GPUs for distributed training.
- """
- self.max_epochs = wf_config["max_epochs"]
- self.learning_rate = wf_config["learning_rate"]
- self.data_list_base_dir = wf_config["data_list_base_dir"]
- self.data_list_json_file = wf_config["data_list_json_file"]
- self.val_interval = wf_config["val_interval"]
- self.ckpt_dir = wf_config["ckpt_dir"]
- self.amp = wf_config["amp"]
- self.use_gpu = wf_config["use_gpu"]
- self.multi_gpu = wf_config["multi_gpu"]
- self.local_rank = local_rank
-
- def set_device(self):
- if self.multi_gpu:
- # initialize distributed training
- dist.init_process_group(backend="nccl", init_method="env://")
- device = torch.device(f"cuda:{self.local_rank}")
- torch.cuda.set_device(device)
- else:
- device = torch.device("cuda" if self.use_gpu else "cpu")
- self.device = device
-
- def configure(self):
- self.set_device()
- network = UNet(
- dimensions=3,
- in_channels=1,
- out_channels=2,
- channels=(16, 32, 64, 128, 256),
- strides=(2, 2, 2, 2),
- num_res_units=2,
- norm=Norm.BATCH,
- ).to(self.device)
- if self.multi_gpu:
- network = DistributedDataParallel(
- module=network,
- device_ids=[self.device],
- find_unused_parameters=False,
- )
-
- train_transforms = Compose(
- [
- LoadImaged(keys=("image", "label")),
- EnsureChannelFirstd(keys=("image", "label")),
- Spacingd(
- keys=["image", "label"],
- pixdim=(1.5, 1.5, 2.0),
- mode=("bilinear", "nearest"),
- ),
- Orientationd(keys=["image", "label"], axcodes="RAS"),
- ScaleIntensityRanged(
- keys="image",
- a_min=-57,
- a_max=164,
- b_min=0.0,
- b_max=1.0,
- clip=True,
- ),
- CropForegroundd(keys=("image", "label"), source_key="image"),
- RandCropByPosNegLabeld(
- keys=("image", "label"),
- label_key="label",
- spatial_size=(64, 64, 64),
- pos=1,
- neg=1,
- num_samples=4,
- image_key="image",
- image_threshold=0,
- ),
- ToTensord(keys=("image", "label")),
- ]
- )
- # set datalist
- train_datalist = load_decathlon_datalist(
- os.path.join(self.data_list_base_dir, self.data_list_json_file),
- is_segmentation=True,
- data_list_key="training",
- base_dir=self.data_list_base_dir,
- )
- val_datalist = load_decathlon_datalist(
- os.path.join(self.data_list_base_dir, self.data_list_json_file),
- is_segmentation=True,
- data_list_key="validation",
- base_dir=self.data_list_base_dir,
- )
- if self.multi_gpu:
- train_datalist = partition_dataset(
- data=train_datalist,
- shuffle=True,
- num_partitions=dist.get_world_size(),
- even_divisible=True,
- )[dist.get_rank()]
- train_ds = CacheDataset(
- data=train_datalist,
- transform=train_transforms,
- cache_rate=1.0,
- num_workers=4,
- )
- train_data_loader = DataLoader(
- train_ds,
- batch_size=2,
- shuffle=True,
- num_workers=4,
- )
- val_transforms = Compose(
- [
- LoadImaged(keys=("image", "label")),
- EnsureChannelFirstd(keys=("image", "label")),
- Spacingd(
- keys=["image", "label"],
- pixdim=(1.5, 1.5, 2.0),
- mode=("bilinear", "nearest"),
- ),
- Orientationd(keys=["image", "label"], axcodes="RAS"),
- ScaleIntensityRanged(
- keys="image",
- a_min=-57,
- a_max=164,
- b_min=0.0,
- b_max=1.0,
- clip=True,
- ),
- CropForegroundd(keys=("image", "label"), source_key="image"),
- ToTensord(keys=("image", "label")),
- ]
- )
-
- val_ds = CacheDataset(
- data=val_datalist, transform=val_transforms, cache_rate=0.0, num_workers=4
- )
- val_data_loader = DataLoader(
- val_ds,
- batch_size=1,
- shuffle=False,
- num_workers=4,
- )
- post_transform = Compose(
- [
- Activationsd(keys="pred", softmax=True),
- AsDiscreted(
- keys=["pred", "label"],
- argmax=[True, False],
- to_onehot=True,
- num_classes=2,
- ),
- ]
- )
- # metric
- key_val_metric = {
- "val_mean_dice": MeanDice(
- include_background=False,
- output_transform=from_engine(["pred", "label"]),
- #device=self.device,
- )
- }
- val_handlers = [
- StatsHandler(output_transform=lambda x: None),
- CheckpointSaver(
- save_dir=self.ckpt_dir,
- save_dict={"model": network},
- save_key_metric=True,
- ),
- TensorBoardStatsHandler(
- log_dir=self.ckpt_dir, output_transform=lambda x: None
- ),
- ]
- self.eval_engine = SupervisedEvaluator(
- device=self.device,
- val_data_loader=val_data_loader,
- network=network,
- inferer=SlidingWindowInferer(
- roi_size=[160, 160, 160],
- sw_batch_size=4,
- overlap=0.5,
- ),
- postprocessing=post_transform,
- key_val_metric=key_val_metric,
- val_handlers=val_handlers,
- amp=self.amp,
- )
-
- optimizer = torch.optim.Adam(network.parameters(), self.learning_rate)
- loss_function = DiceLoss(to_onehot_y=True, softmax=True)
- lr_scheduler = torch.optim.lr_scheduler.StepLR(
- optimizer, step_size=5000, gamma=0.1
- )
- train_handlers = [
- LrScheduleHandler(lr_scheduler=lr_scheduler, print_lr=True),
- ValidationHandler(
- validator=self.eval_engine, interval=self.val_interval, epoch_level=True
- ),
- StatsHandler(tag_name="train_loss", output_transform=from_engine("loss", first=True)),
- TensorBoardStatsHandler(
- log_dir=self.ckpt_dir,
- tag_name="train_loss",
- output_transform=from_engine("loss", first=True),
- ),
- ]
-
- self.train_engine = SupervisedTrainer(
- device=self.device,
- max_epochs=self.max_epochs,
- train_data_loader=train_data_loader,
- network=network,
- optimizer=optimizer,
- loss_function=loss_function,
- inferer=SimpleInferer(),
- postprocessing=post_transform,
- key_train_metric=None,
- train_handlers=train_handlers,
- amp=self.amp,
- )
-
- if self.local_rank > 0:
- self.train_engine.logger.setLevel(logging.WARNING)
- self.eval_engine.logger.setLevel(logging.WARNING)
diff --git a/federated_learning/nvflare/nvflare_example_docker/spleen_example/custom/utils.py b/federated_learning/nvflare/nvflare_example_docker/spleen_example/custom/utils.py
deleted file mode 100644
index d0c1252a19..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/spleen_example/custom/utils.py
+++ /dev/null
@@ -1,187 +0,0 @@
-# Copyright 2020 MONAI Consortium
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-# http://www.apache.org/licenses/LICENSE-2.0
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import logging
-import math
-from typing import Dict
-
-import numpy as np
-import torch
-from ignite.engine import Engine, Events
-from nvflare.apis.fl_constant import FLConstants, ShareableKey, ShareableValue
-from nvflare.apis.fl_context import FLContext
-from nvflare.apis.shareable import Shareable
-from torch.optim import Optimizer
-
-
-class TrainContext:
- """
- Train Context class contains training related parameters/variables,
- such as learning rate, number of gpus and current training iterations.
- """
-
- def __init__(self):
- self.initial_learning_rate = 0
- self.current_learning_rate = 0
- self.current_iters = 0
- self.current_executed_epochs = 0
- self.fl_init_validation_metric = 0
- self.epoch_of_start_time = 0
- self.iter_of_start_time = 0
-
- def set_context(self, train_engine: Engine, eval_engine: Engine):
- """
- This function is usually called after train engine has finished running.
- The variables that updated here will add to the shareable object and then
- submit to server. You can add other variables in this function if they are
- needed to be shared.
- """
- self.current_learning_rate = get_lr_values(train_engine.optimizer)
-
-
-class MONAIModelManager:
- def __init__(self):
- self.logger = logging.getLogger("ModelShareableManager")
-
- def assign_current_model(
- self, model_weights: Dict[str, np.ndarray], fl_ctx: FLContext
- ):
- """
- This function is used to load provided weights for the network saved
- in FL context.
- Before loading weights, tensors might need to be reshaped to support HE for secure aggregation.
- More info of HE:
- https://github.com/NVIDIA/clara-train-examples/blob/master/PyTorch/NoteBooks/FL/Homomorphic_Encryption.ipynb
-
- """
- net = fl_ctx.get_prop(FLConstants.MODEL_NETWORK)
- if fl_ctx.get_prop(FLConstants.MULTI_GPU):
- net = net.module
-
- local_var_dict = net.state_dict()
- model_keys = model_weights.keys()
- for var_name in local_var_dict:
- if var_name in model_keys:
- weights = model_weights[var_name]
- try:
- local_var_dict[var_name] = torch.as_tensor(np.reshape(weights, local_var_dict[var_name].shape))
- except Exception as e:
- raise ValueError(
- "Convert weight from {} failed with error: {}".format(
- var_name, str(e)
- )
- )
-
- net.load_state_dict(local_var_dict)
-
- def extract_model(self, fl_ctx: FLContext) -> Dict[str, np.ndarray]:
- """
- This function is used to extract weights of the network saved in FL
- context.
- The extracted weights will be converted into a numpy array based dict.
- """
- net = fl_ctx.get_prop(FLConstants.MODEL_NETWORK)
- if fl_ctx.get_prop(FLConstants.MULTI_GPU):
- net = net.module
- local_state_dict = net.state_dict()
- local_model_dict = {}
- for var_name in local_state_dict:
- try:
- local_model_dict[var_name] = local_state_dict[var_name].cpu().numpy()
- except Exception as e:
- raise ValueError(
- "Convert weight from {} failed with error: {}".format(
- var_name, str(e)
- )
- )
-
- return local_model_dict
-
- def generate_shareable(self, train_ctx: TrainContext, fl_ctx: FLContext):
- """
- This function is used to generate a shareable instance according to
- the train context and FL context.
- A Shareable instance can not only contain model weights, but also
- some additional information that clients want to share. These information
- should be added into ShareableKey.META.
- """
-
- # input the initial metric into meta data. You can also add other parameters.
- meta_data = {}
- meta_data[FLConstants.INITIAL_METRICS] = train_ctx.fl_init_validation_metric
- meta_data[FLConstants.CURRENT_LEARNING_RATE] = train_ctx.current_learning_rate
- meta_data[FLConstants.NUM_STEPS_CURRENT_ROUND] = train_ctx.current_iters
-
- shareable = Shareable()
- shareable[ShareableKey.TYPE] = ShareableValue.TYPE_WEIGHT_DIFF
- shareable[ShareableKey.DATA_TYPE] = ShareableValue.DATA_TYPE_UNENCRYPTED
- shareable[ShareableKey.MODEL_WEIGHTS] = self.extract_model(fl_ctx)
- shareable[ShareableKey.META] = meta_data
-
- return shareable
-
-
-class IterAggregateHandler:
- """
- This class implements an event handler for iteration based aggregation.
- """
-
- def __init__(self, interval: int):
- self.interval = interval
-
- def attach(self, engine: Engine):
- engine.add_event_handler(Events.ITERATION_COMPLETED(every=self.interval), self)
-
- def __call__(self, engine: Engine):
- engine.terminate()
- # save current iteration for next round
- engine.state.dataloader_iter = engine._dataloader_iter
- if engine.state.iteration % engine.state.epoch_length == 0:
- # if current iteration is end of 1 epoch, manually trigger epoch completed event
- engine._fire_event(Events.EPOCH_COMPLETED)
-
-
-def get_lr_values(optimizer: Optimizer):
- """
- This function is used to get the learning rates of the optimizer.
- """
- return [group["lr"] for group in optimizer.state_dict()["param_groups"]]
-
-
-def set_engine_state(engine: Engine, aggregation_epochs: int, aggregation_iters: int):
- """
- This function is used to set the engine's state parameters according to
- the aggregation ways (iteration based or epoch based).
-
- Args:
- engine: the engine that to be processed.
- aggregation_epochs: the number of epochs before aggregation.
- This parameter only works when `aggregation_iters` is 0.
- aggregation_iters: the number of iterations before aggregation.
- If the value is larger than 0, the engine will use iteration based aggregation
- rather than epoch based aggregation.
-
- """
- if aggregation_iters > 0:
- next_aggr_iter = engine.state.iteration + aggregation_iters
- engine.state.max_epochs = math.ceil(next_aggr_iter / engine.state.epoch_length)
- previous_iter = engine.state.iteration % engine.state.epoch_length
- if engine.state.iteration > 0 and previous_iter != 0:
- # init to continue from previous epoch
- engine.state.epoch -= 1
- if hasattr(engine.state, "dataloader_iter"):
- # initialize to continue from previous iteration
- engine._init_iter.append(previous_iter)
- engine._dataloader_iter = engine.state.dataloader_iter
- else:
- engine.state.max_epochs = engine.state.epoch + aggregation_epochs
-
- return engine
diff --git a/federated_learning/nvflare/nvflare_example_docker/spleen_example/resources/log.config b/federated_learning/nvflare/nvflare_example_docker/spleen_example/resources/log.config
deleted file mode 100644
index 6b9761569b..0000000000
--- a/federated_learning/nvflare/nvflare_example_docker/spleen_example/resources/log.config
+++ /dev/null
@@ -1,27 +0,0 @@
-[loggers]
-keys=root,modelLogger
-
-[handlers]
-keys=consoleHandler
-
-[formatters]
-keys=fullFormatter
-
-[logger_root]
-level=INFO
-handlers=consoleHandler
-
-[logger_modelLogger]
-level=DEBUG
-handlers=consoleHandler
-qualname=modelLogger
-propagate=0
-
-[handler_consoleHandler]
-class=StreamHandler
-level=DEBUG
-formatter=fullFormatter
-args=(sys.stdout,)
-
-[formatter_fullFormatter]
-format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
diff --git a/federated_learning/nvflare/nvflare_spleen_example/1-Server.ipynb b/federated_learning/nvflare/nvflare_spleen_example/1-Server.ipynb
new file mode 100644
index 0000000000..b280e88972
--- /dev/null
+++ b/federated_learning/nvflare/nvflare_spleen_example/1-Server.ipynb
@@ -0,0 +1,192 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "pycharm": {
+ "metadata": false
+ }
+ },
+ "source": [
+ "# FL Server: Joining the FL Experiment\n",
+ "\n",
+ "The purpose of this notebook is to show how to start a server to participate in an FL experiment."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "from IPython.display import HTML"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Check Working Folder\n",
+ "\n",
+ "Before starting, let's check if the necessary folder has been created:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "a working folder for the server exists!!!\n"
+ ]
+ }
+ ],
+ "source": [
+ "server_startup_path = \"poc/server/startup\"\n",
+ "\n",
+ "if os.path.exists(server_startup_path):\n",
+ " print(\"a working folder for the server exists!!!\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Then, let's check the catalogue:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['log.config', 'fed_server.json', 'sub_start.sh', 'start.sh', 'stop_fl.sh']"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "os.listdir(server_startup_path)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As we can see, a script named `start.sh` is in this folder; to start the server, we only need to run this script."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Start Server"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Please open a new terminal (run the following cell and click the link):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ " Open a new terminal"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# You can click the following link, or manually open a new terminal.\n",
+ "HTML(' Open a new terminal')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In the terminal, please run the following command:\n",
+ "```\n",
+ "source nvflare-env/bin/activate\n",
+ "bash poc/server/startup/start.sh\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### (Optional) Close Server\n",
+ "\n",
+ "Besides using the admin tool, you can shut down the running server directly with the following command:\n",
+ "\n",
+ "```\n",
+ "bash poc/server/startup/stop_fl.sh\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Next Steps\n",
+ "\n",
+ "You have now started the server container.\n",
+ "In the next notebook, [Client Startup Notebook](2-Client.ipynb), you'll start two clients participating in the FL experiment."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.10"
+ },
+ "stem_cell": {
+ "cell_type": "raw",
+ "metadata": {
+ "pycharm": {
+ "metadata": false
+ }
+ },
+ "source": ""
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/federated_learning/nvflare/nvflare_spleen_example/2-Client.ipynb b/federated_learning/nvflare/nvflare_spleen_example/2-Client.ipynb
new file mode 100644
index 0000000000..a5813c7682
--- /dev/null
+++ b/federated_learning/nvflare/nvflare_spleen_example/2-Client.ipynb
@@ -0,0 +1,265 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "pycharm": {
+ "metadata": false
+ }
+ },
+ "source": [
+ "# FL Client: Joining the FL Experiment\n",
+ "\n",
+ "The purpose of this notebook is to show how to start clients to participate in an FL experiment."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "pycharm": {
+ "metadata": false
+ }
+ },
+ "source": [
+ "## Prerequisites\n",
+ "- A server has been started."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "from IPython.display import HTML"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Check Working Folder\n",
+ "\n",
+ "Before starting, let's check if the necessary folder has been created:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "a working folder for the client1 exists!!!\n",
+ "a working folder for the client2 exists!!!\n"
+ ]
+ }
+ ],
+ "source": [
+ "client1_name = \"site-1\"\n",
+ "client2_name = \"site-2\"\n",
+ "\n",
+ "client1_startup_path = \"poc/{}/startup\".format(client1_name)\n",
+ "client2_startup_path = \"poc/{}/startup\".format(client2_name)\n",
+ "\n",
+ "if os.path.exists(client1_startup_path):\n",
+ " print(\"a working folder for the client1 exists!!!\")\n",
+ "if os.path.exists(client2_startup_path):\n",
+ " print(\"a working folder for the client2 exists!!!\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Then, let's check the catalogue:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['log.config', 'fed_client.json', 'sub_start.sh', 'start.sh', 'stop_fl.sh']"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "os.listdir(client1_startup_path)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['log.config', 'fed_client.json', 'sub_start.sh', 'start.sh', 'stop_fl.sh']"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "os.listdir(client2_startup_path)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As we can see, a script named `start.sh` is in each path; to start a client, we only need to run this script."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### (Optional) Assign GPU"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you'd like to assign GPU(s) to a client, please modify the script first. You may need to uncomment and edit the following line:\n",
+ "\n",
+ "`# export CUDA_VISIBLE_DEVICES=`\n",
+ "\n",
+ "[Please click here to edit for client 1](poc/site-1/startup/start.sh)\n",
+ "\n",
+ "[Please click here to edit for client 2](poc/site-2/startup/start.sh)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Start Clients"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Please open a new terminal (run the following cell and click the link):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ " Open a new terminal"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# You can click the following link, or manually open a new terminal.\n",
+ "HTML(' Open a new terminal')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In the terminal, please run the following commands to start clients `site-1` and `site-2` (two terminals are needed):"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "```\n",
+ "source nvflare-env/bin/activate\n",
+ "bash poc/site-1/startup/start.sh localhost site-1\n",
+ "```\n",
+ "and\n",
+ "```\n",
+ "source nvflare-env/bin/activate\n",
+ "bash poc/site-2/startup/start.sh localhost site-2\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### (Optional) Close Client\n",
+ "\n",
+ "Besides using the admin tool, you can shut down the running client(s) directly with the following commands:\n",
+ "\n",
+ "```\n",
+ "bash poc/site-1/startup/stop_fl.sh\n",
+ "bash poc/site-2/startup/stop_fl.sh\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Next Steps\n",
+ "\n",
+ "You have now started two client containers.\n",
+ "In the next notebook, [Admin Startup Notebook](3-Admin.ipynb), you'll start an admin participating in the FL experiment."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.10"
+ },
+ "stem_cell": {
+ "cell_type": "raw",
+ "metadata": {
+ "pycharm": {
+ "metadata": false
+ }
+ },
+ "source": ""
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/federated_learning/nvflare/nvflare_spleen_example/3-Admin.ipynb b/federated_learning/nvflare/nvflare_spleen_example/3-Admin.ipynb
new file mode 100644
index 0000000000..342a71b8f2
--- /dev/null
+++ b/federated_learning/nvflare/nvflare_spleen_example/3-Admin.ipynb
@@ -0,0 +1,380 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "pycharm": {
+ "metadata": false
+ }
+ },
+ "source": [
+ "# Admin Startup\n",
+ "\n",
+ "The purpose of this notebook is to show how to start an admin to operate an FL experiment with a server and at least one client started."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "pycharm": {
+ "metadata": false
+ }
+ },
+ "source": [
+ "## Prerequisites\n",
+ "- A server and at least one client have been started."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "from IPython.display import HTML"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Check Working Folder\n",
+ "\n",
+ "Before starting, let's check if the necessary folders have been created."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "a working folder for the admin exists!!!\n"
+ ]
+ }
+ ],
+ "source": [
+ "admin_path = \"poc/admin/startup/\"\n",
+ "\n",
+ "if os.path.exists(admin_path):\n",
+ " print(\"a working folder for the admin exists!!!\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We also need to copy the required app folder (`hello_monai` in this example) into the admin's `transfer` folder (creating it first):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!mkdir -p poc/admin/transfer\n",
+ "!cp -r hello_monai/ poc/admin/transfer/"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['hello_monai']"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "os.listdir(\"poc/admin/transfer/\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['fl_admin.sh']"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "os.listdir(admin_path)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As we can see, `hello_monai` is in `poc/admin/transfer`, and a script named `fl_admin.sh` is in `poc/admin/startup`; to start the admin, we only need to run this script."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Start Admin"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Please open a new terminal (run the following cell and click the link):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ " Open a new terminal"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "HTML(' Open a new terminal')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In the terminal, please run the following command:\n",
+ "\n",
+ "```\n",
+ "source nvflare-env/bin/activate\n",
+ "bash poc/admin/startup/fl_admin.sh localhost\n",
+ "```\n",
+ "Then, log in by entering `admin` for both the username and password.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Prepare for the experiment\n",
+ "\n",
+ "You need to execute the following steps to prepare for the experiment:\n",
+ "\n",
+ "- upload the pipeline config folder\n",
+ "- set the FL run number\n",
+ "- deploy the folder to the server and client(s)\n",
+ "\n",
+ "The corresponding commands are:\n",
+ "```\n",
+ "upload_app hello_monai\n",
+ "set_run_number 1\n",
+ "deploy_app hello_monai server\n",
+ "deploy_app hello_monai client\n",
+ "```\n",
+ "\n",
+ "Now, let's check if the folder has been distributed to the server and all clients:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "config files on server: ['app_server', 'fl_app.txt']\n",
+ " \n",
+ "config files on site-1: ['app_site-1', 'fl_app.txt']\n",
+ " \n",
+ "config files on site-2: ['fl_app.txt', 'app_site-2']\n",
+ " \n"
+ ]
+ }
+ ],
+ "source": [
+ "run_file = \"run_1\"\n",
+ "\n",
+ "poc_path = \"poc/\"\n",
+ "\n",
+ "for name in [\"server\", \"site-1\", \"site-2\"]:\n",
+ " path = os.path.join(poc_path, name, run_file)\n",
+ " if os.path.exists(path):\n",
+ " print(\"config files on {}: {}\".\n",
+ " format(name, os.listdir(path)))\n",
+ " print(\" \")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This example prepares two different data list files: `dataset_part1.json` and `dataset_part2.json`; they share the same validation set but have completely different training sets. The default file used in `config_train.json` is `config/dataset_part1.json`. Therefore, if you want the two clients to train on different data, you can switch `site-2` to use `dataset_part2.json`.\n",
+ "\n",
+ "[Link to site-1 config](poc/site-1/run_1/app_site-1/config/config_train.json)\n",
+ "\n",
+ "[Link to site-2 config](poc/site-2/run_1/app_site-2/config/config_train.json)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### (Optional) Copy Dataset\n",
+ "\n",
+ "After starting a client (for example `site-1`), the Spleen dataset will be downloaded into:\n",
+ "`run_1/app_site-1/Task09_Spleen.tar`.\n",
+ "\n",
+ "To prevent repeatedly downloading the dataset, you can copy the uncompressed `Task09_Spleen` into the corresponding place after running the `deploy_app` command.\n",
+ "For example:\n",
+ "\n",
+ "```\n",
+ "cp -r /path-to-dataset/Task09_Spleen poc/site-1/run_2/app_site-1/\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Start Training"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now, you can start training with:\n",
+ "```\n",
+ "start_app all\n",
+ "```\n",
+ "or start the server and clients separately:\n",
+ "```\n",
+ "start_app server\n",
+ "```\n",
+ "\n",
+ "```\n",
+ "start_app client site-1\n",
+ "```\n",
+ "\n",
+ "You can check the status by running:\n",
+ "```\n",
+ "check_status server\n",
+ "check_status client\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Abort Training\n",
+ "\n",
+ "You can abort training for the server and/or client(s) by running:\n",
+ "```\n",
+ "abort client\n",
+ "abort server\n",
+ "```\n",
+ "If you only want to abort client `site-2`, you can use:\n",
+ "```\n",
+ "abort client site-2\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Shutdown\n",
+ "\n",
+ "You can close the server or client(s) by running:\n",
+ "\n",
+ "`shutdown client` or `shutdown server`\n",
+ "\n",
+ "or type `shutdown all` to close them all.\n",
+ "\n",
+ "If you only want to close one client, you can specify the client in the command as follows:\n",
+ "```\n",
+ "shutdown client site-1\n",
+ "```\n",
+ "\n",
+ "This command terminates the client/server connection; it will ask for the admin name as confirmation."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Other Commands\n",
+ "\n",
+ "Please type `?` to learn more about all commands, or you can refer to [the official guide](https://nvidia.github.io/NVFlare/user_guide/admin_commands.html) for more details."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Next Steps\n",
+ "\n",
+ "You have now started the admin client and learnt the commands to control your FL experiment. You're now ready to create your own FL experiment!"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.10"
+ },
+ "stem_cell": {
+ "cell_type": "raw",
+ "metadata": {
+ "pycharm": {
+ "metadata": false
+ }
+ },
+ "source": ""
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/federated_learning/nvflare/nvflare_spleen_example/README.md b/federated_learning/nvflare/nvflare_spleen_example/README.md
new file mode 100644
index 0000000000..9f27a8fdaa
--- /dev/null
+++ b/federated_learning/nvflare/nvflare_spleen_example/README.md
@@ -0,0 +1,54 @@
+# Federated learning with NVIDIA FLARE
+
+## Brief Introduction
+
+This repository contains an end-to-end Federated training example based on MONAI trainers and [NVIDIA FLARE](https://github.com/nvidia/nvflare).
+
+This example requires Python 3.8.10.
+
+Inside this folder:
+- All Jupyter notebooks are used to build an FL experiment step-by-step.
+- `hello_monai` is a folder containing the required config files for the experiment (in `config/`) and the customized trainer and its components (in `custom/`).
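+
+For orientation, the layout of `hello_monai` (sketched from the files in this example) is:
+
+```
+hello_monai/
+├── config/
+│   ├── config_fed_client.json   # client executor config (monai_trainer.MONAITrainer)
+│   ├── config_fed_server.json   # server components: persistor, aggregator, workflow
+│   ├── config_train.json        # training hyperparameters and data list path
+│   ├── dataset_part1.json       # data list used by default
+│   └── dataset_part2.json       # alternative data list with a different training set
+└── custom/
+    └── monai_trainer.py         # customized MONAI-based trainer
+```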
+
+## Installation
+
+The following installation steps follow the official installation guide of NVIDIA FLARE; [please check that guide for more details](https://nvidia.github.io/NVFlare/installation.html).
+
+### Virtual Environment
+
+It is recommended to create a virtual environment via `venv` to install all packages:
+
+```
+python3 -m venv nvflare-env
+source nvflare-env/bin/activate
+```
+### Libraries
+
+Please run:
+```
+pip install -U -r requirements.txt
+```
+
+### Prepare Startup Kit
+
+NVIDIA FLARE provides the Open Provision API to build the startup kit flexibly; the corresponding guide is [here](https://nvidia.github.io/NVFlare/user_guide/provisioning_tool.html).
+
+In this example, we simply use the `poc` command to create a startup kit; the same approach is used in [an official example of NVIDIA FLARE](https://nvidia.github.io/NVFlare/examples/hello_cross_val.html?highlight=poc).
+
+Please run:
+```
+poc -n 2
+```
+and type `y`. A working folder named `poc` will then be created (the related readme file is `poc/Readme.rst`); it provides startup kits for one server, two clients, and one admin.
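+
+For reference, the resulting `poc` folder looks roughly like this (a sketch based on the startup files used later in the notebooks; exact contents may vary with the NVIDIA FLARE version):
+
+```
+poc/
+├── Readme.rst
+├── server/
+│   └── startup/    # log.config, fed_server.json, sub_start.sh, start.sh, stop_fl.sh
+├── site-1/
+│   └── startup/    # log.config, fed_client.json, sub_start.sh, start.sh, stop_fl.sh
+├── site-2/
+│   └── startup/    # same as site-1
+└── admin/
+    └── startup/    # fl_admin.sh
+```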
+
+## Build Experiment
+
+The step-by-step process is shown in Jupyter notebooks. To begin, please run:
+
+`jupyter lab --ip 0.0.0.0 --port 8888 --allow-root --no-browser --NotebookApp.token=MONAIFLExample`
+
+and open the following link in your browser:
+
+`http://localhost:8888/?token=MONAIFLExample`
+
+Then open `1-Server.ipynb` and follow the steps in the notebooks, which will guide you through building an FL experiment with two clients and one server.
diff --git a/federated_learning/nvflare/nvflare_spleen_example/hello_monai/config/config_fed_client.json b/federated_learning/nvflare/nvflare_spleen_example/hello_monai/config/config_fed_client.json
new file mode 100644
index 0000000000..98bb905fbc
--- /dev/null
+++ b/federated_learning/nvflare/nvflare_spleen_example/hello_monai/config/config_fed_client.json
@@ -0,0 +1,20 @@
+{
+ "format_version": 2,
+ "executors": [
+ {
+ "tasks": ["train"],
+ "executor": {
+ "path": "monai_trainer.MONAITrainer",
+ "args": {
+ "aggregation_epochs": 10
+ }
+ }
+ }
+ ],
+ "task_result_filters": [
+ ],
+ "task_data_filters": [
+ ],
+ "components": [
+ ]
+}
diff --git a/federated_learning/nvflare/nvflare_spleen_example/hello_monai/config/config_fed_server.json b/federated_learning/nvflare/nvflare_spleen_example/hello_monai/config/config_fed_server.json
new file mode 100644
index 0000000000..555b919dee
--- /dev/null
+++ b/federated_learning/nvflare/nvflare_spleen_example/hello_monai/config/config_fed_server.json
@@ -0,0 +1,63 @@
+{
+ "format_version": 2,
+
+ "server": {
+ "heart_beat_timeout": 600
+ },
+ "task_data_filters": [],
+ "task_result_filters": [],
+ "components": [
+ {
+ "id": "persistor",
+ "name": "PTFileModelPersistor",
+ "args": {
+ "model": {
+ "path": "monai.networks.nets.unet.UNet",
+ "args": {
+ "dimensions": 3,
+ "in_channels": 1,
+ "out_channels": 2,
+ "channels": [16, 32, 64, 128, 256],
+ "strides": [2, 2, 2, 2],
+ "num_res_units": 2,
+ "norm": "batch"
+ }
+ }
+ }
+ },
+ {
+ "id": "shareable_generator",
+ "name": "FullModelShareableGenerator",
+ "args": {}
+ },
+ {
+ "id": "aggregator",
+ "name": "AccumulateWeightedAggregator",
+ "args": {
+ "aggregation_weights": {
+ "site-1": 1.0,
+ "site-2": 0.5
+ },
+ "expected_data_kind": "WEIGHTS"
+ }
+ }
+ ],
+ "workflows": [
+ {
+ "id": "scatter_and_gather",
+ "name": "ScatterAndGather",
+ "args": {
+ "min_clients" : 1,
+ "num_rounds" : 100,
+ "start_round": 0,
+ "wait_time_after_min_received": 10,
+ "aggregator_id": "aggregator",
+ "persistor_id": "persistor",
+ "shareable_generator_id": "shareable_generator",
+ "train_task_name": "train",
+ "train_timeout": 0,
+ "ignore_result_error": true
+ }
+ }
+ ]
+}
diff --git a/federated_learning/nvflare/nvflare_example_docker/spleen_example/config/config_train.json b/federated_learning/nvflare/nvflare_spleen_example/hello_monai/config/config_train.json
similarity index 52%
rename from federated_learning/nvflare/nvflare_example_docker/spleen_example/config/config_train.json
rename to federated_learning/nvflare/nvflare_spleen_example/hello_monai/config/config_train.json
index 7fd42b425f..9724632477 100644
--- a/federated_learning/nvflare/nvflare_example_docker/spleen_example/config/config_train.json
+++ b/federated_learning/nvflare/nvflare_spleen_example/hello_monai/config/config_train.json
@@ -3,9 +3,7 @@
"learning_rate": 2e-4,
"amp": true,
"use_gpu": true,
- "multi_gpu": false,
"val_interval": 5,
- "data_list_base_dir": "/data/Task09_Spleen/",
- "data_list_json_file": "dataset_part1.json",
+ "data_list_json_file": "config/dataset_part1.json",
"ckpt_dir": "models"
}
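After this change, `config_train.json` no longer carries `multi_gpu` or `data_list_base_dir`; `TrainConfiger` resolves `data_list_json_file` relative to the app root instead. A quick sanity check over the expected keys (the inline JSON is a hypothetical copy of the tutorial's config, reconstructed from the diff):

```python
import json

# hypothetical inline copy of hello_monai/config/config_train.json
cfg_text = """
{
  "max_epochs": 100,
  "learning_rate": 2e-4,
  "amp": true,
  "use_gpu": true,
  "val_interval": 5,
  "data_list_json_file": "config/dataset_part1.json",
  "ckpt_dir": "models"
}
"""
cfg = json.loads(cfg_text)

# keys TrainConfiger.__init__ reads from the workflow config
required = {"max_epochs", "learning_rate", "amp", "use_gpu",
            "val_interval", "data_list_json_file", "ckpt_dir"}
missing = required - cfg.keys()
```

An empty `missing` set means the config covers everything the trainer reads; removed keys such as `multi_gpu` would simply be ignored if left in.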
diff --git a/federated_learning/nvflare/nvflare_example_docker/spleen_example/config/dataset_part1.json b/federated_learning/nvflare/nvflare_spleen_example/hello_monai/config/dataset_part1.json
similarity index 100%
rename from federated_learning/nvflare/nvflare_example_docker/spleen_example/config/dataset_part1.json
rename to federated_learning/nvflare/nvflare_spleen_example/hello_monai/config/dataset_part1.json
diff --git a/federated_learning/nvflare/nvflare_example_docker/spleen_example/config/dataset_part2.json b/federated_learning/nvflare/nvflare_spleen_example/hello_monai/config/dataset_part2.json
similarity index 100%
rename from federated_learning/nvflare/nvflare_example_docker/spleen_example/config/dataset_part2.json
rename to federated_learning/nvflare/nvflare_spleen_example/hello_monai/config/dataset_part2.json
diff --git a/federated_learning/nvflare/nvflare_spleen_example/hello_monai/custom/monai_trainer.py b/federated_learning/nvflare/nvflare_spleen_example/hello_monai/custom/monai_trainer.py
new file mode 100644
index 0000000000..6593897f91
--- /dev/null
+++ b/federated_learning/nvflare/nvflare_spleen_example/hello_monai/custom/monai_trainer.py
@@ -0,0 +1,262 @@
+# Copyright 2020 MONAI Consortium
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+# http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from typing import Dict
+
+import numpy as np
+import torch
+from nvflare.apis.dxo import DXO, DataKind, MetaKey, from_shareable
+from nvflare.apis.event_type import EventType
+from nvflare.apis.executor import Executor
+from nvflare.apis.fl_constant import FLContextKey, ReturnCode
+from nvflare.apis.fl_context import FLContext
+from nvflare.apis.shareable import Shareable
+from nvflare.apis.signal import Signal
+from nvflare.app_common.app_constant import AppConstants
+
+from train_configer import TrainConfiger
+
+
+class MONAITrainer(Executor):
+ """
+ This class implements a MONAI-based trainer that can be used for federated learning with NVFlare.
+
+ Args:
+ aggregation_epochs: the number of training epochs for a round. Defaults to 1.
+
+ """
+
+ def __init__(
+ self,
+ aggregation_epochs: int = 1,
+ train_task_name: str = AppConstants.TASK_TRAIN,
+ ):
+ """
+ Trainer init happens at the very beginning; only basic info regarding the trainer is set here,
+ and the actual run has not started at this point.
+ """
+ super().__init__()
+ self.aggregation_epochs = aggregation_epochs
+ self._train_task_name = train_task_name
+
+ def _initialize_trainer(self, fl_ctx: FLContext):
+ """
+ The trainer's initialization function. At the beginning of an FL experiment,
+ the train and evaluate engines, as well as the train context and FL context,
+ should be initialized.
+ """
+ # Initialize train and evaluation engines.
+ app_root = fl_ctx.get_prop(FLContextKey.APP_ROOT)
+ fl_args = fl_ctx.get_prop(FLContextKey.ARGS)
+ # will update multi-gpu supports later
+ # num_gpus = fl_ctx.get_prop(AppConstants.NUMBER_OF_GPUS, 1)
+ # self.multi_gpu = num_gpus > 1
+ self.client_name = fl_ctx.get_identity_name()
+ self.log_info(
+ fl_ctx,
+ f"Client {self.client_name} initialized at \n {app_root} \n with args: {fl_args}",
+ )
+ conf = TrainConfiger(
+ app_root=app_root,
+ wf_config_file_name=fl_args.train_config,
+ local_rank=fl_args.local_rank,
+ )
+ conf.configure()
+
+ # train_engine and eval_engine are MONAI engines that will be used for training and validation.
+ # The corresponding training/validation settings, such as transforms, network and dataset
+ # are contained in `TrainConfiger`.
+ # The engine will be started when `.run()` is called, and when `.terminate()` is called,
+ # it will be completely terminated after the current iteration is finished.
+ self.train_engine = conf.train_engine
+ self.eval_engine = conf.eval_engine
+
+ def assign_current_model(self, model_weights: Dict[str, np.ndarray]):
+ """
+ This function is used to load the provided weights into the network.
+ Before loading, tensors might need to be reshaped to support HE for secure aggregation.
+ More info on HE:
+ https://github.com/NVIDIA/clara-train-examples/blob/master/PyTorch/NoteBooks/FL/Homomorphic_Encryption.ipynb
+
+ """
+ net = self.train_engine.network
+
+ local_var_dict = net.state_dict()
+ model_keys = model_weights.keys()
+ for var_name in local_var_dict:
+ if var_name in model_keys:
+ weights = model_weights[var_name]
+ try:
+ local_var_dict[var_name] = torch.as_tensor(
+ np.reshape(weights, local_var_dict[var_name].shape)
+ )
+ except Exception as e:
+ raise ValueError(
+ "Convert weight from {} failed with error: {}".format(
+ var_name, str(e)
+ )
+ )
+
+ net.load_state_dict(local_var_dict)
+
+ def extract_model(self) -> Dict[str, np.ndarray]:
+ """
+ This function is used to extract weights of the network.
+ The extracted weights will be converted into a numpy array based dict.
+ """
+ net = self.train_engine.network
+ local_state_dict = net.state_dict()
+ local_model_dict = {}
+ for var_name in local_state_dict:
+ try:
+ local_model_dict[var_name] = local_state_dict[var_name].cpu().numpy()
+ except Exception as e:
+ raise ValueError(
+ "Convert weight from {} failed with error: {}".format(
+ var_name, str(e)
+ )
+ )
+
+ return local_model_dict
+
+ def generate_shareable(self):
+ """
+ This function is used to generate a DXO instance.
+ The instance can contain not only model weights, but also
+ some additional information that clients want to share.
+ """
+ # update meta, NUM_STEPS_CURRENT_ROUND is needed for aggregation.
+ if self.achieved_meta is None:
+ meta = {MetaKey.NUM_STEPS_CURRENT_ROUND: self.current_iters}
+ else:
+ meta = self.achieved_meta
+ meta[MetaKey.NUM_STEPS_CURRENT_ROUND] = self.current_iters
+ return DXO(
+ data_kind=DataKind.WEIGHTS,
+ data=self.extract_model(),
+ meta=meta,
+ ).to_shareable()
+
+ def handle_event(self, event_type: str, fl_ctx: FLContext):
+ """
+ This function is an extended function from the super class.
+ It is used to handle the following events:
+
+ 1) `START_RUN`: at the start of an FL experiment,
+ necessary components should be initialized.
+ 2) `ABORT_TASK`: when this event is fired, the running engines
+ should be terminated (this example uses MONAI engines for training
+ and validation; the engines can be terminated from another thread.
+ If the solution does not provide any way to interrupt/end the execution,
+ handling this event is not feasible).
+
+
+ Args:
+ event_type: the type of event that will be fired. In MONAITrainer,
+ `START_RUN`, `ABORT_TASK` and `END_RUN` are handled.
+ fl_ctx: an `FLContext` object.
+
+ """
+ if event_type == EventType.START_RUN:
+ self._initialize_trainer(fl_ctx)
+ elif event_type == EventType.ABORT_TASK:
+ # This event is fired to abort the current execution task. We are using the ignite engine to run the task.
+ # Unfortunately the ignite engine does not support the abort of task right now. We have to wait until
+ # the current task finishes.
+ pass
+ elif event_type == EventType.END_RUN:
+ self.eval_engine.terminate()
+ self.train_engine.terminate()
+
+ def _abort_execution(self) -> Shareable:
+ shareable = Shareable()
+ shareable.set_return_code(ReturnCode.EXECUTION_EXCEPTION)
+ return shareable
+
+ def execute(
+ self,
+ task_name: str,
+ shareable: Shareable,
+ fl_ctx: FLContext,
+ abort_signal: Signal,
+ ) -> Shareable:
+ """
+ This function is an extended function from the super class.
+ As a supervised-learning-based trainer, the execute function runs the
+ evaluate and train engines based on model weights from `shareable`.
+ After finishing training, a new `Shareable` object will be submitted
+ to the server for aggregation.
+
+ Args:
+ task_name: decides which task will be executed.
+ shareable: the `Shareable` object received from the server.
+ fl_ctx: the `FLContext` object received from the server.
+ abort_signal: if triggered, the training will be aborted. In order to interrupt the training/validation
+ state, a separate thread is used to check the signal information every few seconds. The implementation is
+ shown in the `handle_event` function.
+ Returns:
+ a new `Shareable` object to be submitted to server for aggregation.
+ """
+ if task_name == self._train_task_name:
+ # convert shareable into DXO instance
+ dxo = from_shareable(shareable)
+ # check if dxo is valid.
+ if not isinstance(dxo, DXO):
+ self.log_exception(
+ fl_ctx, f"dxo expected type DXO. Got {type(dxo)} instead."
+ )
+ shareable.set_return_code(ReturnCode.EXECUTION_EXCEPTION)
+ return shareable
+
+ # ensure data kind is weights.
+ if not dxo.data_kind == DataKind.WEIGHTS:
+ self.log_exception(
+ fl_ctx,
+ f"data_kind expected WEIGHTS but got {dxo.data_kind} instead.",
+ )
+ shareable.set_return_code(ReturnCode.EXECUTION_EXCEPTION)
+ return shareable
+
+ # load weights from dxo
+ self.assign_current_model(dxo.data)
+ # collect meta from dxo
+ self.achieved_meta = dxo.meta
+
+ # set engine state max epochs.
+ self.train_engine.state.max_epochs = (
+ self.train_engine.state.epoch + self.aggregation_epochs
+ )
+ # get current iteration when a round starts
+ iter_of_start_time = self.train_engine.state.iteration
+
+ # execute validation at the beginning of every round
+ self.eval_engine.run(self.train_engine.state.epoch + 1)
+
+ # check abort signal after validation
+ if abort_signal.triggered:
+ return self._abort_execution()
+
+ self.train_engine.run()
+
+ # check abort signal after train
+ if abort_signal.triggered:
+ return self._abort_execution()
+
+ # calculate current iteration and epoch data after training.
+ self.current_iters = self.train_engine.state.iteration - iter_of_start_time
+ # create a new `Shareable` object
+ return self.generate_shareable()
+ else:
+ # If unknown task name, set ReturnCode accordingly.
+ shareable = Shareable()
+ shareable.set_return_code(ReturnCode.TASK_UNKNOWN)
+ return shareable
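`extract_model` and `assign_current_model` above convert between a torch state dict and a dict of numpy arrays, reshaping incoming weights back to the local tensor shapes (HE transports may flatten them). A simplified, stdlib-only sketch of that flatten/reshape round trip, restricted to the 2-D case with hypothetical helper names:

```python
def extract(weights_2d):
    """Flatten a 2-D weight matrix for transport, keeping its shape."""
    flat = [w for row in weights_2d for w in row]
    return flat, (len(weights_2d), len(weights_2d[0]))

def assign(flat, shape):
    """Reshape a flat weight list back to the local 2-D shape before loading."""
    rows, cols = shape
    if len(flat) != rows * cols:
        raise ValueError(f"cannot reshape {len(flat)} weights to {shape}")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]
```

The real trainer does the same per state-dict entry with `np.reshape` and `torch.as_tensor`, raising a `ValueError` naming the offending variable when conversion fails.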
diff --git a/federated_learning/nvflare/nvflare_example/spleen_example/custom/train_configer.py b/federated_learning/nvflare/nvflare_spleen_example/hello_monai/custom/train_configer.py
similarity index 80%
rename from federated_learning/nvflare/nvflare_example/spleen_example/custom/train_configer.py
rename to federated_learning/nvflare/nvflare_spleen_example/hello_monai/custom/train_configer.py
index 1a50ec9af6..ca7f315b1f 100644
--- a/federated_learning/nvflare/nvflare_example/spleen_example/custom/train_configer.py
+++ b/federated_learning/nvflare/nvflare_spleen_example/hello_monai/custom/train_configer.py
@@ -10,17 +10,11 @@
# limitations under the License.
import json
-import logging
import os
import torch
-import torch.distributed as dist
-from monai.data import (
- CacheDataset,
- DataLoader,
- load_decathlon_datalist,
- partition_dataset,
-)
+from monai.apps.utils import download_and_extract
+from monai.data import CacheDataset, DataLoader, load_decathlon_datalist
from monai.engines import SupervisedEvaluator, SupervisedTrainer
from monai.handlers import (
CheckpointSaver,
@@ -29,6 +23,7 @@
StatsHandler,
TensorBoardStatsHandler,
ValidationHandler,
+ from_engine,
)
from monai.inferers import SimpleInferer, SlidingWindowInferer
from monai.losses import DiceLoss
@@ -47,8 +42,6 @@
Spacingd,
ToTensord,
)
-from torch.nn.parallel import DistributedDataParallel
-from monai.handlers import from_engine
class TrainConfiger:
@@ -58,17 +51,18 @@ class TrainConfiger:
Please check the implementation of `SupervisedEvaluator` and `SupervisedTrainer`
from `monai.engines` and determine which components can be used.
Args:
- config_root: root folder path of config files.
+ app_root: root folder path of config files.
wf_config_file_name: json file name of the workflow config file.
"""
def __init__(
self,
- config_root: str,
+ app_root: str,
wf_config_file_name: str,
local_rank: int = 0,
+ dataset_folder_name: str = "Task09_Spleen",
):
- with open(os.path.join(config_root, wf_config_file_name)) as file:
+ with open(os.path.join(app_root, wf_config_file_name)) as file:
wf_config = json.load(file)
self.wf_config = wf_config
@@ -76,35 +70,38 @@ def __init__(
config Args:
max_epochs: the total epoch number for trainer to run.
learning_rate: the learning rate for optimizer.
- data_list_base_dir: the directory containing the data list json file.
+ dataset_folder_name: name of the dataset folder under `app_root`. If it does
+ not exist there, the dataset will be downloaded and extracted first.
data_list_json_file: the data list json file.
val_interval: the interval (number of epochs) to do validation.
ckpt_dir: the directory to save the checkpoint.
amp: whether to enable auto-mixed-precision training.
use_gpu: whether to use GPU in training.
- multi_gpu: whether to use multiple GPUs for distributed training.
+
"""
self.max_epochs = wf_config["max_epochs"]
self.learning_rate = wf_config["learning_rate"]
- self.data_list_base_dir = wf_config["data_list_base_dir"]
self.data_list_json_file = wf_config["data_list_json_file"]
self.val_interval = wf_config["val_interval"]
self.ckpt_dir = wf_config["ckpt_dir"]
self.amp = wf_config["amp"]
self.use_gpu = wf_config["use_gpu"]
- self.multi_gpu = wf_config["multi_gpu"]
self.local_rank = local_rank
+ self.app_root = app_root
+ self.dataset_folder_name = dataset_folder_name
+ if not os.path.exists(os.path.join(app_root, self.dataset_folder_name)):
+ self.download_spleen_dataset()
def set_device(self):
- if self.multi_gpu:
- # initialize distributed training
- dist.init_process_group(backend="nccl", init_method="env://")
- device = torch.device(f"cuda:{self.local_rank}")
- torch.cuda.set_device(device)
- else:
- device = torch.device("cuda" if self.use_gpu else "cpu")
+ device = torch.device("cuda" if self.use_gpu else "cpu")
self.device = device
+ def download_spleen_dataset(self):
+ url = "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task09_Spleen.tar"
+ name = os.path.join(self.app_root, self.dataset_folder_name)
+ tarfile_name = f"{name}.tar"
+ download_and_extract(url=url, filepath=tarfile_name, output_dir=self.app_root)
+
def configure(self):
self.set_device()
network = UNet(
@@ -116,13 +113,6 @@ def configure(self):
num_res_units=2,
norm=Norm.BATCH,
).to(self.device)
- if self.multi_gpu:
- network = DistributedDataParallel(
- module=network,
- device_ids=[self.device],
- find_unused_parameters=False,
- )
-
train_transforms = Compose(
[
LoadImaged(keys=("image", "label")),
@@ -157,24 +147,17 @@ def configure(self):
)
# set datalist
train_datalist = load_decathlon_datalist(
- os.path.join(self.data_list_base_dir, self.data_list_json_file),
+ os.path.join(self.app_root, self.data_list_json_file),
is_segmentation=True,
data_list_key="training",
- base_dir=self.data_list_base_dir,
+ base_dir=os.path.join(self.app_root, self.dataset_folder_name),
)
val_datalist = load_decathlon_datalist(
- os.path.join(self.data_list_base_dir, self.data_list_json_file),
+ os.path.join(self.app_root, self.data_list_json_file),
is_segmentation=True,
data_list_key="validation",
- base_dir=self.data_list_base_dir,
+ base_dir=os.path.join(self.app_root, self.dataset_folder_name),
)
- if self.multi_gpu:
- train_datalist = partition_dataset(
- data=train_datalist,
- shuffle=True,
- num_partitions=dist.get_world_size(),
- even_divisible=True,
- )[dist.get_rank()]
train_ds = CacheDataset(
data=train_datalist,
transform=train_transforms,
@@ -225,8 +208,7 @@ def configure(self):
AsDiscreted(
keys=["pred", "label"],
argmax=[True, False],
- to_onehot=True,
- num_classes=2,
+ to_onehot=2,
),
]
)
@@ -235,7 +217,6 @@ def configure(self):
"val_mean_dice": MeanDice(
include_background=False,
output_transform=from_engine(["pred", "label"]),
- #device=self.device,
)
}
val_handlers = [
@@ -274,7 +255,9 @@ def configure(self):
ValidationHandler(
validator=self.eval_engine, interval=self.val_interval, epoch_level=True
),
- StatsHandler(tag_name="train_loss", output_transform=from_engine("loss", first=True)),
+ StatsHandler(
+ tag_name="train_loss", output_transform=from_engine("loss", first=True)
+ ),
TensorBoardStatsHandler(
log_dir=self.ckpt_dir,
tag_name="train_loss",
@@ -295,7 +278,3 @@ def configure(self):
train_handlers=train_handlers,
amp=self.amp,
)
-
- if self.local_rank > 0:
- self.train_engine.logger.setLevel(logging.WARNING)
- self.eval_engine.logger.setLevel(logging.WARNING)
diff --git a/federated_learning/nvflare/nvflare_spleen_example/requirements.txt b/federated_learning/nvflare/nvflare_spleen_example/requirements.txt
new file mode 100644
index 0000000000..e37fc83d93
--- /dev/null
+++ b/federated_learning/nvflare/nvflare_spleen_example/requirements.txt
@@ -0,0 +1,10 @@
+pip
+setuptools
+nvflare==2.0.1
+monai==0.8.0
+pytorch-ignite==0.4.6
+tqdm==4.61.2
+nibabel==3.2.1
+tensorboard==2.5.0
+wheel
+jupyterlab