diff --git a/models/pathology_tumor_detection/configs/metadata.json b/models/pathology_tumor_detection/configs/metadata.json
index e3c7cfa0..32f7cab6 100644
--- a/models/pathology_tumor_detection/configs/metadata.json
+++ b/models/pathology_tumor_detection/configs/metadata.json
@@ -1,7 +1,8 @@
 {
 "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
-"version": "0.6.1",
+"version": "0.6.2",
 "changelog": {
+"0.6.2": "enhance readme for nccl timeout issue",
 "0.6.1": "fix multi-gpu issue",
 "0.6.0": "use monai 1.4 and update large files",
 "0.5.9": "update to use monai 1.3.1",
diff --git a/models/pathology_tumor_detection/configs/train.json b/models/pathology_tumor_detection/configs/train.json
index 52344bad..78dba67f 100644
--- a/models/pathology_tumor_detection/configs/train.json
+++ b/models/pathology_tumor_detection/configs/train.json
@@ -174,7 +174,7 @@
 "_target_": "DataLoader",
 "dataset": "@train#dataset",
 "batch_size": 128,
-"pin_memory": true,
+"pin_memory": false,
 "num_workers": 8
 },
 "inferer": {
@@ -325,7 +325,7 @@
 "_target_": "DataLoader",
 "dataset": "@validate#dataset",
 "batch_size": 128,
-"pin_memory": true,
+"pin_memory": false,
 "shuffle": false,
 "num_workers": 8
 },
diff --git a/models/pathology_tumor_detection/docs/README.md b/models/pathology_tumor_detection/docs/README.md
index 46cc761a..cf04c829 100644
--- a/models/pathology_tumor_detection/docs/README.md
+++ b/models/pathology_tumor_detection/docs/README.md
@@ -135,6 +135,14 @@ torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run --config
 
 Please note that the distributed training-related options depend on the actual running environment; thus, users may need to remove `--standalone`, modify `--nnodes`, or do some other necessary changes according to the machine used. For more details, please refer to [pytorch's official tutorial](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html).
 
+**Note:** When using an [NVIDIA PyTorch container](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes) from the 24.0x releases, you may encounter intermittent NCCL timeout errors. To address this issue, consider the following adjustments:
+
+- Reduce `num_workers`: Using fewer data loader workers can make these errors less frequent.
+- Set `pin_memory` to `false`: Disabling pinned memory may reduce the likelihood of timeouts.
+- Switch to the `gloo` backend: As a workaround, you can set the distributed training backend to `gloo` to avoid NCCL-related timeouts.
+
+You can apply these settings by appending overrides such as `--train#dataloader#num_workers 0` or `--train#dataloader#pin_memory false` to the training command.
+
 #### Execute inference
 
 ```
diff --git a/models/vista2d/configs/metadata.json b/models/vista2d/configs/metadata.json
index a1cba412..cd5ae1dc 100644
--- a/models/vista2d/configs/metadata.json
+++ b/models/vista2d/configs/metadata.json
@@ -1,7 +1,8 @@
 {
 "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20240725.json",
-"version": "0.2.9",
+"version": "0.3.0",
 "changelog": {
+"0.3.0": "update readme",
 "0.2.9": "fix unsupported data dtype in findContours",
 "0.2.8": "remove relative path in readme",
 "0.2.7": "enhance readme",
diff --git a/models/vista2d/docs/README.md b/models/vista2d/docs/README.md
index 4ff5cfa1..afcdb12a 100644
--- a/models/vista2d/docs/README.md
+++ b/models/vista2d/docs/README.md
@@ -52,7 +52,12 @@ The default dataset for training, validation, and inference is the [Cellpose](ht
 Additionally, all data lists are available in the `datalists.zip` file located in the root directory of the bundle. Extract the contents of the `.zip` file to access the data lists.
 
 ### Dependencies
-Please refer to `required_packages_version` in `configs/metadata.json` to install all necessary dependencies before executing.
+Please refer to the `required_packages_version` section in `configs/metadata.json` to install all necessary dependencies before execution. If you’re using the MONAI container, you can simply run the commands below and ignore any "opencv-python-headless not installed" error message, as this package is already included in the container.
+
+```
+pip install fastremap==1.15.0 roifile==2024.5.24 natsort==8.4.0
+pip install --no-deps cellpose
+```
 
 Important Note: if your environment already contains OpenCV, installing `cellpose` may lead to conflicts and produce errors such as:
 
@@ -60,13 +65,14 @@ Important Note: if your environment already contains OpenCV, installing `cellpos
 AttributeError: partially initialized module 'cv2' has no attribute 'dnn' (most likely due to a circular import)
 ```
 
-when executing. To resolve this issue, please uninstall OpenCV and then re-install `cellpose` with a command like:
+To resolve this, uninstall OpenCV and delete its leftover `cv2` directory, then reinstall `cellpose`:
 
 ```Bash
-pip uninstall -y opencv && rm /usr/local/lib/python3.x/dist-packages/cv2
+pip uninstall -y opencv && rm -rf /usr/local/lib/python3.*/dist-packages/cv2
 ```
+Replace `3.*` with your actual Python version (e.g., `3.10`).
 
-Alternatively, you can use the following command to install `cellpose` without its dependencies:
+Alternatively, you can install `cellpose` without its dependencies to avoid potential conflicts:
 
 ```
 pip install --no-deps cellpose
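The `--train#dataloader#...` overrides suggested in the pathology README note address nested keys of `configs/train.json` by joining the key names with `#`. As a rough, stdlib-only illustration of that addressing, here is a hypothetical `apply_override` helper. It is not MONAI's `ConfigParser`, which also resolves `@` references and `$` expressions, and it assumes the training data loader sits under the `train` and then `dataloader` keys, as the override paths imply.

```python
from typing import Any


def apply_override(config: dict, config_id: str, value: Any) -> None:
    """Set the nested entry addressed by a '#'-separated id, e.g. 'train#dataloader#num_workers'.

    Hypothetical helper for illustration only; it is not part of MONAI.
    """
    *parents, last = config_id.split("#")
    node: Any = config
    for key in parents:
        # List entries are addressed by integer position, dict entries by name.
        node = node[int(key)] if isinstance(node, list) else node[key]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


# Trimmed-down, hypothetical stand-in for the training data loader section of
# configs/train.json, with pin_memory still set to true as before this change.
config = {
    "train": {
        "dataloader": {
            "_target_": "DataLoader",
            "dataset": "@train#dataset",
            "batch_size": 128,
            "pin_memory": True,
            "num_workers": 8,
        }
    }
}

# Mirror the two command-line overrides from the README note.
apply_override(config, "train#dataloader#num_workers", 0)
apply_override(config, "train#dataloader#pin_memory", False)
print(config["train"]["dataloader"])
```

In the bundle itself, MONAI applies such command-line overrides to the parsed config; this sketch only mirrors how the `#`-separated path selects the key to change.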