Evaluating existing SCT models #34
I created the file `evaluation/test_sct_models.py`. It computes the Dice score, lesion PPV, lesion sensitivity, and lesion F1 score. It is currently running to evaluate the models on the test set using:

`python evaluation/test_sct_models.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_output`
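For reference, here is a minimal sketch of how these metrics can be computed from a binary prediction and a binary ground-truth mask. This is not the actual code in `test_sct_models.py`; in particular, the lesion-matching rule (a lesion counts as detected if it overlaps the other mask by at least one voxel) is an assumption.

```python
# Minimal sketch (not the actual test_sct_models.py): Dice and lesion-wise
# PPV / sensitivity / F1 from two binary masks. The lesion-matching rule
# (>= 1 voxel of overlap between connected components) is an assumption.
import numpy as np
from scipy import ndimage


def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Voxel-wise Dice between two binary masks."""
    intersection = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0


def lesion_metrics(pred: np.ndarray, gt: np.ndarray):
    """Lesion-wise PPV, sensitivity and F1 using connected components."""
    pred_labels, n_pred = ndimage.label(pred)
    gt_labels, n_gt = ndimage.label(gt)

    # A predicted lesion is a true positive if it overlaps the ground truth.
    tp_pred = sum(gt[pred_labels == i].any() for i in range(1, n_pred + 1))
    # A ground-truth lesion is detected if it overlaps the prediction.
    detected_gt = sum(pred[gt_labels == i].any() for i in range(1, n_gt + 1))

    ppv = tp_pred / n_pred if n_pred > 0 else 0.0
    sensitivity = detected_gt / n_gt if n_gt > 0 else 0.0
    f1 = (2 * ppv * sensitivity / (ppv + sensitivity)
          if (ppv + sensitivity) > 0 else 0.0)
    return ppv, sensitivity, f1
```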
Because the initial code was taking too long to compute (around 90 h), I decided to split it into 3 files:

`python evaluation/test_sct_deepseg_lesion.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_lesion`

`python evaluation/test_sct_deepseg_psir_stir.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_psir-stir`

`python evaluation/test_sct_deepseg_mp2rage.py --msd-data-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_mp2rage`
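As a rough, hypothetical sketch of what each per-model script does (using the lesion model as an example): load the MSD-style JSON, iterate over the test split, and run the corresponding SCT model on each image. The JSON keys (`test`, `image`) and the `sct_deepseg_lesion` flags below are assumptions, not necessarily what the actual scripts use.

```python
# Hypothetical sketch, not the actual test_sct_deepseg_lesion.py: read the
# MSD-style JSON, loop over the test images, and run one SCT model per image,
# writing predictions to the output folder for later scoring.
import json
import subprocess
from pathlib import Path


def run_model_on_test_set(msd_json: str, output_dir: str) -> None:
    data = json.loads(Path(msd_json).read_text())
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    for item in data["test"]:            # assumed split key
        image = Path(item["image"])      # assumed field name
        # sct_deepseg_lesion flags are from memory; check them against the
        # installed SCT version before relying on this.
        subprocess.run(
            ["sct_deepseg_lesion", "-i", str(image), "-c", "t2",
             "-ofolder", str(out)],
            check=True,
        )
```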
For the sct_deepseg_lesion model, I then plotted the desired curves using:

`python evaluation/plot_performance.py --pred-dir-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_lesion/ --data-json-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --split test`

Output:

Dice score per contrast (mean ± std)
PSIR (n=60): 0.0068 ± 0.0098
STIR (n=11): 0.3676 ± 0.2831
T2star (n=83): 0.5117 ± 0.2076
T2w (n=358): 0.3206 ± 0.2679
UNIT1 (n=57): 0.0070 ± 0.0084

Here is the output for the other metrics:

PPV score per contrast (mean ± std)
PSIR (n=60): 0.0222 ± 0.1354
STIR (n=11): 0.4864 ± 0.4037
T2star (n=83): 0.6010 ± 0.2895
T2w (n=358): 0.6079 ± 0.4153
UNIT1 (n=57): 0.0097 ± 0.0526
F1 score per contrast (mean ± std)
PSIR (n=60): 0.0077 ± 0.0441
STIR (n=11): 0.4037 ± 0.3222
T2star (n=83): 0.6396 ± 0.2281
T2w (n=358): 0.5059 ± 0.3690
UNIT1 (n=57): 0.0088 ± 0.0464
Sensitivity score per contrast (mean ± std)
PSIR (n=60): 0.0395 ± 0.1839
STIR (n=11): 0.4500 ± 0.3738
T2star (n=83): 0.8102 ± 0.2478
T2w (n=358): 0.5221 ± 0.4007
UNIT1 (n=57): 0.0085 ± 0.0458

For the MP2RAGE model:

`python evaluation/plot_performance.py --pred-dir-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_mp2rage/ --data-json-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --split test`

Output:

Dice score per contrast (mean ± std)
PSIR (n=60): 0.2135 ± 0.1760
STIR (n=11): 0.0110 ± 0.0126
T2star (n=83): 0.0074 ± 0.0223
T2w (n=358): 0.0067 ± 0.0127
UNIT1 (n=57): 0.4549 ± 0.1944

Output for the other metrics:

PPV score per contrast (mean ± std)
PSIR (n=60): 0.3733 ± 0.2918
STIR (n=11): 0.0000 ± 0.0000
T2star (n=83): 0.0000 ± 0.0000
T2w (n=358): 0.1425 ± 0.3500
UNIT1 (n=57): 0.3298 ± 0.1770
F1 score per contrast (mean ± std)
PSIR (n=60): 0.3943 ± 0.2621
STIR (n=11): 0.0000 ± 0.0000
T2star (n=83): 0.0000 ± 0.0000
T2w (n=358): 0.0000 ± 0.0000
UNIT1 (n=57): 0.4422 ± 0.1937
Sensitivity score per contrast (mean ± std)
PSIR (n=60): 0.5506 ± 0.3480
STIR (n=11): 0.0000 ± 0.0000
T2star (n=83): 0.0000 ± 0.0000
T2w (n=358): 0.0000 ± 0.0000
UNIT1 (n=57): 0.8224 ± 0.2470

For the PSIR and STIR model:

`python evaluation/plot_performance.py --pred-dir-path ~/net/ms-lesion-agnostic/evaluating_existing_models/evaluation_sct_deepseg_psir-stir/ --data-json-path ~/net/ms-lesion-agnostic/msd_data/dataset_2024-07-24_seed42_lesionOnly.json --split test`

Output:

Dice score per contrast (mean ± std)
PSIR (n=60): 0.5701 ± 0.2660
STIR (n=11): 0.5984 ± 0.2237
T2star (n=83): 0.1312 ± 0.1538
T2w (n=358): 0.2213 ± 0.2134
UNIT1 (n=57): 0.0023 ± 0.0016

For the other metrics:

PPV score per contrast (mean ± std)
PSIR (n=60): 0.6672 ± 0.3478
STIR (n=11): 0.6605 ± 0.3430
T2star (n=83): 0.1235 ± 0.1475
T2w (n=358): 0.4306 ± 0.4165
UNIT1 (n=57): 0.0000 ± 0.0000
F1 score per contrast (mean ± std)
PSIR (n=60): 0.6381 ± 0.3240
STIR (n=11): 0.6494 ± 0.2915
T2star (n=83): 0.1815 ± 0.1940
T2w (n=358): 0.3392 ± 0.3560
UNIT1 (n=57): 0.0000 ± 0.0000
Sensitivity score per contrast (mean ± std)
PSIR (n=60): 0.7138 ± 0.3415
STIR (n=11): 0.7462 ± 0.3294
T2star (n=83): 0.4796 ± 0.4512
T2w (n=358): 0.5556 ± 0.4181
UNIT1 (n=57): 0.0000 ± 0.0000
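For context, the per-contrast summaries above could be produced roughly as follows. This is not the actual `plot_performance.py`; the input format (a dict mapping file name to score) and the filename convention used to recover the contrast are assumptions.

```python
# Rough sketch (not the actual plot_performance.py): group per-image Dice
# scores by contrast and print mean ± std in the format shown above.
from collections import defaultdict

import numpy as np


def summarize_per_contrast(scores: dict[str, float]) -> None:
    per_contrast = defaultdict(list)
    for fname, dice in scores.items():
        # e.g. "sub-01_ses-01_PSIR.nii.gz" -> "PSIR" (assumed naming)
        contrast = fname.split(".")[0].split("_")[-1]
        per_contrast[contrast].append(dice)

    print("Dice score per contrast (mean ± std)")
    for contrast, values in sorted(per_contrast.items()):
        print(f"{contrast} (n={len(values)}): "
              f"{np.mean(values):.4f} ± {np.std(values):.4f}")
```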
I then evaluated the SCT models for segmenting spinal lesions on the external testing set (ms-basel-2018 and ms-basel-2020).

For sct_deepseg_lesion, I ran the following command:

`python evaluation/test_sct_lesion_external_dataset.py --input-folder ~/net/ms-lesion-agnostic/data --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/external_evaluation/`

Output:

Dice score per contrast (mean ± std)
PD (n=31): 0.0046 ± 0.0114
T1w (n=22): 0.0673 ± 0.2120
T2w (n=24): 0.3272 ± 0.3372

Here is the output for the other metrics:

PPV score per contrast (mean ± std)
PD (n=31): 0.0613 ± 0.2076
T1w (n=22): 0.1136 ± 0.3060
T2w (n=24): 0.3993 ± 0.3877
F1 score per contrast (mean ± std)
PD (n=31): 0.0189 ± 0.0651
T1w (n=22): 0.0657 ± 0.2186
T2w (n=24): 0.4000 ± 0.3717
Sensitivity score per contrast (mean ± std)
PD (n=31): 0.0130 ± 0.0461
T1w (n=22): 0.2849 ± 0.4499

For sct_deepseg mp2rage, I ran the following command:

`python evaluation/test_sct_mp2rage_external_dataset.py --input-folder ~/net/ms-lesion-agnostic/data --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/external_evaluation/`

Output:

Dice score per contrast (mean ± std)
PD (n=31): 0.0034 ± 0.0118
T1w (n=22): 0.0559 ± 0.2116
T2w (n=24): 0.2864 ± 0.4308

Here is the output for the other metrics:

PPV score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.0455 ± 0.2132
T2w (n=24): 0.2500 ± 0.4423
F1 score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.0455 ± 0.2132
T2w (n=24): 0.2500 ± 0.4423
Sensitivity score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.2727 ± 0.4558

For sct_deepseg psir-stir, I ran the following command:

`python evaluation/test_sct_psir-stir_external_dataset.py --input-folder ~/net/ms-lesion-agnostic/data --output-path ~/net/ms-lesion-agnostic/evaluating_existing_models/external_evaluation/`

Output:

Dice score per contrast (mean ± std)
PD (n=31): 0.0036 ± 0.0119
T1w (n=22): 0.2774 ± 0.4529
T2w (n=24): 0.2510 ± 0.3996

Here is the output for the other metrics:

PPV score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.2727 ± 0.4558
T2w (n=24): 0.2792 ± 0.4128
F1 score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.2727 ± 0.4558
T2w (n=24): 0.2812 ± 0.4154
Sensitivity score per contrast (mean ± std)
PD (n=31): 0.0000 ± 0.0000
T1w (n=22): 0.2727 ± 0.4558
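For the external-dataset scripts, one hypothetical way to pair each image with its manual lesion mask before running a model and scoring it is sketched below. The BIDS-style layout (labels under `derivatives/labels` with a `lesion-manual` suffix) is an assumption about ms-basel-2018 / ms-basel-2020 and may not match the real folders or the actual scripts.

```python
# Hypothetical sketch: discover (image, manual lesion mask) pairs in an
# external dataset folder. Folder layout and filename suffix are assumptions.
from pathlib import Path


def find_image_label_pairs(dataset_root: str):
    root = Path(dataset_root)
    pairs = []
    for label in root.glob("derivatives/labels/**/*lesion-manual*.nii.gz"):
        # Recover the corresponding image by stripping the label suffix and
        # looking it up under the subject folders (assumed convention).
        image_name = label.name.replace("_lesion-manual", "")
        matches = list(root.glob(f"sub-*/**/{image_name}"))
        if matches:
            pairs.append((matches[0], label))
    return pairs
```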
This issue reports the work done to evaluate the existing models.
The existing models are the following: sct_deepseg_lesion, the SCT MP2RAGE lesion segmentation model, and the SCT PSIR/STIR lesion segmentation model.