Example on MIL/CLAM #2

Open
Tato14 opened this issue Oct 13, 2021 · 12 comments


@Tato14

Tato14 commented Oct 13, 2021

Hi,

Could you please share an ExperimentFile using MIL/CLAM pipeline?

Thanks

@narminGhaffari
Contributor

@Tato14 I edited the Experiment file in the repo. I hope that solves the problem.

@Tato14
Author

Tato14 commented Oct 27, 2021

Hi. Thanks for the reply. I didn't notice that you can specify mil, clam_sb and clam_mb in the modelName parameter.

However, I am still having some issues. There seems to be a problem when the splits for cross-validation are created. In the following log, the train/val/test splits all contain 0 samples:

Namespace(B=8, adressExp='/mnt/isilon/Lung_HMAR_TCGA/train_Level1/ExperimentFile_MIL_default.txt', bag_loss='ce', bag_weight=0.7, batch_size=1, clini_dir=['/mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/LungDX_CLINI.xlsx'], csv_name='CLEANED_DATA', datadir_train=['/mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/BLOCKS_NORM_MACENKO'], drop_out=True, early_stopping=False, feat_dir=['/mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/FEATURES'], feature_extract=False, freeze_Ratio=0.5, gpuNo=0, inst_loss='svm', k=3, log_data=True, lr=0.0001, maxBlockNum=512, max_epochs=10, model_name='mil', model_size='big', no_inst_cluster=False, normalize_targetNum=False, numHighScoreBlocks=20, numHighScorePatients=10, opt='adam', project_name='ExperimentFile_MIL_default', reg=1e-05, seed=1, slide_dir=['/mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/LungDX_SLIDE.csv'], subtyping=False, target_labels=['lung_type'], testing=False, train_full=False, useClassicModel=False, weighted_sample=True)
1
LOADING DATA FROM/mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/BLOCKS_NORM_MACENKO...
Remove the NaN values from the Target Label...
**********************************************************************
0 Patients didnt have the proper label for target label: lung_type
**********************************************************************
Data for 0 Patients from Clini Table is not found in Slide Table!
Data for 0 Patients from Slide Table is not found in Clini Table!
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1019/1019 [00:02<00:00, 459.83it/s]
FINISHED!
TOTAL NUMBER OF PATIENTS:1019
label column: lung_type
label dictionary: {'LUAD': 0, 'LUSC': 1}
number of classes: 2
Patient-LVL; Number of samples registered in class 0: 534
Patient-LVL; Number of samples registered in class 1: 485
##############################################################


Load the DataSet...
label column: lung_type
label dictionary: {'LUAD': 0, 'LUSC': 1}
number of classes: 2
slide-level counts:  
 0    534
1    485
Name: label, dtype: int64
Patient-LVL; Number of samples registered in class 0: 534
Slide-LVL; Number of samples registered in class 0: 534
Patient-LVL; Number of samples registered in class 1: 485
Slide-LVL; Number of samples registered in class 1: 485
##############################################################

**********************************************************************
START OF CROSS VALIDATION
**********************************************************************
340

Training Fold 0!

Init train/val/test splits... ******************************************************************
Training on 0 samples
Validating on 0 samples
Testing on 0 samples
******************************************************************
Done!

Init loss function... Done!

Init Model... Done!
MIL_fc(
  (classifier): DataParallel(
    (module): Sequential(
      (0): Linear(in_features=1024, out_features=512, bias=True)
      (1): ReLU()
      (2): Dropout(p=0.25, inplace=False)
      (3): Linear(in_features=512, out_features=2, bias=True)
    )
  )
)
Total number of parameters: 525826
Total number of trainable parameters: 525826

Init optimizer ... Done!

Init Loaders... Traceback (most recent call last):
  File "/home/jgibert/KatherLab/HIA/Main.py", line 40, in <module>
    ClamMILTraining(args)
  File "/home/jgibert/KatherLab/HIA/ClamMILTraining.py", line 141, in ClamMILTraining
    test_auc, val_auc, test_acc, val_acc, patient_results  = Train_MIL_CLAM(datasets, i, args)
  File "/home/jgibert/KatherLab/HIA/utils/core_utils.py", line 123, in Train_MIL_CLAM
    train_loader = Get_split_loader(train_split, training = True, testing = args.testing, weighted = args.weighted_sample)
  File "/home/jgibert/KatherLab/HIA/utils/data_utils.py", line 278, in Get_split_loader
    weights = Make_weights_for_balanced_classes_split(split_dataset)
  File "/home/jgibert/KatherLab/HIA/utils/data_utils.py", line 296, in Make_weights_for_balanced_classes_split
    weight_per_class = [N/len(dataset.slide_cls_ids[c]) for c in range(len(dataset.slide_cls_ids))]                                                                                                     
  File "/home/jgibert/KatherLab/HIA/utils/data_utils.py", line 296, in <listcomp>
    weight_per_class = [N/len(dataset.slide_cls_ids[c]) for c in range(len(dataset.slide_cls_ids))]                                                                                                     
ZeroDivisionError: float division by zero

Browsing the repo, I found that Get_split_from_df uses a self.slide_data attribute that I am not able to find. Do you have any hints on what could be missing there? Thanks
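
The traceback above ends in a division by the size of an empty class list. A minimal sketch of a guarded version of that weight computation could look like the following. It assumes only what the traceback shows (a `slide_cls_ids` list holding one list of slide indices per class) and that the total count equals the sum of the per-class counts; the function name `balanced_class_weights` and the error message are hypothetical and are not the repository's code.

```python
def balanced_class_weights(slide_cls_ids):
    """Return one weight per class, computed as N / (slides in that class).

    slide_cls_ids: list with one sequence of slide indices per class,
    mirroring the attribute seen in the traceback (assumed layout).
    """
    n_total = float(sum(len(ids) for ids in slide_cls_ids))
    empty = [c for c, ids in enumerate(slide_cls_ids) if len(ids) == 0]
    if empty:
        # Fail with a readable message instead of a bare ZeroDivisionError.
        raise ValueError(
            f"classes {empty} have no slides in this split "
            f"({int(n_total)} slides in total); check how the split was built"
        )
    return [n_total / len(ids) for ids in slide_cls_ids]


if __name__ == "__main__":
    print(balanced_class_weights([[0, 1, 2], [3]]))
    try:
        balanced_class_weights([[], []])
    except ValueError as err:
        print("split problem:", err)
```

A check like this does not fix the empty splits; it only makes the failure point at the split construction rather than at the sampler weights.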

@narminGhaffari
Contributor

@Tato14 This problem should be solved now. Could you please check and let me know the result?

@Tato14
Author

Tato14 commented Oct 27, 2021

@narminGhaffari It seems that the same error persists.

Namespace(B=8, adressExp='/mnt/isilon/Lung_HMAR_TCGA/train_Level1/ExperimentFile_MIL_default.txt', bag_loss='ce', bag_weight=0.7, batch_size=1, clini_dir=['/mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/LungDX_CLINI.xlsx'], csv_name='CLEANED_DATA', datadir_train=['/mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/BLOCKS_NORM_MACENKO'], drop_out=True, early_stopping=False, feat_dir=['/mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/FEATURES'], feature_extract=False, freeze_Ratio=0.5, gpuNo=0, inst_loss='svm', k=3, log_data=True, lr=0.0001, maxBlockNum=512, max_epochs=10, model_name='mil', model_size='big', no_inst_cluster=False, normalize_targetNum=False, numHighScoreBlocks=20, numHighScorePatients=10, opt='adam', project_name='ExperimentFile_MIL_default', reg=1e-05, seed=1, slide_dir=['/mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/LungDX_SLIDE.csv'], subtyping=False, target_labels=['lung_type'], testing=False, train_full=False, useClassicModel=False, weighted_sample=True)
1
LOADING DATA FROM/mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/BLOCKS_NORM_MACENKO...
Remove the NaN values from the Target Label...
**********************************************************************
0 Patients didnt have the proper label for target label: lung_type
**********************************************************************
Data for 0 Patients from Clini Table is not found in Slide Table!
Data for 0 Patients from Slide Table is not found in Clini Table!
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1019/1019 [00:02<00:00, 472.13it/s]
FINISHED!
TOTAL NUMBER OF PATIENTS:1019
label column: lung_type
label dictionary: {'LUAD': 0, 'LUSC': 1}
number of classes: 2
Patient-LVL; Number of samples registered in class 0: 534
Patient-LVL; Number of samples registered in class 1: 485
##############################################################


Load the DataSet...
label column: lung_type
label dictionary: {'LUAD': 0, 'LUSC': 1}
number of classes: 2
slide-level counts:  
 0    534
1    485
Name: label, dtype: int64
Patient-LVL; Number of samples registered in class 0: 534
Slide-LVL; Number of samples registered in class 0: 534
Patient-LVL; Number of samples registered in class 1: 485
Slide-LVL; Number of samples registered in class 1: 485
##############################################################

**********************************************************************
START OF CROSS VALIDATION
**********************************************************************
340

Training Fold 0!

Init train/val/test splits... ******************************************************************
Training on 0 samples
Validating on 0 samples
Testing on 0 samples
******************************************************************
Done!

Init loss function... Done!

Init Model... Done!
MIL_fc(
  (classifier): DataParallel(
    (module): Sequential(
      (0): Linear(in_features=1024, out_features=512, bias=True)
      (1): ReLU()
      (2): Dropout(p=0.25, inplace=False)
      (3): Linear(in_features=512, out_features=2, bias=True)
    )
  )
)
Total number of parameters: 525826
Total number of trainable parameters: 525826

Init optimizer ... Done!

Init Loaders... Traceback (most recent call last):
  File "/home/jgibert/KatherLab/HIA/Main.py", line 40, in <module>
    ClamMILTraining(args)
  File "/home/jgibert/KatherLab/HIA/ClamMILTraining.py", line 137, in ClamMILTraining
    patient_results, aucList  = Train_MIL_CLAM(datasets = datasets, cur = i, args = args)
  File "/home/jgibert/KatherLab/HIA/utils/core_utils.py", line 123, in Train_MIL_CLAM
    train_loader = Get_split_loader(train_split, training = True, testing = args.testing, weighted = args.weighted_sample)
  File "/home/jgibert/KatherLab/HIA/utils/data_utils.py", line 278, in Get_split_loader
    weights = Make_weights_for_balanced_classes_split(split_dataset)
  File "/home/jgibert/KatherLab/HIA/utils/data_utils.py", line 296, in Make_weights_for_balanced_classes_split
    weight_per_class = [N/len(dataset.slide_cls_ids[c]) for c in range(len(dataset.slide_cls_ids))]                                                                                                     
  File "/home/jgibert/KatherLab/HIA/utils/data_utils.py", line 296, in <listcomp>
    weight_per_class = [N/len(dataset.slide_cls_ids[c]) for c in range(len(dataset.slide_cls_ids))]                                                                                                     
ZeroDivisionError: float division by zero

@narminGhaffari
Contributor

@Tato14 It seems that your feature_extract flag is still False, so the feature vectors are not being extracted. Since the corresponding folder is empty, there is nothing to load.
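
A quick way to confirm this diagnosis before starting a run is to check whether the feature directories actually contain any files. The sketch below is a hypothetical helper, not part of the repository; it makes no assumption about the feature file format and simply counts regular files under each directory.

```python
import tempfile
from pathlib import Path


def count_feature_files(feat_dir):
    """Count regular files anywhere under feat_dir (0 if it does not exist)."""
    root = Path(feat_dir)
    if not root.is_dir():
        return 0
    return sum(1 for p in root.rglob("*") if p.is_file())


def needs_feature_extraction(feat_dirs):
    """Return the directories from feat_dirs that hold no files at all."""
    return [d for d in feat_dirs if count_feature_files(d) == 0]


if __name__ == "__main__":
    # Demonstration on a fresh, empty temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        print(needs_feature_extraction([tmp]))
```

If the returned list is non-empty, feature extraction has to run (or the path has to be corrected) before training can load anything.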

@Tato14
Author

Tato14 commented Oct 27, 2021

@narminGhaffari Sorry, I didn't see this. I am still getting an error:

###############################
Traceback (most recent call last):
  File "/home/jgibert/KatherLab/HIA/Main.py", line 40, in <module>
    ClamMILTraining(args)
  File "/home/jgibert/KatherLab/HIA/ClamMILTraining.py", line 53, in ClamMILTraining
    ExtractFeatures(data_dir = imgs, feat_dir = args.feat_dir, batch_size = args.batch_size, target_patch_size = -1, filterData = True,self_supervised = args.self_supervised)
AttributeError: 'Namespace' object has no attribute 'self_supervised'

I tried adding "self_supervised":"True", to the ExperimentFile, but the error persists. Moreover, I am not sure that is what it expects...
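
The AttributeError means the parsed Namespace had no self_supervised attribute when ExtractFeatures was called. One generic way to tolerate a missing optional setting is `getattr` with a default, sketched below. The helper name `get_option` and the default of False are assumptions made for illustration; this is not necessarily how the repository handles it.

```python
from argparse import Namespace


def get_option(args, name, default):
    """Read an optional attribute from a Namespace, falling back to default."""
    return getattr(args, name, default)


if __name__ == "__main__":
    # A Namespace that, like the one in the traceback, lacks self_supervised.
    args = Namespace(batch_size=1, feature_extract=True)
    # The default of False is an assumption made for this example.
    print(get_option(args, "self_supervised", False))
    print(get_option(args, "batch_size", 8))
```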

@narminGhaffari
Contributor

@Tato14 Check it now please!

@Tato14
Author

Tato14 commented Oct 28, 2021

Hi, it seems that we are making progress. I am still getting an error, but I guess it is because of the filename. Could you confirm that?

Namespace(B=8, adressExp='/mnt/isilon/Lung_HMAR_TCGA/train_Level1/ExperimentFile_MIL_default.txt', bag_loss='ce', bag_weight=0.7, batch_size=1, clini_dir=['/mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/LungDX_CLINI.xlsx'], csv_name='CLEANED_DATA', datadir_train=['/mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/BLOCKS_NORM_MACENKO'], drop_out=True, early_stopping=False, feat_dir=['/mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/FEATURES'], feature_extract=True, freeze_Ratio=0.5, gpuNo=0, inst_loss='svm', k=3, log_data=True, lr=0.0001, maxBlockNum=512, max_epochs=10, model_name='mil', model_size='big', no_inst_cluster=False, normalize_targetNum=False, numHighScoreBlocks=20, numHighScorePatients=10, opt='adam', project_name='ExperimentFile_MIL_default', reg=1e-05, seed=1, slide_dir=['/mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/LungDX_SLIDE.csv'], subtyping=False, target_labels=['lung_type'], testing=False, train_full=False, useClassicModel=False, weighted_sample=True)
###############################
initializing dataset
loading model checkpoint

progress: 0/1395
TCGA-77-7139-01Z-00-DX1
processing /mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/BLOCKS_NORM_MACENKO/TCGA-77-7139-01Z-00-DX1: total of 1427 batches
Traceback (most recent call last):
  File "/home/jgibert/KatherLab/HIA/Main.py", line 40, in <module>
    ClamMILTraining(args)
  File "/home/jgibert/KatherLab/HIA/ClamMILTraining.py", line 53, in ClamMILTraining
    ExtractFeatures(data_dir = imgs, feat_dir = args.feat_dir, batch_size = args.batch_size, target_patch_size = -1, filterData = True)
  File "/home/jgibert/KatherLab/HIA/extractFeatures.py", line 115, in ExtractFeatures
    output_file_path = Compute_w_loader(file_path, output_path, 
  File "/home/jgibert/KatherLab/HIA/extractFeatures.py", line 60, in Compute_w_loader
    for count, (batch, coords) in enumerate(loader):
  File "/home/jgibert/anaconda3/envs/PyTorch_Bioformats/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/jgibert/anaconda3/envs/PyTorch_Bioformats/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/jgibert/anaconda3/envs/PyTorch_Bioformats/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jgibert/anaconda3/envs/PyTorch_Bioformats/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jgibert/KatherLab/HIA/dataGenerator/dataSet.py", line 65, in __getitem__
    coord =[int(temp.split(',')[0]) , int(temp.split(',')[1])]
ValueError: invalid literal for int() with base 10: '/mnt/isilon/Lung_HMAR_TCGA/train_Level1/LungDX/BLOCKS_NORM_MACENKO/TCGA-77-7139-01Z-00-DX1/TCGA-77-7139-01Z-00-DX1_1_12800-14848-13312-15360_.png'

Thanks!

@narminGhaffari
Contributor

@Tato14 Yes, it is. Our workflow creates patches with names like aaaa_(123, 455).png.
The numbers inside the parentheses are the coordinates of the patch in the whole slide image. If your files don't have this structure, then maybe you don't need the coord variable at all and you can comment it out.
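
As an illustration of the naming convention described above, here is a minimal sketch that reads the coordinates from a name ending in "(x, y)" and returns None for any other name instead of raising. The function name `parse_coords` and the regular expression are assumptions for this example; the repository's own parsing splits the string on a comma, as the traceback shows.

```python
import re
from pathlib import Path

# Matches a trailing "(x, y)" at the end of the file stem, e.g. "aaaa_(123, 455)".
_COORD_RE = re.compile(r"\((\d+)\s*,\s*(\d+)\)$")


def parse_coords(path):
    """Return (x, y) from a tile name ending in "(x, y)", else None."""
    match = _COORD_RE.search(Path(path).stem)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(parse_coords("aaaa_(123, 455).png"))
    print(parse_coords("TCGA-77-7139-01Z-00-DX1_1_12800-14848-13312-15360_.png"))
```

Returning None for unrecognized names lets the caller decide whether to skip the coordinates or to plug in a parser for a different naming scheme.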

@Tato14
Author

Tato14 commented Oct 29, 2021

@narminGhaffari Thanks for the clarification. Since the dataloader expects a filename and coordinate pair, it was easier to edit the code for my specific filename structure. Now it seems that everything is working nicely!

Just one more thing before closing:
In this line I think you should expect patientList instead of lengthList.

Thanks again for the great feedback!

@Tato14
Author

Tato14 commented Nov 2, 2021

@narminGhaffari I am still getting the following error:

Init train/val/test splits... ******************************************************************
Training on 0 samples
Validating on 0 samples
Testing on 0 samples
******************************************************************
Done!

Init loss function... Done!

Init Model... Done!
MIL_fc(
  (classifier): DataParallel(
    (module): Sequential(
      (0): Linear(in_features=1024, out_features=512, bias=True)
      (1): ReLU()
      (2): Dropout(p=0.25, inplace=False)
      (3): Linear(in_features=512, out_features=2, bias=True)
    )
  )
)
Total number of parameters: 525826
Total number of trainable parameters: 525826

Init optimizer ... Done!

Init Loaders... Traceback (most recent call last):
  File "/home/jgibert/KatherLab/HIA/Main.py", line 40, in <module>
    ClamMILTraining(args)
  File "/home/jgibert/KatherLab/HIA/ClamMILTraining.py", line 137, in ClamMILTraining
    patient_results, aucList  = Train_MIL_CLAM(datasets = datasets, cur = i, args = args)
  File "/home/jgibert/KatherLab/HIA/utils/core_utils.py", line 123, in Train_MIL_CLAM
    train_loader = Get_split_loader(train_split, training = True, testing = args.testing, weighted = args.weighted_sample)
  File "/home/jgibert/KatherLab/HIA/utils/data_utils.py", line 278, in Get_split_loader
    weights = Make_weights_for_balanced_classes_split(split_dataset)
  File "/home/jgibert/KatherLab/HIA/utils/data_utils.py", line 296, in Make_weights_for_balanced_classes_split
    weight_per_class = [N/len(dataset.slide_cls_ids[c]) for c in range(len(dataset.slide_cls_ids))]                                                                                                     
  File "/home/jgibert/KatherLab/HIA/utils/data_utils.py", line 296, in <listcomp>
    weight_per_class = [N/len(dataset.slide_cls_ids[c]) for c in range(len(dataset.slide_cls_ids))]                                                                                                     
ZeroDivisionError: float division by zero

This happens after feature extraction. It seems that the splits are not loaded properly, but I am not quite sure why.

@narminGhaffari
Copy link
Contributor

@Tato14 I am checking the repo and will write back as soon as I find the problem.
