Request for Dataset Processing Scripts #1

TanmDL · 2024-12-05T17:59:10Z

Hello,

Your work is fantastic, and I truly appreciate the effort that went into it. However, I have a few questions about the processing of the two datasets. If you could share the scripts used for this, it would be very helpful for us. I assure you that I will properly cite your project.

Thank you!

AmayaGS · 2024-12-11T18:42:58Z

Hi @TanmDL, really sorry it took me a while to answer you! Thanks very much for your interest in this work :)

All the scripts I used for the preprocessing are on the GitHub repo. I just updated the README file with instructions for how to run the code, including the preprocessing steps: tissue segmentation, patching and feature embedding. This might answer your questions, but if not please let me know which specific step needs further explanation. Hope this helps!

AmayaGS · 2024-12-11T18:46:53Z

Any thoughts you might have on making the README clearer for users would be very valuable to me, so please let me know if there's anything that is unclear or could be explained in more detail :)

TanmDL · 2024-12-11T22:17:26Z

Thank you for your response. Could you please clarify which classification task was studied here? Initially, I thought it would involve classes like CD68 and CD138, but I found the following labels: label_dict: {'0': 'Pauci-Immune', '1': 'Lymphoid/Myeloid'}, which appear to represent subtypes. This has confused me. Could you please explain this part in more detail? Also, I want to add that the pipeline design is excellent. Thank you.

AmayaGS · 2024-12-12T19:03:59Z

For the Rheumatoid Arthritis dataset I classified into inflammatory subtypes {'0': 'Pauci-Immune', '1': 'Lymphoid/Myeloid'} and for the Sjogren dataset into absence or presence of Sjogren {'0': 'Not Sjogren', '1': 'Sjogren'}. However, depending on your data structure, I think you could use the code to target your stains as labels. If you give me more detail on that, I could suggest how to do it. For example, assuming you have multiple stains per patient and want to classify the stains, you could add a column to the patient_labels.csv file like so:

Patient_ID	Patient_stains	Patient_stains_numeric	label
Patient1	Patient1_CD68	Patient1.1_CD68	1
Patient1	Patient1_CD138	Patient1.2_CD138	2
Patient1	Patient1_CD20	Patient1.3_CD20	3
Patient1	Patient1_CD21	Patient1.4_CD21	4
Patient2	Patient2_CD68	Patient2.1_CD68	1
Patient2	Patient2_CD20	Patient2.3_CD20	3
Patient2	Patient2_CD21	Patient2.4_CD21	4

# Label/split configurations
labels:
  label: 'label' # column name for target label
  label_dict:  {'CD68': 1, 'CD138': 2, 'CD20': 3, 'CD21': 4} # Stain type numeric coding dictionary
  n_classes: 4 # number of target classes
  patient_id: 'Patient_stains_numeric' # column name for each unique file 

# Parsing configurations 
parsing:
  patient_ID: 'img.split("_")[0]' # "Patient1.1_stain" -> Patient1.1
  stain: 'img.split("_")[1]' # "Patient1.1_stain" -> stain
  stain_types: {'NA': 0, 'CD68': 1, 'CD138': 2, 'CD20': 3, 'CD21': 4} # Stain types

Of course this is off the top of my head. I haven't tested it and it would depend on your file structure, but it should work.
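As a rough illustration of what the two parsing expressions above would do with file names like those in the table, here is a minimal, untested-against-the-repo Python sketch. The helper names `parse_image_name` and `build_rows` are hypothetical and not part of the repository; the sketch only assumes file names of the form `Patient1.1_CD68` and the stain coding from the config above.

```python
# Stain coding copied from the config sketch above (an assumption, not the repo's defaults).
STAIN_TYPES = {"NA": 0, "CD68": 1, "CD138": 2, "CD20": 3, "CD21": 4}


def parse_image_name(img):
    """Hypothetical helper: split 'Patient1.1_CD68' into ('Patient1.1', 'CD68')."""
    file_id = img.split("_")[0]  # same expression as parsing.patient_ID
    stain = img.split("_")[1]    # same expression as parsing.stain
    return file_id, stain


def build_rows(image_names):
    """Hypothetical helper: build one patient_labels.csv-style row per image."""
    rows = []
    for img in image_names:
        file_id, stain = parse_image_name(img)
        patient = file_id.split(".")[0]  # 'Patient1.1' -> 'Patient1'
        rows.append({
            "Patient_ID": patient,
            "Patient_stains": f"{patient}_{stain}",
            "Patient_stains_numeric": img,
            # Unknown stains fall back to the 'NA' code.
            "label": STAIN_TYPES.get(stain, STAIN_TYPES["NA"]),
        })
    return rows


if __name__ == "__main__":
    for row in build_rows(["Patient1.1_CD68", "Patient1.2_CD138", "Patient2.3_CD20"]):
        print(row)
```

Each row mirrors one line of the table above, so the output could be written to patient_labels.csv with the standard `csv` module if that layout fits your files.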

TanmDL · 2024-12-13T15:50:58Z

Thank you for your kind reply and for giving me a fantastic idea. Could you please send me the dataset links so that I can try to download and test them?

AmayaGS · 2024-12-17T23:41:32Z

Unfortunately, for patient privacy protection, I am not able to share these datasets publicly as they come from clinical trial and research datasets. I am currently exploring options to publish a multistain dataset and can let you know how that goes. However, if it works out, it wouldn't be until mid-next year. Very sorry about that!
