Request for Dataset Processing Scripts #1
Comments
Hi @TanmDL, really sorry it took me a while to answer you! Thanks very much for your interest in this work :) All the scripts I used for the preprocessing are on the GitHub repo. I just updated the README file with instructions for how to run the code, including the preprocessing steps (tissue segmentation, patching and feature embedding). This might answer your questions, but if not, please let me know which specific step needs further explanation. Hope this helps!
Any thoughts you might have to make the README clearer for users would be very valuable to me, so please let me know if there's anything that is unclear or could be explained in more detail :)
Thank you for your response. Could you please clarify which classification task was studied here? Initially, I thought it would involve classes like CD68 and CD138, but I found the following labels: label_dict: {'0': 'Pauci-Immune', '1': 'Lymphoid/Myeloid'}, which appear to represent subtypes. This has confused me. Could you please explain this part in more detail? Also, I want to add that the pipeline design is excellent. Thank you.
For the Rheumatoid Arthritis dataset I classified into inflammatory subtypes {'0': 'Pauci-Immune', '1': 'Lymphoid/Myeloid'}, and for the Sjogren dataset into absence and presence of Sjogren {'0': 'Not Sjogren', '1': 'Sjogren'}. However, depending on your data structure, I think you could use the code to target your stains as labels. If you give me more detail on that, I could suggest how to do it. For example, assuming you have multiple stains per patient and want to classify the stains, you could add a column to the patient_labels.csv file and point the config at it like so:
# Label/split configurations
labels:
  label: 'label'  # column name for target label
  label_dict: {'CD68': 1, 'CD138': 2, 'CD20': 3, 'CD21': 4}  # Stain type numeric coding dictionary
  n_classes: 4  # number of target classes
  patient_id: 'Patient_stains_numeric'  # column name for each unique file

# Parsing configurations
parsing:
  patient_ID: 'img.split("_")[0]'  # "Patient1.1_stain" -> Patient1.1
  stain: 'img.split("_")[1]'  # "Patient1.1_stain" -> stain
  stain_types: {'NA': 0, 'CD68': 1, 'CD138': 2, 'CD20': 3, 'CD21': 4}  # Stain types
Of course, this is off the top of my head. I haven't tested it and it would depend on your file structure, but it should work.
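To make the parsing rules in the config above more concrete, here is a minimal, untested-against-the-repo Python sketch of how image names could be turned into rows for a labels table. It is not the repository's own code: the helper names (`parse_image_name`, `build_label_rows`, `rows_to_csv`), the `Patient_ID` column, and the choice to store the image name in `Patient_stains_numeric` are all assumptions made for illustration. It also assumes image names have already had their file extension removed and follow the `Patient1.1_stain` pattern.

```python
import csv
import io

# Stain coding copied from the config above; 'NA' (0) is used as the fallback.
STAIN_TYPES = {'NA': 0, 'CD68': 1, 'CD138': 2, 'CD20': 3, 'CD21': 4}


def parse_image_name(img):
    """Split 'Patient1.1_CD68' into ('Patient1.1', 'CD68').

    Mirrors the img.split("_")[0] / img.split("_")[1] rules in the config.
    Names without an underscore get the stain 'NA' (an assumption).
    """
    parts = img.split("_")
    patient_id = parts[0]
    stain = parts[1] if len(parts) > 1 else 'NA'
    return patient_id, stain


def build_label_rows(image_names):
    """Build one row per image with a numeric stain label (hypothetical layout)."""
    rows = []
    for img in image_names:
        patient_id, stain = parse_image_name(img)
        rows.append({
            'Patient_ID': patient_id,          # hypothetical column name
            'Patient_stains_numeric': img,     # assumed: one unique ID per file
            'label': STAIN_TYPES.get(stain, STAIN_TYPES['NA']),
        })
    return rows


def rows_to_csv(rows):
    """Serialise the rows to CSV text, e.g. to save as patient_labels.csv."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=['Patient_ID', 'Patient_stains_numeric', 'label'],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = build_label_rows(["Patient1.1_CD68", "Patient1.1_CD138", "Patient2.3_CD20"])
print(rows_to_csv(rows))
```

Unknown stain names fall back to the `'NA'` code here, so such rows would need to be filtered out before training a four-class stain classifier.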
Thank you for your kind reply and for giving me a fantastic idea. Could you please send me the dataset links so that I can try to download and test them?
Unfortunately, for patient privacy protection, I am not able to share these datasets publicly, as they come from clinical trial and research datasets. I am currently exploring options to publish a multistain dataset and can let you know how that goes. However, if it works out, it wouldn't be until mid-next year. Very sorry about that!
Hello,
Your work is fantastic, and I truly appreciate the effort that went into it. However, I have a few questions about the processing of the two datasets. If you could share the scripts used for this, it would be very helpful for us. I assure you that I will properly cite your project.
Thank you!