Questions About Pretraining and Baselines #18
Hi @Xiyue-Wang,
I am a little bit confused about the descriptions of the weakly supervised pre-training (e.g., in the Figure 1 caption and the Methods). If only anatomical information is used, the weakly supervised task is to predict which anatomical site the WSI comes from, which is already encoded in the text encoder. If the slide-level diagnostic labels are used for weakly supervised pre-training, then the task is cancer classification, and it makes sense to me that the slide-level aggregator can be directly used to infer cancer types.
Yes, you are right! If the slide-level diagnostic labels are used for weakly supervised pre-training, then the task is cancer classification, and the slide-level aggregator can be directly used to infer cancer types.
Hi @guillaumejaume. Your questions and discussions are always welcome! And congrats to you as well. Your contributions to UNI and other studies have been insightful for advancing computational pathology. In addition to Xiyue's responses, I wanted to add a few points to help facilitate the discussion on two of your questions. Kindly let us know!

> The supplemental material includes some information about UNI (Chen et al., Nat. Medicine, 2024), e.g., in Supp Table 26. Could you provide information about the evaluation scenario? The caption reports that "UNI used carefully curated image patches from the TCGA-COADREAD dataset through supervised learning techniques." As UNI is a patch encoder trained with SSL using DINOv2, I'm not sure I understand what this means.

To make sure we did not misinterpret UNI, we extracted the numbers from the publication for comparison. At the time the paper was on arXiv, and this part should be identical to the later accepted version. We summarized this ("UNI used carefully curated image patches from the TCGA-COADREAD dataset through supervised learning techniques") based on our understanding of the UNI paper on arXiv (pages 34–45). We did a quick calculation of the sum of the regions of interest (ROIs), which does not cover the sum of all available WSIs (nothing wrong with that, since artifacts or white background should be excluded). Therefore, we used "carefully curated" to make this clear. As highlighted in bold, the two labels were used for both training and evaluation, so we stated that the linear-probe tuning was under supervision. For example: CRC MSI prediction based on TCGA CRC-MSI (3,200).

> Are you using slide-level diagnostic labels (e.g., TCGA oncotree code) during CHIEF pretraining? You report that "The second stage requires only WSI-level labels, enabling CHIEF to construct a holistic understanding of pathology images from global features." If so, what is the objective used? The section "CHIEF pretraining details" mentions both weakly-supervised learning and WSI-level contrastive learning. If diagnostic labels were used during pretraining, this presents a potentially unfair comparison with weakly-supervised methods like CLAM/DSMIL/ABMIL that presumably only use labels in the downstream task; e.g., breast subtyping becomes a lot easier if explicitly trained for this task during pretraining.

**We used anatomical sites (e.g., BREAST, LUNG) rather than TCGA labels (e.g., LUAD, LUSC). Nevertheless, we understand that BRCA subtyping featured prominently in UNI's evaluation, but it is not in CHIEF's report or claimed goals.**
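For illustration only, here is a minimal sketch of WSI-level contrastive pretraining against anatomical-site text embeddings (e.g., "BREAST", "LUNG") in the spirit described above. This is an assumption-laden example, not the released CHIEF training code; all names, dimensions, and the CLIP-style loss choice are hypothetical.

```python
# Illustrative sketch of contrasting slide embeddings with anatomical-site text
# embeddings. NOT the released CHIEF code; shapes and names are assumptions.
import torch
import torch.nn.functional as F

def site_contrastive_loss(slide_emb, site_text_emb, site_idx, temperature=0.07):
    """slide_emb: (B, D) slide embeddings from the aggregator.
    site_text_emb: (S, D) one text-encoder embedding per anatomical site.
    site_idx: (B,) index of each slide's anatomical site (a weak, slide-level label).
    """
    slide_emb = F.normalize(slide_emb, dim=-1)
    site_text_emb = F.normalize(site_text_emb, dim=-1)
    logits = slide_emb @ site_text_emb.t() / temperature  # (B, S) similarity to every site
    return F.cross_entropy(logits, site_idx)              # pull each slide toward its own site

# Toy usage with random tensors (19 anatomical sites, 768-d embeddings assumed):
loss = site_contrastive_loss(torch.randn(8, 768), torch.randn(19, 768),
                             torch.randint(0, 19, (8,)))
```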
Hi @HHHedo, Great question! We never directly predicted the cancer type (e.g., BRCA, CRC, LUAD, etc., as Guillaume asked); instead, we performed binary classification (positive vs. negative) for malignancy detection in task 1 and predicted the tumor origin in task 2. In reality, we always know the anatomical site from which the tissue was obtained; however, that site is not necessarily the tumor origin. I hope this clears up your confusion. Please let me know!
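To make the distinction concrete, below is a minimal sketch of weakly supervised, slide-level binary malignancy detection with an attention-based MIL aggregator (ABMIL-style pooling). It is not the authors' implementation; the module, dimensions, and feature shapes are illustrative assumptions.

```python
# Illustrative sketch of slide-level binary malignancy detection via attention MIL.
# NOT the released CHIEF code; dimensions and names are assumptions.
import torch
import torch.nn as nn

class AttentionMILClassifier(nn.Module):
    def __init__(self, feat_dim=768, hidden_dim=256):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(feat_dim, 1)  # binary: malignant vs. benign

    def forward(self, patch_feats):            # patch_feats: (num_patches, feat_dim)
        attn = self.attention(patch_feats)     # (num_patches, 1) attention scores
        attn = torch.softmax(attn, dim=0)      # normalize over patches in the slide
        slide_embedding = (attn * patch_feats).sum(dim=0)  # (feat_dim,) slide embedding
        logit = self.classifier(slide_embedding)            # slide-level logit
        return logit, slide_embedding

# Train with slide-level labels only (0 = benign, 1 = malignant):
model = AttentionMILClassifier()
feats = torch.randn(5000, 768)                 # pre-extracted patch features for one WSI
logit, _ = model(feats)
loss = nn.functional.binary_cross_entropy_with_logits(logit, torch.tensor([1.0]))
```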
Hi @Dadatata-JZ, Thanks for your kind help! The following is my understanding, but it seems to contradict your answer, which confuses me again. Hopefully, the pre-training code will be released. Many thanks!

On the downstream tasks: since the downstream code has already been released, I noticed that biomarker prediction and cancer cell detection use the text encoder, while the survival and tumor-origin tasks use only the image encoder. Thanks.
Tiancheng, no worries at all. These are all good questions; sorry for confusing you. They are not contradictory, because my response was referring to the fact that cancer-type inference (e.g., BRCA, CRC, LUAD, etc., as Guillaume asked) was never among the four major downstream tasks (i.e., cancer detection, tumor origin, molecular classification, survival prediction), while the Figure 1 caption, Xiyue's reply, and your understanding relate to its use in pre-training, where "cancer type" means positive vs. negative (see Methods). I may have misunderstood the context of your post, "If the slide-level diagnostic labels are used for weakly supervised pre-training, then the task is cancer classification, which makes sense to me that the slide-level aggregator can be directly used to infer cancer types." For fine-tuning on a specific downstream task, such as genetic profiling or prognostic prediction, where both internal and external validations focus on the same cancer type, incorporating text embeddings is unnecessary.
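As a hedged illustration of that last point, a downstream fine-tuning setup that uses only the image-derived slide embedding (no text embedding) could look like the sketch below; the head, dimensions, and data are hypothetical, not CHIEF's released fine-tuning code.

```python
# Illustrative sketch of fine-tuning on a single downstream task (e.g., a biomarker
# within one cancer type) using only the image-only slide embedding.
# NOT the released CHIEF code; names and dimensions are assumptions.
import torch
import torch.nn as nn

class SlideTaskHead(nn.Module):
    def __init__(self, emb_dim=768, num_classes=2):
        super().__init__()
        self.head = nn.Linear(emb_dim, num_classes)

    def forward(self, slide_embedding):        # slide_embedding: (B, emb_dim)
        return self.head(slide_embedding)      # task logits, e.g., biomarker status

# Fine-tune with task labels from the target cohort only; no text encoder involved:
head = SlideTaskHead()
slide_embs = torch.randn(16, 768)              # image-only slide embeddings
labels = torch.randint(0, 2, (16,))
loss = nn.functional.cross_entropy(head(slide_embs), labels)
```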
Hi @Dadatata-JZ @Xiyue-Wang,
Weakly-supervised pre-training:
Downstream tasks:
So, whether directly inferring or fine-tuning, the key assumption here is that the downstream datasets are limited to the anatomic sites seen in pre-training. Am I right? Is there any way to make CHIEF extensible to cancer categories beyond the 19 types?
@HHHedo Happy to elaborate further on this. Opinions are my own. In CHIEF, we're working with 19 anatomic sites (just to clarify, histologically they cover more than 19 distinct cancer types). For example, both lung adenocarcinoma and lung squamous cell carcinoma fall under the "Lung" category. Many other cancer types (e.g., leukemia) may originate from organs that CHIEF doesn't currently cover. Histologically, cancers across different sites may display similar patterns of abnormal cell growth, invasion, and differentiation, regardless of their anatomical origin, which we hope can help CHIEF or other foundation models expand to uncovered sites and cancer types. However, this is still an open question, and further investigations are encouraged as more real-world data become available. We've already been running some trials; stay tuned! Please feel free to reach out via email to brainstorm about designing experiments and models to answer these research questions, such as how foundation models can generalize to the unseen. It will be interesting!
@HHHedo Hi Tiancheng, it's unlikely that we will get a clear answer about the architecture and pretraining. It took them days, and people calling them out (#20 (comment), #23), to acknowledge that their method is largely similar to SCL-WC with text-embedding contrasting, and recent evidence suggests that such complex pre-training is rather useless (#24), even on the anatomic sites used for training. We will extend this analysis to every tissue type included in CHIEF and all of their tasks; others are welcome to do the same, and we are certain they will reach the same conclusion. The only real way to fully introspect how they pre-trained their model, and to investigate its reproducibility, would be to examine the training code, which the authors have not released even though they indicated they would do so in the Nature article: "The source codes for CHIEF are available at https://github.com/hms-dbmi/CHIEF." (https://www.nature.com/articles/s41586-024-07894-z#code-availability)
Hi,
Congrats on your accepted work! I have some questions to better understand the model architecture and performance.

What patch encoder did you use in the CLAM baseline? Is it based on ResNet50 pretrained on ImageNet or based on CHIEF features? What about the ABMIL and DSMIL baselines? I couldn't find this information unless I missed it (which is likely).

The supplemental material includes some information about UNI (Chen et al., Nat. Medicine, 2024), e.g., in Supp Table 26. Could you provide information about the evaluation scenario? The caption reports that "UNI used carefully curated image patches from the TCGA-COADREAD dataset through supervised learning techniques." As UNI is a patch encoder trained with SSL using DINOv2, I'm not sure I understand what this means.

Are you using slide-level diagnostic labels (e.g., TCGA oncotree code) during CHIEF pretraining? You report that "The second stage requires only WSI-level labels, enabling CHIEF to construct a holistic understanding of pathology images from global features." If so, what is the objective used? The section "CHIEF pretraining details" mentions both weakly-supervised learning and WSI-level contrastive learning. If diagnostic labels were used during pretraining, this presents a potentially unfair comparison with weakly-supervised methods like CLAM/DSMIL/ABMIL that presumably only use labels in the downstream task; e.g., breast subtyping becomes a lot easier if explicitly trained for this task during pretraining.

Have you included all TCGA/PANDA slides in the 60K slides for CHIEF pretraining?
I realize this is a lot of questions, but your input would be very helpful in guiding me through the world of slide SSL.
Thanks!
Guillaume