
questions about baselines #19

Open
tranh215980 opened this issue Sep 11, 2024 · 3 comments

@tranh215980

Dear authors,

First, I want to congratulate you on your paper's acceptance, and thank you for making the model weights available. I read your paper carefully, but I still have some questions I hope you can help me with.

  • In the introduction, you say: "However, these methods have two major limitations. First, they primarily focus on individual image tiles in the WSIs, without considering the interactions of different regions of the same tissue. Second, previous studies focused on narrow diagnostic tasks and did not evaluate the generalizability of the extracted quantitative imaging features in different prediction tasks across cancer types and samples from several sources."

I believe these limitations involve some misplaced and missing citations. Supervised histopathology methods, such as graph-based ones (Slide-Graph, Tea-Graph), can learn interactions between different parts of the same tissue, and self-supervised methods (HIPT, SLPD, GigaPath) can learn such interactions too. The CHIEF slide model is based on ABMIL and does not learn interactions. Generalizability across cancer types and prediction tasks has also been studied (UNI, GigaPath, and Virchow). These papers are not cited, but I think they should be mentioned.

  • The TCGA dataset contains mostly primary tumors, so can tumor origin be predicted simply by predicting the tissue site? And why is TOAD compared? TOAD's task covers metastatic cancers, so it is not the same task and not directly comparable.

  • What tile feature extractor is used for the ABMIL, DSMIL, TransMIL, PORPOISE, and PC-CHiP baselines when comparing to CHIEF? In the software and code section, you say "ResNet-50 with ImageNet Transfer" is used in CLAM and that the original code was used for these models, so the comparison to CLAM (and perhaps the other MIL models) may not be entirely fair.

  • Are randomly initialized MIL models (ABMIL, CLAM, DSMIL, TransMIL, PORPOISE, PC-CHiP) good comparisons? CHIEF comes pretrained, but the baselines are not. Since the main contribution is the slide-level pretraining, shouldn't CHIEF be compared to other models with slide-level pretraining, such as GigaPath, or HIPT with CTransPath features? HIPT with GigaPath features was studied in the GigaPath paper from May. Also, CHIEF compares with only the histopathology part of PORPOISE, which is just an ABMIL, so it is not correct to say CHIEF is compared with the whole PORPOISE model.

  • Are the CHIEF results from the pretrained+finetuned model? If so, what is the difference between the pretrained-only and pretrained+finetuned CHIEF models?

  • I could not find the cross-validation splits for the mutation and survival prediction tasks in the code; it seems incomplete. Could you please provide these splits so that others can reproduce your results?

  • I also could not find the hyperparameters for training and finetuning the MIL baselines. How were the MIL baselines trained? Did the other models get the same benefit from fusion with text embeddings as CHIEF does?
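For context on the ABMIL point above: in attention-based MIL pooling, each tile's attention score is computed from that tile's feature alone, with no tile-to-tile term, which is why I say it does not model interactions between regions. A minimal NumPy sketch (shapes, weights, and dimensions here are made up for illustration, not CHIEF's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bag of tile features: N tiles with d-dim embeddings
N, d, hidden = 5, 8, 4
H = rng.normal(size=(N, d))

# Illustrative attention-pooling parameters (Ilse et al.-style ABMIL)
V = rng.normal(size=(hidden, d))   # shared projection
w = rng.normal(size=(hidden,))     # attention vector

# Each tile's score depends ONLY on its own feature h_i;
# there is no term coupling tile i with tile j.
scores = np.tanh(H @ V.T) @ w                  # shape (N,)
a = np.exp(scores) / np.exp(scores).sum()      # softmax over tiles
slide_embedding = a @ H                        # weighted sum, shape (d,)
```

A transformer-style aggregator, by contrast, would mix tile features through self-attention before pooling, so tiles can influence each other's representation.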

Thank you for your help. I have been reading many machine learning pathology papers and am trying to understand this one.

@Xiyue-Wang
Collaborator

Xiyue-Wang commented Sep 12, 2024

  1. Thanks. UNI and CONCH have already been cited. We would like to clarify that the work was actually submitted in early 2023; the revision process took a long time before it was finally accepted in 2024. Some newer papers have therefore not been cited.
  2. TOAD predicts tumor origin, and its training data is also all TCGA; it only has a separate binary classification for primary vs. metastatic tumors.
  3. Note that several of our weakly supervised methods (aggregation networks) were pretrained together using the same WSIs as CHIEF.
  4. No. Note that several of our weakly supervised methods (aggregation networks) were pretrained together using the same WSIs as CHIEF. PORPOISE was used to evaluate the prognostic task because it is a very well-known method, and comparisons are needed against methods used for prognostication.
  5. Tumor origin, biomarker, and survival tasks: pretrained+finetuned. Cancer detection: pretrained-only.
  6. Please see https://github.com/hms-dbmi/CHIEF/tree/main/Downstream/Tumor_origin/src for the splits and config.

@tranh215980
Author

tranh215980 commented Sep 12, 2024

Dear authors,

Thank you for the fast and fair response. I have more questions on your points:

> Thanks. UNI and CONCH have already been cited. We would like to clarify that the work was actually submitted in early 2023; the revision process took a long time before it was finally accepted in 2024. Some newer papers have therefore not been cited.

I am not satisfied with this answer. UNI and the others are cited, but the context of the citation is the sentence "Second, previous studies focused on narrow diagnostic tasks and did not evaluate the generalizability of the extracted quantitative imaging features in different prediction tasks across cancer types and samples from several sources", which conflicts with those works (and also with the first claim).

> TOAD predicts tumor origin, and its training data is also all TCGA; it only has a separate binary classification for primary vs. metastatic tumors.

I do not know enough about the training of TOAD, but I read that its training also used Harvard hospital data. My main argument is that this task is not a real tumor-origin prediction task but rather a tissue-site prediction task, since only primary tumors are used. Do you agree?

> Note that several of our weakly supervised methods (aggregation networks) were pretrained together using the same WSIs as CHIEF.

What does "pretrained together using the same WSIs as CHIEF" mean? Do you mean the same WSIs are used for training and testing in the comparisons? And what tile feature extractors are used for these comparisons?

> No. Note that several of our weakly supervised methods (aggregation networks) were pretrained together using the same WSIs as CHIEF. PORPOISE was used to evaluate the prognostic task because it is a very well-known method, and comparisons are needed against methods used for prognostication.

But PORPOISE is multimodal. If only the histopathology subnetwork is used, is the method still correctly called PORPOISE?

> Tumor origin, biomarker, and survival tasks: pretrained+finetuned. Cancer detection: pretrained-only.

Thank you. I could not find the reason for this difference in settings; can you share it?

> Please see https://github.com/hms-dbmi/CHIEF/tree/main/Downstream/Tumor_origin/src for the splits and config.

I asked about the mutation and survival tasks; can the splits for these be uploaded too? Thank you.
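Just to show the format I mean (case IDs, fold count, and the file name below are made up, not your actual protocol), the splits could be generated with a fixed seed and committed as a small JSON file like this:

```python
import json
import random

# Hypothetical case IDs; in practice these would be the real TCGA case IDs
cases = [f"TCGA-{i:04d}" for i in range(20)]

random.seed(42)      # fixed seed so the splits are reproducible
random.shuffle(cases)

k = 5                # assumed number of cross-validation folds
folds = [cases[i::k] for i in range(k)]

splits = []
for i in range(k):
    test = folds[i]
    train = [c for c in cases if c not in test]
    splits.append({"fold": i, "train": train, "test": test})

# Write the splits to JSON so they can be committed to the repo
with open("survival_splits.json", "w") as f:
    json.dump(splits, f, indent=2)
```

Even a file in this rough shape for the mutation and survival tasks would let others reproduce the exact folds.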

@Dadatata-JZ
Collaborator

Dadatata-JZ commented Sep 13, 2024

Hi Tran,

Thank you for your questions across the different threads. While the evaluation of CHIEF against other models proposed by colleagues in computational pathology was comprehensive, it did not encompass all available models. It would indeed be valuable to see follow-up work that expands these comparisons.

At the time of CHIEF's submission, none of the foundation models (model weights) you mentioned had been released. There are also other factors to consider, e.g., COIs. The most widely accepted frameworks, which had been independently evaluated on various clinical tasks globally, were CLAM (for weakly supervised learning in computational pathology) and PORPOISE (for survival prediction).

I also understand your concerns regarding word choices and definitions. We aimed to strike a balance between intuitive language and standard domain terminology to better serve a general audience.

I'm happy to set up a Zoom call to go over your questions in detail. Feel free to ping me directly.
