questions about baselines #19
Dear authors, thank you for the fast and fair response. I have more questions about your points:
I am not satisfied with this answer. UNI and the others are cited, but the context of the citation is the sentence "Second, previous studies focused on narrow diagnostic tasks and did not evaluate the generalizability of the extracted quantitative imaging features in different prediction tasks across cancer types and samples from several sources", which conflicts with this point (and also with the first claim).
I don't know enough about the training of TOAD, but I read that its training also used Harvard hospital data. My main argument is that this task is not a real tumor-origin prediction task but instead a tissue-site prediction task, since only primary tumors are used. Do you agree?
What does "pretrained together using the same WSIs as CHIEF" mean? Do you mean the same WSIs are used for training and testing in the comparisons? And what about the tile feature extractors for these comparisons?
But PORPOISE is multimodal. If only the histopathology subnetwork is used, is the method still called PORPOISE?
Thank you. I could not find in the paper why there was a difference in settings; can you share the reason?
I am asking about the mutation and survival tasks; can these be uploaded too? Thank you.
Hi Tran, thank you for your questions across the different tags. While the evaluation of CHIEF against other foundation models proposed by colleagues in computational pathology was comprehensive, it did not encompass all available models. It would indeed be valuable to see follow-up work that expands these comparisons. At the time of CHIEF's submission, none of the foundation models (model weights) you mentioned had been released. There are also other factors to consider, e.g., COIs. The most widely accepted frameworks, which had been independently evaluated for various clinical tasks globally, were CLAM (for weakly supervised learning in computational pathology) and PORPOISE (for survival prediction). I also see your concerns regarding word choices and definitions. We aimed to strike a balance between intuitive language and standard domain terminology to better serve a general audience. I'm happy to set up a Zoom call to go over your questions in detail. Feel free to ping me directly.
Dear authors,
First, I want to congratulate you on your paper's acceptance, and thank you for making the model weights available. I read your paper carefully, but I still have questions I hope you can help me with.
I believe these limitations have some misplaced and missing citations. Supervised histopathology methods, such as graph-based ones (SlideGraph, TEA-graph), can learn interactions between different parts of the same tissue, and self-supervised methods (HIPT, SLPD, GigaPath) can learn interactions as well. The CHIEF slide model is based on ABMIL and does not learn such interactions. Generalizability across cancer types and prediction tasks has also been studied (UNI, GigaPath, and Virchow). These papers are not cited, but I think they should be mentioned.
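To make the interaction point concrete, here is a minimal NumPy sketch of attention-based MIL pooling in the style of ABMIL (my own illustration, not code from CHIEF; the parameter matrices `V` and `w` are hypothetical). Each tile's attention score is computed from that tile alone, and the slide embedding is a permutation-invariant weighted sum, so no term models pairwise tile interactions:

```python
import numpy as np

def abmil_pool(tile_feats, V, w):
    """Attention-based MIL pooling (illustrative sketch).

    tile_feats: (n_tiles, d) tile embeddings
    V: (k, d) and w: (k,) are attention parameters (hypothetical here).
    """
    # Each tile's score depends only on that tile's own feature vector.
    scores = np.tanh(tile_feats @ V.T) @ w          # (n_tiles,)
    attn = np.exp(scores - scores.max())
    attn = attn / attn.sum()                        # softmax over tiles
    # Weighted sum: permutation-invariant, no tile-tile interaction terms.
    return attn @ tile_feats                        # (d,)
```

Shuffling the tiles leaves the slide embedding unchanged, which is exactly why this family of models cannot represent spatial relationships between tissue regions.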
The TCGA dataset contains mostly primary tumors. So can tumor origin be predicted by just predicting the tissue site? And why is TOAD compared? The TOAD task targets metastatic cancers, so it is not the same task and not comparable.
What tile feature extractor is used for the ABMIL, DSMIL, TransMIL, PORPOISE, and PC-CHiP baselines when comparing with CHIEF? In the software and code section, you say "ResNet-50 with ImageNet Transfer" is used in CLAM and that the original code was used for these models, so the comparison with CLAM, and possibly with the other MIL models, is not entirely fair.
Are randomly initialized MIL models (ABMIL, CLAM, DSMIL, TransMIL, PORPOISE, PC-CHiP) good comparisons? CHIEF comes pretrained, but the baselines are not. Since the main contribution is the slide-level pretraining, shouldn't CHIEF be compared with other models that use slide-level pretraining, such as GigaPath, or HIPT with CTransPath features? HIPT with GigaPath features was studied in the GigaPath paper from May. Also, CHIEF is compared with only the histopathology part of PORPOISE, which is just ABMIL, so it is not accurate to say CHIEF is compared with the whole PORPOISE model.
Are the CHIEF results from the pretrained+finetuned model? If so, what is the difference between the pretrained-only and the pretrained+finetuned CHIEF models?
I could not find the cross-validation splits for the mutation and survival prediction tasks in the code. The code seems incomplete. Can you please provide these splits to help others reproduce your results?
I also could not find the hyperparameters for training and finetuning the MIL baselines. How were the MIL baselines trained? Did the other models get the same benefit from fusing with text embeddings as CHIEF does?
Thank you for your help. I have been reading many machine-learning pathology papers and am trying to understand this one.