Guillaumejaume patch 1 #65

Merged: 4 commits merged on Nov 2, 2024
README.md (40 changes: 22 additions, 18 deletions)
@@ -1,14 +1,12 @@
# HEST-Library: Bringing Spatial Transcriptomics and Histopathology together
## Designed for querying and assembling the HEST-1k dataset

\[ [arXiv](https://arxiv.org/abs/2406.16192) | [Download](https://huggingface.co/datasets/MahmoodLab/hest) | [Documentation](https://hest.readthedocs.io/en/latest/) | [Tutorials](https://github.com/mahmoodlab/HEST/tree/main/tutorials) \]
\[ [arXiv](https://arxiv.org/abs/2406.16192) | [Data](https://huggingface.co/datasets/MahmoodLab/hest) | [Documentation](https://hest.readthedocs.io/en/latest/) | [Tutorials](https://github.com/mahmoodlab/HEST/tree/main/tutorials) | [Cite](https://github.com/mahmoodlab/hest?tab=readme-ov-file#citation) \]
<!-- [ArXiv (stay tuned)]() | [Interactive Demo](http://clam.mahmoodlab.org) | [Cite](#reference) -->

<img src="figures/fig1a.jpeg" width="450px" align="right" />

Welcome to the official GitHub repository of the HEST-Library introduced in *"HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis", NeurIPS Spotlight, 2024*. This project was developed by the [Mahmood Lab](https://faisal.ai/) at Harvard Medical School and Brigham and Women's Hospital.

HEST-1k, HEST-Library, and HEST-Benchmark are released under the Attribution-NonCommercial-ShareAlike 4.0 International license.
<img src="figures/fig1a.jpeg" />

<br/>

@@ -17,6 +15,8 @@ HEST-1k, HEST-Library, and HEST-Benchmark are released under the Attribution-Non
- **HEST-Library:** A set of helpers to assemble new ST samples (ST, Visium, Visium HD, Xenium) and to work with HEST-1k (ST analysis, batch-effect visualization and correction, etc.)
- **HEST-Benchmark:** A new benchmark to assess how well foundation models for histology predict gene expression from morphology

HEST-1k, HEST-Library, and HEST-Benchmark are released under the Attribution-NonCommercial-ShareAlike 4.0 International license.

<br/>

## Updates
@@ -85,27 +85,28 @@ In addition, we provide complete [documentation](https://hest.readthedocs.io/en/

## HEST-Benchmark

The HEST-Benchmark was designed to assess foundation models for pathology under a new, diverse, and challenging benchmark. HEST-Benchmark includes 10 tasks for gene expression prediction (50 highly variable genes) from morphology (112 × 112 µm regions at 0.5 µm/px) in 10 different organs and 9 cancer types. We provide a step-by-step tutorial to run HEST-Benchmark and reproduce our results in [4-Running-HEST-Benchmark.ipynb](https://github.com/mahmoodlab/HEST/tree/main/tutorials/4-Running-HEST-Benchmark.ipynb).
The HEST-Benchmark was designed to assess 11 foundation models for pathology under a new, diverse, and challenging benchmark. HEST-Benchmark includes nine tasks for gene expression prediction (50 highly variable genes) from morphology (112 × 112 µm regions at 0.5 µm/px) in nine different organs and eight cancer types. We provide a step-by-step tutorial to run HEST-Benchmark and reproduce our results in [4-Running-HEST-Benchmark.ipynb](https://github.com/mahmoodlab/HEST/tree/main/tutorials/4-Running-HEST-Benchmark.ipynb).
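
The two quantities in the task definition can be made concrete with a small sketch. The patch-size arithmetic follows directly from the numbers above; the gene ranking, by contrast, is a hypothetical stand-in that simply ranks genes by expression variance across spots. The benchmark's actual highly-variable-gene selection may use a different criterion, and both function names are illustrative only.

```python
from __future__ import annotations

from statistics import pvariance


def patch_size_px(region_um: float, um_per_px: float) -> int:
    """Side length in pixels of a square tissue region at a given resolution."""
    return round(region_um / um_per_px)


def top_variable_genes(expr: dict[str, list[float]], k: int = 50) -> list[str]:
    """Return the k genes with the highest expression variance across spots.

    Hypothetical simplification: plain (population) variance ranking, not
    necessarily the gene-selection procedure used by HEST-Benchmark.
    """
    return sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)[:k]


print(patch_size_px(112, 0.5))  # 224
```

At 0.5 µm/px, each 112 µm region therefore corresponds to a 224 × 224 px image patch.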

### HEST-Benchmark results (08.30.24)

HEST-Benchmark was used to assess 10 publicly available models.
HEST-Benchmark was used to assess 11 publicly available models.
Reported results are based on ridge regression fitted on PCA-reduced embeddings (256 components). Ridge regression unfairly penalizes models with larger embedding dimensions, so to ensure a fair and objective comparison we reduce every model's embeddings to the same dimension with PCA.
Model performance is measured with the Pearson correlation. Best is **bold**, second best
is _italicized_. Additional results based on random forest and XGBoost regression are provided in the paper.
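
A minimal NumPy sketch of this evaluation protocol is shown below. It assumes a train/test split of patch embeddings with matching gene-expression matrices (one column per gene). The function names, the ridge penalty `alpha=1.0`, fitting PCA on the training split only, and averaging the Pearson correlation over genes are all illustrative assumptions. This is not the HEST-Benchmark implementation, which is covered in the tutorial notebook.

```python
import numpy as np


def pca_project(train, test, n_components):
    """Fit PCA on the training embeddings only, then project both splits."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    axes = vt[:n_components]
    return (train - mean) @ axes.T, (test - mean) @ axes.T


def ridge_predict(x_train, y_train, x_test, alpha):
    """Closed-form multi-output ridge regression with an unpenalized intercept."""
    x_mean, y_mean = x_train.mean(axis=0), y_train.mean(axis=0)
    xc = x_train - x_mean
    gram = xc.T @ xc + alpha * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ (y_train - y_mean))
    return (x_test - x_mean) @ weights + y_mean


def pearson_per_gene(y_true, y_pred):
    """Pearson correlation between true and predicted values, one per gene (column)."""
    t = y_true - y_true.mean(axis=0)
    p = y_pred - y_pred.mean(axis=0)
    return (t * p).sum(axis=0) / np.sqrt((t**2).sum(axis=0) * (p**2).sum(axis=0))


def evaluate(emb_train, expr_train, emb_test, expr_test, n_components=256, alpha=1.0):
    """PCA-reduce embeddings, fit ridge, return the mean per-gene Pearson r on test.

    alpha=1.0 and the per-gene averaging are assumptions for illustration.
    """
    # PCA cannot return more components than min(n_samples, n_features).
    k = min(n_components, *emb_train.shape)
    z_train, z_test = pca_project(emb_train, emb_test, k)
    pred = ridge_predict(z_train, expr_train, z_test, alpha)
    return float(pearson_per_gene(expr_test, pred).mean())
```

Because every model's embeddings are projected to the same number of components before the ridge fit, the regression sees inputs of equal dimension regardless of the backbone. The cap on `k` only matters for small toy inputs with fewer than 256 samples or features.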

| **Dataset** | **[Hoptimus0](https://github.com/bioptimus/releases/blob/main/models/h-optimus/v0/LICENSE.md)** | **[Virchow2](https://huggingface.co/paige-ai/Virchow2)** | **[Virchow](https://huggingface.co/paige-ai/Virchow)** | **[UNI](https://huggingface.co/MahmoodLab/UNI)** | **[Gigapath](https://huggingface.co/prov-gigapath/prov-gigapath)** | **[CONCH](https://huggingface.co/MahmoodLab/CONCH)** | **[Phikon](https://huggingface.co/owkin/phikon)** | **[Remedis](https://arxiv.org/abs/2205.09723)** | **[CTransPath](https://www.sciencedirect.com/science/article/abs/pii/S1361841522002043)** | **[Resnet50](https://arxiv.org/abs/1512.03385)** | **[Plip](https://www.nature.com/articles/s41591-023-02504-3)** |
|:--------------|----------------:|---------------:|--------------:|-------------:|---------------:|---------------:|-------------:|--------------:|-----------------:|---------------:|-----------:|
| **IDC** | **0.5988** | 0.5903 | 0.5725 | 0.5718 | 0.5505 | 0.5363 | 0.5327 | 0.5304 | 0.511 | 0.4732 | 0.4717 |
| **PRAD** | 0.3768 | 0.3478 | 0.3341 | 0.3095 | **0.3776** | 0.3548 | 0.342 | 0.3531 | 0.3427 | 0.306 | 0.2819 |
| **PAAD** | **0.4936** | 0.4716 | 0.4926 | 0.478 | 0.476 | 0.4475 | 0.4441 | 0.4647 | 0.4378 | 0.386 | 0.4099 |
| **SKCM** | **0.6521** | 0.613 | 0.6056 | 0.6344 | 0.5607 | 0.5784 | 0.5334 | 0.5816 | 0.5103 | 0.4825 | 0.5117 |
| **COAD** | 0.3054 | 0.252 | **0.3115** | 0.2876 | 0.2595 | 0.2579 | 0.2573 | 0.2528 | 0.249 | 0.231 | 0.0518 |
| **READ** | **0.2209** | 0.2109 | 0.1999 | 0.1822 | 0.1888 | 0.1617 | 0.1631 | 0.1216 | 0.1131 | 0.0842 | 0.0927 |
| **CCRCC** | 0.2717 | **0.275** | 0.2638 | 0.2402 | 0.2436 | 0.2179 | 0.2423 | 0.2643 | 0.2279 | 0.218 | 0.1902 |
| **LUNG** | **0.5605** | 0.5554 | 0.5433 | 0.5499 | 0.5412 | 0.5317 | 0.5522 | 0.538 | 0.5049 | 0.4919 | 0.4838 |
| **LYMPH_IDC** | 0.2578 | **0.2598** | 0.2582 | 0.2537 | 0.2491 | 0.2507 | 0.2373 | 0.2465 | 0.2354 | 0.2284 | 0.2382 |
| **AVG** | **0.4153** | 0.3973 | 0.3979 | 0.3897 | 0.383 | 0.3708 | 0.3672 | 0.3726 | 0.348 | 0.3224 | 0.3035 |
| Model | IDC | PRAD | PAAD | SKCM | COAD | READ | ccRCC | LUAD | LYMPH IDC | Average |
|------------------------|--------|--------|--------|--------|--------|--------|--------|--------|-----------|---------|
| **[Resnet50](https://arxiv.org/abs/1512.03385)** | 0.4741 | 0.3075 | 0.3889 | 0.4822 | 0.2528 | 0.0812 | 0.2231 | 0.4917 | 0.2322 | 0.326 |
| **[CTransPath](https://www.sciencedirect.com/science/article/abs/pii/S1361841522002043)** | 0.511 | 0.3427 | 0.4378 | 0.5106 | 0.2285 | 0.11 | 0.2279 | 0.4985 | 0.2353 | 0.3447 |
| **[Phikon](https://huggingface.co/owkin/phikon)** | 0.5327 | 0.342 | 0.4432 | 0.5355 | 0.2585 | 0.1517 | 0.2423 | 0.5468 | 0.2373 | 0.3656 |
| **[CONCH](https://huggingface.co/MahmoodLab/CONCH)** | 0.5363 | 0.3548 | 0.4475 | 0.5791 | 0.2533 | 0.1674 | 0.2179 | 0.5312 | 0.2507 | 0.3709 |
| **[Remedis](https://arxiv.org/abs/2205.09723)** | 0.529 | 0.3471 | 0.4644 | 0.5818 | 0.2856 | 0.1145 | 0.2647 | 0.5336 | 0.2473 | 0.3742 |
| **[Gigapath](https://huggingface.co/prov-gigapath/prov-gigapath)** | 0.5508 | _0.3708_ | 0.4768 | 0.5538 | _0.301_ | 0.186 | 0.2391 | 0.5399 | 0.2493 | 0.3853 |
| **[UNI](https://huggingface.co/MahmoodLab/UNI)** | 0.5702 | 0.314 | 0.4764 | 0.6254 | 0.263 | 0.1762 | 0.2427 | 0.5511 | 0.2565 | 0.3862 |
| **[Virchow](https://huggingface.co/paige-ai/Virchow)** | 0.5702 | 0.3309 | 0.4875 | 0.6088 | **0.311** | 0.2019 | 0.2637 | 0.5459 | 0.2594 | 0.3977 |
| **[Virchow2](https://huggingface.co/paige-ai/Virchow2)** | 0.5922 | 0.3465 | 0.4661 | 0.6174 | 0.2578 | 0.2084 | **0.2788** | **0.5605** | 0.2582 | 0.3984 |
| **UNIv1.5** | **0.5989** | 0.3645 | _0.4902_ | _0.6401_ | 0.2925 | _0.2240_ | 0.2522 | _0.5586_ | **0.2597** | _0.4090_ |
| **[Hoptimus0](https://github.com/bioptimus/releases/blob/main/models/h-optimus/v0/LICENSE.md)** | _0.5982_ | **0.385** | **0.4932** | **0.6432** | 0.2991 | **0.2292** | _0.2654_ | 0.5582 | _0.2595_ | **0.4146** |
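
The Average column is consistent with an unweighted mean of the nine per-task scores. The sketch below recomputes it for three rows copied from the table above and picks the best and second-best entries of a column. The helper names are hypothetical and exist only for this illustration.

```python
from __future__ import annotations

from statistics import mean

TASKS = ["IDC", "PRAD", "PAAD", "SKCM", "COAD", "READ", "ccRCC", "LUAD", "LYMPH IDC"]

# Per-task Pearson scores copied from three rows of the results table.
SCORES = {
    "Hoptimus0": [0.5982, 0.385, 0.4932, 0.6432, 0.2991, 0.2292, 0.2654, 0.5582, 0.2595],
    "UNIv1.5": [0.5989, 0.3645, 0.4902, 0.6401, 0.2925, 0.2240, 0.2522, 0.5586, 0.2597],
    "Virchow2": [0.5922, 0.3465, 0.4661, 0.6174, 0.2578, 0.2084, 0.2788, 0.5605, 0.2582],
}


def averages(scores: dict[str, list[float]]) -> dict[str, float]:
    """Unweighted mean over the nine tasks, rounded to 4 decimals like the table."""
    return {model: round(mean(vals), 4) for model, vals in scores.items()}


def task_column(scores: dict[str, list[float]], task: str) -> dict[str, float]:
    """Scores of every model for a single task."""
    i = TASKS.index(task)
    return {model: vals[i] for model, vals in scores.items()}


def best_and_second(column: dict[str, float]) -> tuple[str, str]:
    """Names of the highest- and second-highest-scoring models in a column."""
    ranked = sorted(column, key=column.get, reverse=True)
    return ranked[0], ranked[1]


print(averages(SCORES))  # {'Hoptimus0': 0.4146, 'UNIv1.5': 0.409, 'Virchow2': 0.3984}
```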


### Benchmarking your own model
@@ -122,6 +123,9 @@ Our tutorial in [4-Running-HEST-Benchmark.ipynb](https://github.com/mahmoodlab/H
## Citation

If you find our work useful in your research, please consider citing:

Jaume, G., Doucet, P., Song, A. H., Lu, M. Y., Almagro-Perez, C., Wagner, S. J., Vaidya, A. J., Chen, R. J., Williamson, D. F. K., Kim, A., & Mahmood, F. HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis. _Advances in Neural Information Processing Systems_, December 2024.

```
@inproceedings{jaume2024hest,
author = {Guillaume Jaume and Paul Doucet and Andrew H. Song and Ming Y. Lu and Cristina Almagro-Perez and Sophia J. Wagner and Anurag J. Vaidya and Richard J. Chen and Drew F. K. Williamson and Ahrong Kim and Faisal Mahmood},
```
Binary file modified figures/fig1a.jpeg
tutorials/README.md (28 changes: 28 additions, 0 deletions)
@@ -0,0 +1,28 @@
# HEST-1k Tutorials

Welcome to the HEST-1k tutorials! This set of tutorials provides a step-by-step guide to working with HEST-1k and the HEST-Library.

## Tutorials

### 1. Downloading HEST-1k.ipynb
This notebook guides you through downloading the HEST-1k dataset from Hugging Face. It includes details on the dataset structure and requirements.

### 2. Interacting with HEST-1k.ipynb
Learn how to load and explore HEST-1k data. This notebook introduces tools for inspecting data contents, exploring sample images, and performing initial analyses to understand dataset attributes.

### 3. Assembling HEST Data.ipynb
This tutorial provides instructions on assembling HEST data from raw files into a structured format ready for analysis.

### 4. Running HEST Benchmark.ipynb
This notebook walks through running HEST-Benchmark on the HEST-1k dataset and reproducing the reported results.

### 5. Batch-effect visualization.ipynb
This notebook is dedicated to visualizing batch effects within the HEST-1k dataset. It covers methods to identify, understand, and mitigate batch effects.

---

## Contributions

External contributions are welcome! If you have ideas for improving these tutorials or would like to contribute, please feel free to reach out to [[email protected]](mailto:[email protected]).

If you encounter any issues, please check the GitHub Issues section, as other users might have already faced similar challenges.