mahmoodlab · guillaumejaume · Nov 2, 2024 · Nov 1, 2024
diff --git a/README.md b/README.md
@@ -1,14 +1,12 @@
 # HEST-Library: Bringing Spatial Transcriptomics and Histopathology together
 ## Designed for querying and assembling the HEST-1k dataset
 
-\[ [arXiv](https://arxiv.org/abs/2406.16192) | [Download](https://huggingface.co/datasets/MahmoodLab/hest) | [Documentation](https://hest.readthedocs.io/en/latest/) | [Tutorials](https://github.com/mahmoodlab/HEST/tree/main/tutorials) \]
+\[ [arXiv](https://arxiv.org/abs/2406.16192) | [Data](https://huggingface.co/datasets/MahmoodLab/hest) | [Documentation](https://hest.readthedocs.io/en/latest/) | [Tutorials](https://github.com/mahmoodlab/HEST/tree/main/tutorials) | [Cite](https://github.com/mahmoodlab/hest?tab=readme-ov-file#citation) \]
 <!-- [ArXiv (stay tuned)]() | [Interactive Demo](http://clam.mahmoodlab.org) | [Cite](#reference) -->
 
-<img src="figures/fig1a.jpeg" width="450px" align="right" />
-
 Welcome to the official GitHub repository of the HEST-Library introduced in *"HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis", NeurIPS Spotlight, 2024*. This project was developed by the [Mahmood Lab](https://faisal.ai/) at Harvard Medical School and Brigham and Women's Hospital. 
 
-HEST-1k, HEST-Library, and HEST-Benchmark are released under the Attribution-NonCommercial-ShareAlike 4.0 International license. 
+<img src="figures/fig1a.jpeg" />
 
 <br/>
 
@@ -17,6 +15,8 @@ HEST-1k, HEST-Library, and HEST-Benchmark are released under the Attribution-Non
 - **HEST-Library:** A series of helpers to assemble new ST samples (ST, Visium, Visium HD, Xenium) and work with HEST-1k (ST analysis, batch effect viz and correction, etc.)
 - **HEST-Benchmark:** A new benchmark to assess the predictive performance of foundation models for histology in predicting gene expression from morphology 
 
+HEST-1k, HEST-Library, and HEST-Benchmark are released under the Attribution-NonCommercial-ShareAlike 4.0 International license. 
+
 <br/>
 
 ## Updates
@@ -85,27 +85,28 @@ In addition, we provide complete [documentation](https://hest.readthedocs.io/en/
 
 ## HEST-Benchmark
 
-The HEST-Benchmark was designed to assess foundation models for pathology under a new, diverse, and challenging benchmark. HEST-Benchmark includes 10 tasks for gene expression prediction (50 highly variable genes) from morphology (112 x 112 um regions at 0.5 um/px) in 10 different organs and 9 cancer types. We provide a step-by-step tutorial to run HEST-Benchmark and reproduce our results in [4-Running-HEST-Benchmark.ipynb](https://github.com/mahmoodlab/HEST/tree/main/tutorials/4-Running-HEST-Benchmark.ipynb).
+The HEST-Benchmark was designed to assess 11 foundation models for pathology on a new, diverse, and challenging set of tasks. It includes nine gene expression prediction tasks (50 highly variable genes) from morphology (112 × 112 µm regions at 0.5 µm/px) across nine organs and eight cancer types. We provide a step-by-step tutorial to run HEST-Benchmark and reproduce our results in [4-Running-HEST-Benchmark.ipynb](https://github.com/mahmoodlab/HEST/tree/main/tutorials/4-Running-HEST-Benchmark.ipynb).
 
 ### HEST-Benchmark results (08.30.24)
 
-HEST-Benchmark was used to assess 10 publicly available models.
+HEST-Benchmark was used to assess 11 publicly available models.
 Reported results are based on ridge regression applied to PCA-reduced embeddings (256 factors). Ridge regression unfairly penalizes models with larger embedding dimensions, so we reduce all embeddings to the same dimensionality with PCA to ensure a fair and objective comparison between models. 
 Model performance is measured with Pearson correlation. Best is **bold**, second best is _underlined_.
 Additional results based on Random Forest and XGBoost regression are provided in the paper. 
 
-| **Dataset**   |   **[Hoptimus0](https://github.com/bioptimus/releases/blob/main/models/h-optimus/v0/LICENSE.md)** |   **[Virchow2](https://huggingface.co/paige-ai/Virchow2)** |   **[Virchow](https://huggingface.co/paige-ai/Virchow)** |   **[UNI](https://huggingface.co/MahmoodLab/UNI)** |   **[Gigapath](https://huggingface.co/prov-gigapath/prov-gigapath)** |   **[CONCH](https://huggingface.co/MahmoodLab/CONCH)** |   **[Phikon](https://huggingface.co/owkin/phikon)** |   **[Remedis](https://arxiv.org/abs/2205.09723)** |   **[CTransPath](https://www.sciencedirect.com/science/article/abs/pii/S1361841522002043)** |   **[Resnet50](https://arxiv.org/abs/1512.03385)** |   **[Plip](https://www.nature.com/articles/s41591-023-02504-3)** |
-|:--------------|----------------:|---------------:|--------------:|-------------:|---------------:|---------------:|-------------:|--------------:|-----------------:|---------------:|-----------:|
-| **IDC**       |          **0.5988** |         0.5903 |        0.5725 |       0.5718 |         0.5505 |         0.5363 |       0.5327 |        0.5304 |           0.511  |         0.4732 |     0.4717 |
-| **PRAD**      |          0.3768 |         0.3478 |        0.3341 |       0.3095 |         **0.3776** |         0.3548 |       0.342  |        0.3531 |           0.3427 |         0.306  |     0.2819 |
-| **PAAD**      |          **0.4936** |         0.4716 |        0.4926 |       0.478  |         0.476  |         0.4475 |       0.4441 |        0.4647 |           0.4378 |         0.386  |     0.4099 |
-| **SKCM**      |          **0.6521** |         0.613  |        0.6056 |       0.6344 |         0.5607 |         0.5784 |       0.5334 |        0.5816 |           0.5103 |         0.4825 |     0.5117 |
-| **COAD**      |          0.3054 |         0.252  |        **0.3115** |       0.2876 |         0.2595 |         0.2579 |       0.2573 |        0.2528 |           0.249  |         0.231  |     0.0518 |
-| **READ**      |          **0.2209** |         0.2109 |        0.1999 |       0.1822 |         0.1888 |         0.1617 |       0.1631 |        0.1216 |           0.1131 |         0.0842 |     0.0927 |
-| **CCRCC**     |          0.2717 |         **0.275**  |        0.2638 |       0.2402 |         0.2436 |         0.2179 |       0.2423 |        0.2643 |           0.2279 |         0.218  |     0.1902 |
-| **LUNG**      |          **0.5605** |         0.5554 |        0.5433 |       0.5499 |         0.5412 |         0.5317 |       0.5522 |        0.538  |           0.5049 |         0.4919 |     0.4838 |
-| **LYMPH_IDC** |          0.2578 |         **0.2598** |        0.2582 |       0.2537 |         0.2491 |         0.2507 |       0.2373 |        0.2465 |           0.2354 |         0.2284 |     0.2382 |
-| **AVG**       |          **0.4153** |         0.3973 |        0.3979 |       0.3897 |         0.383  |         0.3708 |       0.3672 |        0.3726 |           0.348  |         0.3224 |     0.3035 |
+| Model                  | IDC    | PRAD   | PAAD   | SKCM   | COAD   | READ   | ccRCC  | LUAD   | LYMPH IDC | Average |
+|------------------------|--------|--------|--------|--------|--------|--------|--------|--------|-----------|---------|
+| **[Resnet50](https://arxiv.org/abs/1512.03385)**      | 0.4741 | 0.3075 | 0.3889 | 0.4822 | 0.2528 | 0.0812 | 0.2231 | 0.4917 | 0.2322    | 0.326   |
+| **[CTransPath](https://www.sciencedirect.com/science/article/abs/pii/S1361841522002043)**         | 0.511  | 0.3427 | 0.4378 | 0.5106 | 0.2285 | 0.11   | 0.2279 | 0.4985 | 0.2353    | 0.3447  |
+| **[Phikon](https://huggingface.co/owkin/phikon)**            | 0.5327 | 0.342  | 0.4432 | 0.5355 | 0.2585 | 0.1517 | 0.2423 | 0.5468 | 0.2373    | 0.3656  |
+| **[CONCH](https://huggingface.co/MahmoodLab/CONCH)**             | 0.5363 | 0.3548 | 0.4475 | 0.5791 | 0.2533 | 0.1674 | 0.2179 | 0.5312 | 0.2507    | 0.3709  |
+| **[Remedis](https://arxiv.org/abs/2205.09723)**            | 0.529  | 0.3471 | 0.4644 | 0.5818 | 0.2856 | 0.1145 | 0.2647 | 0.5336 | 0.2473    | 0.3742  |
+| **[Gigapath](https://huggingface.co/prov-gigapath/prov-gigapath)**          | 0.5508 | _0.3708_ | 0.4768 | 0.5538 | _0.301_ | 0.186 | 0.2391 | 0.5399 | 0.2493    | 0.3853  |
+| **[UNI](https://huggingface.co/MahmoodLab/UNI)**                | 0.5702 | 0.314  | 0.4764 | 0.6254 | 0.263  | 0.1762 | 0.2427 | 0.5511 | 0.2565    | 0.3862  |
+| **[Virchow](https://huggingface.co/paige-ai/Virchow)**            | 0.5702 | 0.3309 | 0.4875 | 0.6088 | **0.311** | 0.2019 | 0.2637 | 0.5459 | 0.2594    | 0.3977  |
+| **[Virchow2](https://huggingface.co/paige-ai/Virchow2)**           | 0.5922 | 0.3465 | 0.4661 | 0.6174 | 0.2578 | 0.2084 | **0.2788** | **0.5605** | 0.2582    | 0.3984  |
+| **UNIv1.5**            | **0.5989** | 0.3645 | _0.4902_ | _0.6401_ | 0.2925 | _0.2240_ | 0.2522 | _0.5586_ | **0.2597** | _0.4090_ |
+| **[Hoptimus0](https://github.com/bioptimus/releases/blob/main/models/h-optimus/v0/LICENSE.md)**        | _0.5982_ | **0.385** | **0.4932** | **0.6432** | 0.2991 | **0.2292** | _0.2654_ | 0.5582 | _0.2595_ | **0.4146** |
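
The Average column appears to be the unweighted mean of the nine per-task Pearson correlations. The sketch below checks that reading for the three top-ranked rows, using values copied from the table; the helper name `task_average` is illustrative only.

```python
# Per-task Pearson correlations copied from the table above, in column order:
# IDC, PRAD, PAAD, SKCM, COAD, READ, ccRCC, LUAD, LYMPH IDC.
scores = {
    "Hoptimus0": [0.5982, 0.385, 0.4932, 0.6432, 0.2991, 0.2292, 0.2654, 0.5582, 0.2595],
    "UNIv1.5": [0.5989, 0.3645, 0.4902, 0.6401, 0.2925, 0.2240, 0.2522, 0.5586, 0.2597],
    "Virchow2": [0.5922, 0.3465, 0.4661, 0.6174, 0.2578, 0.2084, 0.2788, 0.5605, 0.2582],
}

# Values of the "Average" column for the same rows.
reported_average = {"Hoptimus0": 0.4146, "UNIv1.5": 0.4090, "Virchow2": 0.3984}


def task_average(values):
    """Unweighted mean over tasks, rounded to the table's four decimals."""
    return round(sum(values) / len(values), 4)


averages = {model: task_average(values) for model, values in scores.items()}
ranking = sorted(averages, key=averages.get, reverse=True)

for model in ranking:
    print(f"{model}: computed {averages[model]:.4f}, reported {reported_average[model]:.4f}")
```

Because the mean is unweighted, every task contributes equally to the ranking, including low-correlation tasks such as READ.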
 
 
 ### Benchmarking your own model
@@ -122,6 +123,9 @@ Our tutorial in [4-Running-HEST-Benchmark.ipynb](https://github.com/mahmoodlab/H
 ## Citation
 
 If you find our work useful in your research, please consider citing:
+
+Jaume, G., Doucet, P., Song, A. H., Lu, M. Y., Almagro-Perez, C., Wagner, S. J., Vaidya, A. J., Chen, R. J., Williamson, D. F. K., Kim, A., & Mahmood, F. HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis. _Advances in Neural Information Processing Systems_, December 2024.
+
 ```
 @inproceedings{jaume2024hest,
     author = {Guillaume Jaume and Paul Doucet and Andrew H. Song and Ming Y. Lu and Cristina Almagro-Perez and Sophia J. Wagner and Anurag J. Vaidya and Richard J. Chen and Drew F. K. Williamson and Ahrong Kim and Faisal Mahmood},

diff --git a/figures/fig1a.jpeg b/figures/fig1a.jpeg
diff --git a/tutorials/README.md b/tutorials/README.md
@@ -0,0 +1,28 @@
+# HEST-1k Tutorials
+
+Welcome to the HEST-1k tutorial repository! This set of tutorials provides a step-by-step guide to working with HEST-1k and the HEST-Library. 
+
+## Tutorials
+
+### 1. Downloading HEST-1k.ipynb
+This notebook guides you through downloading the HEST-1k dataset from Hugging Face. It covers the dataset structure and the requirements for access. 
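
As a rough illustration of what such a download can look like, here is a minimal sketch using `snapshot_download` from the `huggingface_hub` package. This is an assumption about one workable approach, not necessarily what the notebook does. The repo ID comes from the Data link in the main README; the `download_hest` helper and the `hest_data` directory are hypothetical names. Access may require logging in to Hugging Face first.

```python
from pathlib import Path

# Dataset repo linked from the main README.
HEST_REPO_ID = "MahmoodLab/hest"


def download_hest(local_dir="hest_data", allow_patterns=None):
    """Download HEST-1k files from the Hugging Face Hub (hypothetical helper).

    allow_patterns: optional list of glob patterns selecting a subset of files;
    None downloads every file in the dataset repo.
    """
    # Imported lazily so this module loads even without the dependency installed.
    from huggingface_hub import snapshot_download

    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    return snapshot_download(
        repo_id=HEST_REPO_ID,
        repo_type="dataset",
        local_dir=str(target),
        allow_patterns=allow_patterns,
    )


# Example call (needs network access and possibly prior authentication):
# path = download_hest("hest_data")
```

Passing `allow_patterns` is useful when only a few samples are needed, since the full dataset is large.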
+
+### 2. Interacting with HEST-1k.ipynb
+Learn how to load and explore HEST-1k data. This notebook introduces tools for inspecting data contents, exploring sample images, and performing initial analyses to understand dataset attributes.
+
+### 3. Assembling HEST Data.ipynb
+This tutorial provides instructions on assembling HEST data from raw files into a structured format ready for analysis. 
+
+### 4. Running HEST Benchmark.ipynb
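+Run the HEST-Benchmark to evaluate foundation models on gene expression prediction from histology and reproduce the results reported in the main README. 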
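
The main README reports results from ridge regression on PCA-reduced embeddings (256 factors), scored with Pearson correlation. The NumPy sketch below illustrates that pipeline on synthetic data. It is not the benchmark's actual code: the function names, the regularization strength `alpha`, the train/test split, and the choice to average Pearson across genes are all assumptions made for illustration.

```python
import numpy as np


def pca_fit_transform(train, test, n_components):
    """Project train/test embeddings onto the top principal components of train."""
    mean = train.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    components = vt[:n_components]
    return (train - mean) @ components.T, (test - mean) @ components.T


def ridge_fit_predict(x_train, y_train, x_test, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = x_train.mean(axis=0), y_train.mean(axis=0)
    xc, yc = x_train - x_mean, y_train - y_mean
    gram = xc.T @ xc + alpha * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ yc)
    return (x_test - x_mean) @ weights + y_mean


def mean_pearson(y_true, y_pred):
    """Pearson correlation per gene (column), averaged over genes."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return float(np.mean((yt * yp).sum(axis=0) / denom))


# Synthetic stand-in data: 400 patches, 512-dim embeddings, 50 genes.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(400, 512))
true_weights = rng.normal(size=(512, 50))
expression = embeddings @ true_weights + rng.normal(scale=5.0, size=(400, 50))

x_train, x_test = embeddings[:300], embeddings[300:]
y_train, y_test = expression[:300], expression[300:]

# Reduce every model's embeddings to the same 256 dimensions before regression.
z_train, z_test = pca_fit_transform(x_train, x_test, n_components=256)
y_pred = ridge_fit_predict(z_train, y_train, z_test, alpha=1.0)
score = mean_pearson(y_test, y_pred)
print(f"mean Pearson over 50 genes: {score:.3f}")
```

Fitting PCA on the training split only and reusing its mean and components for the test split avoids leaking test information into the projection.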
+
+### 5. Batch-effect visualization.ipynb
+This notebook visualizes batch effects within the HEST-1k dataset and covers methods to identify, understand, and mitigate them. 
+
+---
+
+## Contributions
+
+External contributions are welcome! If you have ideas for improving these tutorials or would like to contribute, please feel free to reach out to [[email protected]](mailto:[email protected]).
+
+If you encounter any issues, please check the GitHub Issues section, as other users might have already faced similar challenges.