diff --git a/README.md b/README.md
index adf9bc4..1b76ca2 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,12 @@
# HEST-Library: Bringing Spatial Transcriptomics and Histopathology together
## Designed for querying and assembling HEST-1k dataset
-\[ [arXiv](https://arxiv.org/abs/2406.16192) | [Download](https://huggingface.co/datasets/MahmoodLab/hest) | [Documentation](https://hest.readthedocs.io/en/latest/) | [Tutorials](https://github.com/mahmoodlab/HEST/tree/main/tutorials) \]
+\[ [arXiv](https://arxiv.org/abs/2406.16192) | [Data](https://huggingface.co/datasets/MahmoodLab/hest) | [Documentation](https://hest.readthedocs.io/en/latest/) | [Tutorials](https://github.com/mahmoodlab/HEST/tree/main/tutorials) | [Cite](https://github.com/mahmoodlab/hest?tab=readme-ov-file#citation) \]
-
-
Welcome to the official GitHub repository of the HEST-Library introduced in *"HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis", NeurIPS Spotlight, 2024*. This project was developed by the [Mahmood Lab](https://faisal.ai/) at Harvard Medical School and Brigham and Women's Hospital.
-HEST-1k, HEST-Library, and HEST-Benchmark are released under the Attribution-NonCommercial-ShareAlike 4.0 International license.
+
@@ -17,6 +15,8 @@ HEST-1k, HEST-Library, and HEST-Benchmark are released under the Attribution-Non
- **HEST-Library:** A series of helpers to assemble new ST samples (ST, Visium, Visium HD, Xenium) and work with HEST-1k (ST analysis, batch-effect visualization and correction, etc.)
- **HEST-Benchmark:** A new benchmark to assess the predictive performance of foundation models for histology in predicting gene expression from morphology
+HEST-1k, HEST-Library, and HEST-Benchmark are released under the Attribution-NonCommercial-ShareAlike 4.0 International license.
+
## Updates
@@ -85,27 +85,28 @@ In addition, we provide complete [documentation](https://hest.readthedocs.io/en/
## HEST-Benchmark
-The HEST-Benchmark was designed to assess foundation models for pathology under a new, diverse, and challenging benchmark. HEST-Benchmark includes 10 tasks for gene expression prediction (50 highly variable genes) from morphology (112 x 112 um regions at 0.5 um/px) in 10 different organs and 9 cancer types. We provide a step-by-step tutorial to run HEST-Benchmark and reproduce our results in [4-Running-HEST-Benchmark.ipynb](https://github.com/mahmoodlab/HEST/tree/main/tutorials/4-Running-HEST-Benchmark.ipynb).
+The HEST-Benchmark is a new, diverse, and challenging benchmark for assessing foundation models for pathology. It includes nine gene expression prediction tasks (50 highly variable genes) from morphology (112 x 112 um regions at 0.5 um/px) across nine organs and eight cancer types. We provide a step-by-step tutorial to run HEST-Benchmark and reproduce our results in [4-Running-HEST-Benchmark.ipynb](https://github.com/mahmoodlab/HEST/tree/main/tutorials/4-Running-HEST-Benchmark.ipynb).
### HEST-Benchmark results (08.30.24)
-HEST-Benchmark was used to assess 10 publicly available models.
+HEST-Benchmark was used to assess 11 publicly available models.
Reported results are based on Ridge regression after PCA reduction (256 factors). Because Ridge regression unfairly penalizes models with larger embedding dimensions, we opted for PCA reduction to ensure a fair and objective comparison between models.
Model performance is measured with Pearson correlation. Best is **bold**, second best is _underlined_. Additional results based on Random Forest and XGBoost regression are provided in the paper.
-| **Dataset** | **[Hoptimus0](https://github.com/bioptimus/releases/blob/main/models/h-optimus/v0/LICENSE.md)** | **[Virchow2](https://huggingface.co/paige-ai/Virchow2)** | **[Virchow](https://huggingface.co/paige-ai/Virchow)** | **[UNI](https://huggingface.co/MahmoodLab/UNI)** | **[Gigapath](https://huggingface.co/prov-gigapath/prov-gigapath)** | **[CONCH](https://huggingface.co/MahmoodLab/CONCH)** | **[Phikon](https://huggingface.co/owkin/phikon)** | **[Remedis](https://arxiv.org/abs/2205.09723)** | **[CTransPath](https://www.sciencedirect.com/science/article/abs/pii/S1361841522002043)** | **[Resnet50](https://arxiv.org/abs/1512.03385)** | **[Plip](https://www.nature.com/articles/s41591-023-02504-3)** |
-|:--------------|----------------:|---------------:|--------------:|-------------:|---------------:|---------------:|-------------:|--------------:|-----------------:|---------------:|-----------:|
-| **IDC** | **0.5988** | 0.5903 | 0.5725 | 0.5718 | 0.5505 | 0.5363 | 0.5327 | 0.5304 | 0.511 | 0.4732 | 0.4717 |
-| **PRAD** | 0.3768 | 0.3478 | 0.3341 | 0.3095 | **0.3776** | 0.3548 | 0.342 | 0.3531 | 0.3427 | 0.306 | 0.2819 |
-| **PAAD** | **0.4936** | 0.4716 | 0.4926 | 0.478 | 0.476 | 0.4475 | 0.4441 | 0.4647 | 0.4378 | 0.386 | 0.4099 |
-| **SKCM** | **0.6521** | 0.613 | 0.6056 | 0.6344 | 0.5607 | 0.5784 | 0.5334 | 0.5816 | 0.5103 | 0.4825 | 0.5117 |
-| **COAD** | 0.3054 | 0.252 | **0.3115** | 0.2876 | 0.2595 | 0.2579 | 0.2573 | 0.2528 | 0.249 | 0.231 | 0.0518 |
-| **READ** | **0.2209** | 0.2109 | 0.1999 | 0.1822 | 0.1888 | 0.1617 | 0.1631 | 0.1216 | 0.1131 | 0.0842 | 0.0927 |
-| **CCRCC** | 0.2717 | **0.275** | 0.2638 | 0.2402 | 0.2436 | 0.2179 | 0.2423 | 0.2643 | 0.2279 | 0.218 | 0.1902 |
-| **LUNG** | **0.5605** | 0.5554 | 0.5433 | 0.5499 | 0.5412 | 0.5317 | 0.5522 | 0.538 | 0.5049 | 0.4919 | 0.4838 |
-| **LYMPH_IDC** | 0.2578 | **0.2598** | 0.2582 | 0.2537 | 0.2491 | 0.2507 | 0.2373 | 0.2465 | 0.2354 | 0.2284 | 0.2382 |
-| **AVG** | **0.4153** | 0.3973 | 0.3979 | 0.3897 | 0.383 | 0.3708 | 0.3672 | 0.3726 | 0.348 | 0.3224 | 0.3035 |
+| Model | IDC | PRAD | PAAD | SKCM | COAD | READ | ccRCC | LUAD | LYMPH IDC | Average |
+|------------------------|--------|--------|--------|--------|--------|--------|--------|--------|-----------|---------|
+| **[Resnet50](https://arxiv.org/abs/1512.03385)** | 0.4741 | 0.3075 | 0.3889 | 0.4822 | 0.2528 | 0.0812 | 0.2231 | 0.4917 | 0.2322 | 0.3260 |
+| **[CTransPath](https://www.sciencedirect.com/science/article/abs/pii/S1361841522002043)** | 0.5110 | 0.3427 | 0.4378 | 0.5106 | 0.2285 | 0.1100 | 0.2279 | 0.4985 | 0.2353 | 0.3447 |
+| **[Phikon](https://huggingface.co/owkin/phikon)** | 0.5327 | 0.3420 | 0.4432 | 0.5355 | 0.2585 | 0.1517 | 0.2423 | 0.5468 | 0.2373 | 0.3656 |
+| **[CONCH](https://huggingface.co/MahmoodLab/CONCH)** | 0.5363 | 0.3548 | 0.4475 | 0.5791 | 0.2533 | 0.1674 | 0.2179 | 0.5312 | 0.2507 | 0.3709 |
+| **[Remedis](https://arxiv.org/abs/2205.09723)** | 0.5290 | 0.3471 | 0.4644 | 0.5818 | 0.2856 | 0.1145 | 0.2647 | 0.5336 | 0.2473 | 0.3742 |
+| **[Gigapath](https://huggingface.co/prov-gigapath/prov-gigapath)** | 0.5508 | _0.3708_ | 0.4768 | 0.5538 | _0.3010_ | 0.1860 | 0.2391 | 0.5399 | 0.2493 | 0.3853 |
+| **[UNI](https://huggingface.co/MahmoodLab/UNI)** | 0.5702 | 0.3140 | 0.4764 | 0.6254 | 0.2630 | 0.1762 | 0.2427 | 0.5511 | 0.2565 | 0.3862 |
+| **[Virchow](https://huggingface.co/paige-ai/Virchow)** | 0.5702 | 0.3309 | 0.4875 | 0.6088 | **0.3110** | 0.2019 | 0.2637 | 0.5459 | 0.2594 | 0.3977 |
+| **[Virchow2](https://huggingface.co/paige-ai/Virchow2)** | 0.5922 | 0.3465 | 0.4661 | 0.6174 | 0.2578 | 0.2084 | **0.2788** | **0.5605** | 0.2582 | 0.3984 |
+| **UNIv1.5** | **0.5989** | 0.3645 | _0.4902_ | _0.6401_ | 0.2925 | _0.2240_ | 0.2522 | _0.5586_ | **0.2597** | _0.4090_ |
+| **[Hoptimus0](https://github.com/bioptimus/releases/blob/main/models/h-optimus/v0/LICENSE.md)** | _0.5982_ | **0.3850** | **0.4932** | **0.6432** | 0.2991 | **0.2292** | _0.2654_ | 0.5582 | _0.2595_ | **0.4146** |
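
The evaluation protocol described above (PCA reduction to 256 factors, Ridge regression, per-gene Pearson correlation) can be sketched on synthetic data. The dimensions, `alpha`, and random data below are illustrative only, not the actual HEST-Benchmark pipeline:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_patches, embed_dim, n_genes = 500, 1536, 50

# Synthetic stand-ins for patch embeddings and log1p-normalized expression
X = rng.normal(size=(n_patches, embed_dim))
W = 0.1 * rng.normal(size=(embed_dim, n_genes))
Y = X @ W + rng.normal(size=(n_patches, n_genes))

X_train, X_test = X[:400], X[400:]
Y_train, Y_test = Y[:400], Y[400:]

# Reduce embeddings to 256 factors so models with larger
# embedding dimensions are not unfairly penalized by Ridge
pca = PCA(n_components=256).fit(X_train)
reg = Ridge(alpha=1.0).fit(pca.transform(X_train), Y_train)
Y_pred = reg.predict(pca.transform(X_test))

# Score: mean Pearson correlation across the 50 genes
corrs = [pearsonr(Y_test[:, g], Y_pred[:, g])[0] for g in range(n_genes)]
mean_corr = float(np.mean(corrs))
print(f"mean Pearson r = {mean_corr:.3f}")
```

Fitting PCA on the training split only, then transforming both splits, avoids leaking test-set statistics into the regression.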
### Benchmarking your own model
@@ -122,6 +123,9 @@ Our tutorial in [4-Running-HEST-Benchmark.ipynb](https://github.com/mahmoodlab/H
## Citation
If you find our work useful in your research, please consider citing:
+
+Jaume, G., Doucet, P., Song, A. H., Lu, M. Y., Almagro-Perez, C., Wagner, S. J., Vaidya, A. J., Chen, R. J., Williamson, D. F. K., Kim, A., & Mahmood, F. HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis. _Advances in Neural Information Processing Systems_, December 2024.
+
```
@inproceedings{jaume2024hest,
author = {Guillaume Jaume and Paul Doucet and Andrew H. Song and Ming Y. Lu and Cristina Almagro-Perez and Sophia J. Wagner and Anurag J. Vaidya and Richard J. Chen and Drew F. K. Williamson and Ahrong Kim and Faisal Mahmood},
diff --git a/figures/fig1a.jpeg b/figures/fig1a.jpeg
index bd5c6e1..aeecfab 100644
Binary files a/figures/fig1a.jpeg and b/figures/fig1a.jpeg differ
diff --git a/tutorials/README.md b/tutorials/README.md
new file mode 100644
index 0000000..871a3c4
--- /dev/null
+++ b/tutorials/README.md
@@ -0,0 +1,28 @@
+# HEST-1k Tutorials
+
+Welcome to the HEST-1k tutorial repository! This set of tutorials provides a step-by-step guide to working with HEST-1k and the HEST-Library.
+
+## Tutorials
+
+### 1. Downloading HEST-1k.ipynb
+This notebook guides you through downloading the HEST-1k dataset from Hugging Face. It includes details on dataset structure and requirements.
+
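
A selective download can be sketched with `huggingface_hub`. The sample IDs and the filename-pattern scheme below are assumptions for illustration; check the notebook and the HEST-1k metadata on Hugging Face for the real IDs and file layout:

```python
# Hypothetical sample IDs -- see the HEST-1k metadata for real ones
sample_ids = ["TENX95", "TENX99"]
allow_patterns = [f"*{sid}*" for sid in sample_ids]
print(allow_patterns)

# Then (requires `pip install huggingface_hub` and network access):
# from huggingface_hub import snapshot_download
# snapshot_download(
#     repo_id="MahmoodLab/hest",
#     repo_type="dataset",
#     local_dir="hest_data",
#     allow_patterns=allow_patterns,  # omit to download the full dataset
# )
```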
+### 2. Interacting with HEST-1k.ipynb
+Learn how to load and explore HEST-1k data. This notebook introduces tools for inspecting data contents, exploring sample images, and performing initial analyses to understand dataset attributes.
+
+### 3. Assembling HEST Data.ipynb
+This tutorial provides instructions on assembling HEST data from raw files into a structured format ready for analysis.
+
+### 4. Running HEST Benchmark.ipynb
+This tutorial walks you through running the HEST-Benchmark and reproducing the gene expression prediction results.
+
+### 5. Batch-effect visualization.ipynb
+This notebook is dedicated to visualizing batch effects within the HEST-1k dataset, covering methods to identify, understand, and mitigate them.
+
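
As a toy illustration of what such a visualization captures, the sketch below builds two synthetic "batches", projects them onto two principal components (what a batch-effect scatter plot would show), and quantifies batch separation with a silhouette score. All data and numbers here are illustrative, not the notebook's actual code:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two synthetic "batches" of expression profiles with a batch-specific shift
batch_a = rng.normal(0.0, 1.0, size=(100, 50))
batch_b = rng.normal(0.8, 1.0, size=(100, 50))
X = np.vstack([batch_a, batch_b])
labels = np.array([0] * 100 + [1] * 100)

# Project to 2 PCs; a strong batch effect shows up as separated point clouds
pcs = PCA(n_components=2).fit_transform(X)

# Silhouette over batch labels: near 0 = well mixed, near 1 = strong batch effect
score = float(silhouette_score(pcs, labels))
print(round(score, 3))
```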
+---
+
+## Contributions
+
+External contributions are welcome! If you have ideas for improving these tutorials or would like to contribute, please feel free to reach out to [gjaume@bwh.harvard.edu](mailto:gjaume@bwh.harvard.edu).
+
+If you encounter any issues, please check the GitHub Issues section, as other users might have already faced similar challenges.