This repository has been archived by the owner on Sep 19, 2024. It is now read-only.

Commit

cleanup
jkobject committed Jul 26, 2024
1 parent a303f00 commit ce43e2a
Showing 32 changed files with 1,153 additions and 4,202 deletions.
64 changes: 42 additions & 22 deletions README.md
@@ -8,24 +8,30 @@
[![Downloads](https://pepy.tech/badge/scprint/week)](https://pepy.tech/project/scprint)
[![GitHub issues](https://img.shields.io/github/issues/jkobject/scPRINT)](https://img.shields.io/github/issues/jkobject/scPRINT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![DOI](https://zenodo.org/badge/391909874.svg)](https://zenodo.org/badge/latestdoi/391909874)
[![DOI](https://zenodo.org/badge/391909874.svg)]()

![logo](logo.png)

scPRINT is a novel transformer model for the inference of gene network (connections between genes explaining the cell's expression profile) from scRNAseq data.
scPRINT is a large transformer model built for the inference of gene networks (connections between genes explaining the cell's expression profile) from scRNAseq data.

It uses novel encoding and decoding schemes as well as new pre-training methodologies to learn a model of the cell. s
It uses novel encoding and decoding of the cell expression profile as well as new pre-training methodologies to learn a cell model.

scPRINT can do lots of things:
- like denoising
scPRINT can do lots of things:

[Read the paper!]() if you want to know more about the model.
- __expression denoising__: increase the resolution of your scRNAseq data
- __cell embedding__: generate a low-dimensional representation of your dataset
- __label prediction__: predict the cell type, disease, sequencer, sex, and ethnicity of your cells
- __gene network inference__: generate a gene network from any cell or cell cluster in your scRNAseq dataset

[Read the paper!]() if you want to know more about scPRINT.

![figure1](figure1.png)

## Install it from PyPI

If you want to be using flashattention2, know that it only supports torch==2.0.0 for now.
If you want to use flashattention2, note that for now it only supports the MLIR version of triton 2.0 and torch==2.0.0.

👷 WIP ...

<!---
@@ -50,10 +56,13 @@ You should be good to go. You need those specific versions for everything to wor
This is not my fault, scream at nvidia :wink:
-->

### in dev mode
## Install it in dev mode

For the moment scPRINT has been tested on MacOS and Linux (Ubuntu 20.04) with Python 3.10.

If you want to use flashattention2, note that for now it only supports the MLIR version of triton 2.0 and torch==2.0.0.


```python
conda create -n "[whatever]" python==3.10
git clone https://github.com/jkobject/scPRINT
@@ -76,19 +85,24 @@ pip install triton==2.0.0.dev20221202 --no-deps # only if you have a compatible
mkdocs serve # to view the dev documentation
```
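
Since the install notes pin specific versions, a quick check of the environment can save some debugging time. The following is only a minimal sketch and is not part of scPRINT: `installed_version` and `check_environment` are hypothetical helper names, and the expected versions are simply the ones quoted in this README (torch 2.0.0, triton 2.0.0).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected):
    """Compare installed packages against expected version prefixes.

    `expected` maps a distribution name to a version prefix, e.g. {"torch": "2.0.0"}.
    Prefix matching lets local builds such as "2.0.0+cu117" pass.
    Returns a list of human-readable problems (empty if everything matches).
    """
    problems = []
    for package, wanted in expected.items():
        found = installed_version(package)
        if found is None:
            problems.append(f"{package} is not installed (expected {wanted})")
        elif not found.startswith(wanted):
            problems.append(f"{package} {found} is installed (expected {wanted})")
    return problems


if __name__ == "__main__":
    # Versions taken from this README; adjust them if the requirements change.
    for problem in check_environment({"torch": "2.0.0", "triton": "2.0.0"}):
        print(problem)
```

An empty result only means the version strings match; it does not prove that the GPU setup itself works.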

We use additional packages developped /// link to websites
We use additional packages that we developed; refer to their documentation for more information:

- [scDataLoader](https://github.com/jkobject/scDataLoader): a dataloader for training large cell models.
- [GRnnData](https://github.com/cantinilab/GRnnData): a package to work with gene networks from single cell data.
- [benGRN](https://github.com/jkobject/benGRN): a package to benchmark gene network inference methods from single cell data.

### lamin.ai // highly recommended in install
### lamin.ai

if you want to use the scDataloader's multi dataset mode or some of the functions of the model you might need to use lamin.ai, in that case connect with google or github to [lamin.ai](https://lamin.ai/login), then be sure to connect before running anything (or before starting a notebook): `lamin login <email> --key <API-key>`
⚠️ If you want to use scDataLoader's multi-dataset mode, preprocess datasets, or use some other functions of the model, you will need to use lamin.ai.

In that case, sign in to [lamin.ai](https://lamin.ai/login) with Google or GitHub, then be sure to log in before running anything (or before starting a notebook): `lamin login <email> --key <API-key>`. Follow the instructions on [their website](https://docs.lamin.ai/guide).

## Usage

### scPRINT's basic commands

This is a minimal example of how scPRINT is used:


```py
from lightning.pytorch import Trainer
from scprint import scPrint
@@ -109,7 +123,7 @@ $ scprint fit/train/predict/test --config config/[medium|large|vlarge] ...

### Notes on GPU/CPU usage with triton

If you do not have triton installed you will not be able to take advantage of gpu acceleration, but you can still use the model on the cpu.
If you do not have [triton](https://triton-lang.org/main/python-api/triton.html) installed, you will not be able to take advantage of GPU acceleration, but you can still use the model on the CPU.

In that case, if loading from a checkpoint that was trained with flashattention, you will need to specify `transformer="normal"` in the `load_from_checkpoint` function like so:

@@ -119,27 +133,27 @@ model = scPrint.load_from_checkpoint(
transformer="normal")
```
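
The fallback described above can also be chosen automatically. The sketch below is an illustration under assumptions, not scPRINT's own API: `checkpoint_kwargs` is a hypothetical helper that checks whether triton is importable and, if it is not, returns the `transformer="normal"` argument mentioned above. The checkpoint path in the usage comment is a placeholder.

```python
import importlib.util


def checkpoint_kwargs(module_name="triton"):
    """Return extra keyword arguments for `load_from_checkpoint`.

    If `module_name` cannot be found, fall back to the regular transformer;
    otherwise pass nothing extra and keep the checkpoint's own setting.
    """
    if importlib.util.find_spec(module_name) is None:
        return {"transformer": "normal"}
    return {}


# Hypothetical usage (the checkpoint path is a placeholder):
# from scprint import scPrint
# model = scPrint.load_from_checkpoint("path/to/checkpoint.ckpt", **checkpoint_kwargs())
```

The override only matters for checkpoints trained with flashattention; passing an empty dictionary leaves the call unchanged.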

We will explore here the different usages of scPRINT:
We now explore the different usages of scPRINT:

### I want to generate gene networks from scRNAseq data:

-> refer to the section 1. gene network inference in [this notebook](./notebooks/cancer_usecase.ipynb#)
-> refer to the section 1. gene network inference in [this notebook](./notebooks/cancer_usecase.ipynb#).

-> more examples in this notebook [./notebooks/bench_omni.ipynb](./notebooks/bench_omni.ipynb)
-> more examples in this notebook [./notebooks/assessments/bench_omni.ipynb](./notebooks/assessments/bench_omni.ipynb).

### I want to generate embeddings and cell annotations from scRNAseq data:
### I want to generate cell embeddings and cell label predictions from scRNAseq data:

-> refer to the embeddings and cell annotations section in [this notebook](./notebooks/cancer_usecase.ipynb)
-> refer to the embeddings and cell annotations section in [this notebook](./notebooks/cancer_usecase.ipynb).

### I want to generate oversampling of counts (denoising) of my scRNAseq data:
### I want to denoise my scRNAseq dataset:

-> refer to the Denoising of B-cell section in [this notebook](./notebooks/cancer_usecase.ipynb)
-> refer to the Denoising of B-cell section in [this notebook](./notebooks/cancer_usecase.ipynb).

-> More example in our benchmark notebook [./notebooks/bench_denoising.ipynb](./notebooks/bench_denoising.ipynb)
-> More examples in our benchmark notebook [./notebooks/assessments/bench_denoising.ipynb](./notebooks/assessments/bench_denoising.ipynb).

### I want to generate an atlas level embedding

-> refer to the notebook [nice_umap.ipynb](./figures/nice_umap.ipynb)
-> refer to the notebook [nice_umap.ipynb](./figures/nice_umap.ipynb).

### Documentation

@@ -150,10 +164,16 @@ for more information on usage please see the documentation in [https://www.jkobj
-->

### Model Weights

Model weights are available on [hugging face](https://huggingface.co/jkobject).

## Development

Read the [CONTRIBUTING.md](CONTRIBUTING.md) file.

Read the [training runs](https://wandb.ai/ml4ig/scprint_scale/reports/scPRINT-trainings--Vmlldzo4ODIxMjgx?accessToken=80metwx7b08hhourotpskdyaxiflq700xzmzymr6scvkp69agybt79l341tv68hp) document to learn more about how training was performed and the results obtained.

acknowledgement:
[python template](https://github.com/rochacbruno/python-project-template)
[laminDB](https://lamin.ai/)
Binary file removed Untitled-2024-05.png
Binary file not shown.
Binary file modified figure1.png
File renamed without changes.
1 change: 0 additions & 1 deletion lamin-intro/.lamindb/_is_initialized

This file was deleted.

Empty file.
File renamed without changes.
File renamed without changes.
901 changes: 901 additions & 0 deletions notebooks/assessments/bench_denoising.ipynb

Large diffs are not rendered by default.

File renamed without changes.
File renamed without changes.