Merge remote-tracking branch 'origin/main' into update-pretrained-models
Showing 28 changed files with 1,031 additions and 607 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,75 +4,17 @@ Rare variant association testing using deep learning and data-driven burden scor | |
|
||
[![Documentation Status](https://readthedocs.org/projects/deeprvat/badge/?version=latest)](https://deeprvat.readthedocs.io/en/latest/?badge=latest) | ||
|
||
## Installation | ||
|
||
1. Clone this repository: | ||
``` | ||
git clone git@github.com:PMBio/deeprvat.git | ||
``` | ||
1. Change directory to the repository: `cd deeprvat` | ||
1. Install the conda environment. We recommend using [mamba](https://mamba.readthedocs.io/en/latest/index.html), though you may also replace `mamba` with `conda` | ||
|
||
*note: [the current deeprvat env does not support cuda when installed with conda](https://github.com/PMBio/deeprvat/issues/16), install using mamba for cuda support.* | ||
``` | ||
mamba env create -n deeprvat -f deeprvat_env.yaml | ||
``` | ||
1. Activate the environment: `mamba activate deeprvat` | ||
1. Install the `deeprvat` package: `pip install -e .` | ||
## Installation and usage | ||
|
||
If you don't want to install the GPU-related requirements, use the `deeprvat_env_no_gpu.yaml` environment instead. | ||
``` | ||
mamba env create -n deeprvat -f deeprvat_env_no_gpu.yaml | ||
``` | ||
Please consult our [documentation](https://deeprvat.readthedocs.io/en/latest/) | ||
|
||
|
||
## Basic usage | ||
## Citation | ||
|
||
### Customize pipelines | ||
If you use this package, please cite: | ||
|
||
Before running any of the snakefiles, you may want to adjust the number of threads used by different steps in the pipeline. To do this, modify the `threads:` property of a given rule. | ||
|
||
If you are running on a computing cluster, you will need a [profile](https://github.com/snakemake-profiles) and may need to add `resources:` directives to the snakefiles. | ||
|
||
|
||
### Run the preprocessing pipeline on VCF files | ||
|
||
Instructions [here](https://deeprvat.readthedocs.io/en/latest/preprocessing.html) | ||
|
||
|
||
### Annotate variants | ||
|
||
Instructions [here](https://deeprvat.readthedocs.io/en/latest/annotations.html) | ||
|
||
|
||
|
||
### Try the full training and association testing pipeline on some example data | ||
|
||
``` | ||
mkdir example | ||
cd example | ||
ln -s [path_to_deeprvat]/example/* . | ||
snakemake -j 1 --snakefile [path_to_deeprvat]/pipelines/training_association_testing.snakefile | ||
``` | ||
|
||
Replace `[path_to_deeprvat]` with the path to your clone of the repository. | ||
|
||
Note that the example data is randomly generated, and so is only suited for testing whether the `deeprvat` package has been correctly installed. | ||
|
||
|
||
### Run the association testing pipeline with pretrained models | ||
|
||
``` | ||
mkdir example | ||
cd example | ||
ln -s [path_to_deeprvat]/example/* . | ||
ln -s [path_to_deeprvat]/pretrained_models | ||
snakemake -j 1 --snakefile [path_to_deeprvat]/pipelines/association_testing_pretrained.snakefile | ||
``` | ||
|
||
Replace `[path_to_deeprvat]` with the path to your clone of the repository. | ||
|
||
Again, note that the example data is randomly generated, and so is only suited for testing whether the `deeprvat` package has been correctly installed. | ||
Clarke, Holtkamp et al., “Integration of Variant Annotations Using Deep Set Networks Boosts Rare Variant Association Genetics.” bioRxiv. https://dx.doi.org/10.1101/2023.07.12.548506 | ||
|
||
|
||
## Credits | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# Cluster execution | ||
|
||
## Pipeline resource requirements | ||
|
||
For cluster execution, resource requirements are expected under `resources:` in all rules. All pipelines have some suggested resource requirements, but they may need to be adjusted for your data or cluster. | ||
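Before submitting to a cluster, it can help to confirm that every rule in a snakefile declares a `resources:` block. The following is a minimal sketch, not part of `deeprvat`: the helper `rules_without_resources` and the example rule text are hypothetical, and the check is a line-based heuristic that only recognizes top-level `rule <name>:` headers.

```python
import re
from typing import List

# Matches a top-level rule header such as "rule train:".
RULE_HEADER = re.compile(r"^rule\s+(\w+)\s*:")

# Hypothetical snakefile content, used only to illustrate the check.
EXAMPLE_SNAKEFILE = """\
rule train:
    threads: 4
    resources:
        mem_mb = 16000
    shell:
        "echo train"

rule summarize:
    threads: 1
    shell:
        "echo summarize"
"""


def rules_without_resources(snakefile_text: str) -> List[str]:
    """Return the names of rules that have no `resources:` directive."""
    missing = []
    current = None          # name of the rule being scanned
    has_resources = False   # whether the current rule declared resources
    for line in snakefile_text.splitlines():
        header = RULE_HEADER.match(line)
        if header:
            # A new rule starts: record the previous one if it lacked resources.
            if current is not None and not has_resources:
                missing.append(current)
            current = header.group(1)
            has_resources = False
        elif current is not None and line.strip().startswith("resources:"):
            has_resources = True
    # Account for the last rule in the file.
    if current is not None and not has_resources:
        missing.append(current)
    return missing


if __name__ == "__main__":
    print(rules_without_resources(EXAMPLE_SNAKEFILE))
```

Any rule the helper reports would need a `resources:` block added by hand, with values chosen for your data and cluster.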
|
||
|
||
## Cluster execution | ||
|
||
If you are running on a computing cluster, you will need a [profile](https://github.com/snakemake-profiles). We have tested execution on LSF. If you run into issues running on other clusters, please [let us know](https://github.com/PMBio/deeprvat/issues). | ||
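A profile is passed to Snakemake with its `--profile` option. The sketch below assembles such a command line in Python; the helper `snakemake_command`, the profile name `lsf`, and the snakefile path are illustrative assumptions, so substitute the profile you installed and the path to your clone.

```python
import shlex
from typing import List


def snakemake_command(snakefile: str, profile: str, jobs: int = 1) -> List[str]:
    """Build a Snakemake invocation that uses a cluster profile."""
    return [
        "snakemake",
        "--jobs", str(jobs),        # maximum number of concurrent jobs
        "--profile", profile,       # name or path of the installed profile
        "--snakefile", snakefile,   # pipeline to execute
    ]


if __name__ == "__main__":
    # "lsf" is a placeholder profile name; the snakefile path is relative
    # to a hypothetical clone of the repository.
    cmd = snakemake_command(
        "pipelines/training_association_testing.snakefile", "lsf", jobs=4
    )
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the profile is in place.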
|
||
|
||
## Execution on GPU vs. CPU | ||
|
||
Two steps in the pipelines use GPU by default: Training (rule `train` from [train.snakefile](https://github.com/PMBio/deeprvat/blob/main/pipelines/training/train.snakefile)) and burden computation (rule `compute_burdens` from [burdens.snakefile](https://github.com/PMBio/deeprvat/blob/main/pipelines/association_testing/burdens.snakefile)). To run on CPU on a computing cluster, you may need to remove the line `gpus = 1` from the `resources:` of those rules. | ||
|
||
Bear in mind that this will make burden computation substantially slower, but still feasible for most datasets. Training without GPU is not practical on large datasets such as UK Biobank. |
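Removing the `gpus = 1` line can be done by hand or scripted. The following is a minimal sketch under stated assumptions: the rule text is a made-up stand-in for the real `train` and `compute_burdens` rules, and `drop_gpu_resource` is a hypothetical helper, not part of `deeprvat`. It works line by line, so it does not handle a `resources:` block whose only entry is `gpus = 1`; that case would still need manual cleanup.

```python
import re

# Matches a resource line such as "gpus = 1" or "gpus = 1," with any indentation.
GPU_LINE = re.compile(r"^\s*gpus\s*=\s*1\s*,?\s*$")

# Hypothetical rule text; the actual rules live in train.snakefile
# and burdens.snakefile and may look different.
EXAMPLE_RULE = """\
rule compute_burdens:
    threads: 8
    resources:
        gpus = 1,
        mem_mb = 16000
    shell:
        "echo compute"
"""


def drop_gpu_resource(snakefile_text: str) -> str:
    """Return the text with every `gpus = 1` resource line removed."""
    kept = [
        line for line in snakefile_text.splitlines()
        if not GPU_LINE.match(line)
    ]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    print(drop_gpu_resource(EXAMPLE_RULE))
```

Working on a copy of the snakefile, or reviewing the change with `git diff` afterwards, keeps the edit easy to revert when a GPU becomes available.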