Simple snakemake pipeline for scaling AlphaFold2
- Version: 0.1.0
- Authors:
- Nick Youngblut [email protected]
- Maintainers:
- Nick Youngblut [email protected]
This snakemake pipeline handles the software install and cluster job submission/tracking.
Note: the pipeline was designed and tested for an SGE cluster. You may need to adapt the pipeline somewhat to work on other clusters or cloud computing services.
For failed cluster jobs, job resources are automatically escalated in an attempt to complete the job successfully, on the assumption that the job died due to a lack of cluster resources (e.g., insufficient memory).
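Snakemake supports this escalation pattern natively via the `attempt` value passed to callable `resources`. A minimal sketch of the idea (the base memory value and the doubling policy here are illustrative, not this pipeline's actual settings):

```python
# Illustrative sketch: escalate a memory request on each snakemake retry.
# In a Snakefile, a callable given to `resources:` receives `wildcards` and
# `attempt` (1 on the first try, incremented on each restart).
def escalating_mem_gb(base_gb=8):
    """Return a resources callable that doubles the memory request per retry."""
    def mem_gb(wildcards, attempt):
        return base_gb * 2 ** (attempt - 1)
    return mem_gb

mem = escalating_mem_gb(base_gb=8)
# attempt 1 -> 8 GB, attempt 2 -> 16 GB, attempt 3 -> 32 GB
```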
AlphaFold is run in 2 parts:
- Generation of the MSAs
  - Only CPUs are required for the database searches
  - All subprocesses use the same number of CPUs, unlike in the original alphafold code
- Prediction of protein structures
  - GPU usage is recommended (and used by default)
To do this, the pipeline utilizes a modified version of alphafold. Only the user interface has been edited, not how alphafold actually functions.
The setup is based upon alphafold_non_docker.
NOTE: You may need to change the locations of all required databases if you do not have access to the database paths listed in the config.yaml.
Clone the pipeline
git clone --recurse-submodules <alphafold_sm>
If you forgot to use --recurse-submodules:
cd ./alphafold_sm/bin/
git submodule add https://github.com/leylabmpi/ll_pipeline_utils.git
git submodule add https://github.com/nick-youngblut/alphafold.git
git submodule update --remote --init --recursive
Download chemical properties to the common folder
wget -q -P bin/scripts/alphafold/alphafold/common/ https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt
You need a conda environment with snakemake installed.
Be sure to activate your snakemake conda environment!
You may need to download the required alphafold databases if you do not have access to the database files listed in the config.yaml.
The pipeline processes each user-provided fasta separately, in parallel.
If running model_preset: monomer, then each fasta should contain 1 sequence.
If running model_preset: multimer, then each fasta can contain >=1 sequence.
You can use ./utils/seq_split.py for splitting a multi-fasta into per-sequence fasta files for input to this pipeline.
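The splitting itself amounts to writing each record of the multi-fasta to its own file. A minimal pure-Python sketch of that behavior (an illustration of the idea, not the actual seq_split.py implementation):

```python
import os

def split_fasta(multi_fasta_path, out_dir):
    """Write each sequence in a multi-fasta to its own per-sequence fasta file."""
    os.makedirs(out_dir, exist_ok=True)
    out_files = []
    name, seq_lines = None, []

    def flush():
        # write out the record accumulated so far, if any
        if name is None:
            return
        # use the first whitespace-delimited token of the header as the file name
        out_path = os.path.join(out_dir, name.split()[0] + '.fasta')
        with open(out_path, 'w') as out:
            out.write('>{}\n{}\n'.format(name, ''.join(seq_lines)))
        out_files.append(out_path)

    with open(multi_fasta_path) as f:
        for line in f:
            line = line.rstrip('\n')
            if line.startswith('>'):
                flush()
                name, seq_lines = line[1:], []
            else:
                seq_lines.append(line)
    flush()
    return out_files
```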
The config.yaml file sets the parameters for the pipeline.
use_gpu:
- Only used if cluster=True, which is set automatically when using ./snakemake_sge.sh to run the pipeline on the MPI Bio. cluster.
- If cluster=False (e.g., a run on a local server), then only CPUs will be used.
Other params:
- See the alphafold documentation
databases:
- base_path:
  - All databases are assumed to be within this path; in other words, the base_path is prepended to all database paths.
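The effect of base_path can be illustrated with a small sketch (the database names and paths below are examples, not the pipeline's exact config keys):

```python
import os

def resolve_db_paths(base_path, db_paths):
    """Prepend the shared base_path to each relative database path."""
    return {name: os.path.join(base_path, rel) for name, rel in db_paths.items()}

# Hypothetical example values, for illustration only
dbs = resolve_db_paths(
    '/data/alphafold_dbs',
    {'uniref90': 'uniref90/uniref90.fasta',
     'mgnify': 'mgnify/mgy_clusters.fa'},
)
# dbs['uniref90'] -> '/data/alphafold_dbs/uniref90/uniref90.fasta'
```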
pipeline:
- export_conda:
  - Export all conda envs at the end of a successful run
- If you delete the ./snakemake/conda/ directory, then BE SURE TO also delete the pip_update.done and patch.done files in the output directory; otherwise you will have to apply the pip update & patch manually to the alphafold conda environment that snakemake automatically generates.
For general info on alphafold output, see the alphafold docs.
mTM-align is used for 2 sets of comparisons:
- Intra
  - The ranked_[0-9].pdb structures are compared per-sample
- Inter
  - The ranked_0.pdb structures are compared between samples
- Structure-based calculations
- structural comparison
- visualization
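The two comparison sets amount to simple selections over alphafold's per-sample output files. A sketch of that selection logic (the sample names and file listings are illustrative):

```python
import fnmatch

# Hypothetical per-sample output listings, mimicking alphafold's layout
outputs = {
    'sampleA': ['ranked_0.pdb', 'ranked_1.pdb', 'ranked_2.pdb', 'features.pkl'],
    'sampleB': ['ranked_0.pdb', 'ranked_1.pdb', 'timings.json'],
}

# Intra: within each sample, all ranked models are compared to each other
intra = {s: sorted(f for f in fs if fnmatch.fnmatch(f, 'ranked_[0-9].pdb'))
         for s, fs in outputs.items()}

# Inter: only the top-ranked model of each sample is compared across samples
inter = sorted('{}/ranked_0.pdb'.format(s)
               for s, fs in outputs.items() if 'ranked_0.pdb' in fs)
```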