kzuberi edited this page Nov 24, 2014 · 4 revisions

Layout

  • Code: Data processing scripts are under builder/, java libraries under lib/, and pipeline workflow rules under snakefiles/.

  • Data: An empty input data folder hierarchy is organized under data/. Text files containing genes and interactions can be dropped by the user into the appropriate folder for processing, as described in detail elsewhere [TODO: link]. The contents of this folder are not changed by the workflow rules, so the folder is suitable for backup and/or versioning of input data.

  • Configuration: The config/ folder contains general data and configuration files required for the build. Organism-level configuration is stored in data/organism.cfg, and the arguments controlling the various programs that make up the pipeline are set in the workflow files.

  • Output: Upon successful execution of the data processing pipeline, output files required for deployment on a website will appear in the results/ folder. These include results/lucene_index, results/network_cache, and also the intermediate text format of all the processed data in results/generic_db.

  • Intermediate data: During processing, various intermediate files are produced. These are stored in the work/ folder. To save disk space once a build has been completed and validated, this folder can be removed.

    .
    ├── builder
    ├── config
    ├── data
    │   ├── attributes
    │   │   ├── attrib-gene-list
    │   │   └── gene-attrib-list
    │   ├── functions
    │   ├── identifiers
    │   │   ├── descriptions
    │   │   ├── mixed_table
    │   │   └── symbols
    │   └── networks
    │       ├── direct
    │       ├── profile
    │       └── sharedneighbour
    ├── lib
    ├── result
    │   ├── generic_db
    │   ├── lucene_index
    │   └── network_cache
    ├── snakefiles
    └── work
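
As a rough illustration, the empty input hierarchy shown in the tree above could be recreated with a few mkdir calls. This is only a sketch: the ROOT variable is a hypothetical name introduced here (it defaults to the current directory), and the repository itself may already ship these folders.

```shell
#!/bin/sh
# Sketch: recreate the empty data/ input hierarchy from the tree above.
# ROOT is a hypothetical variable for this example; it defaults to the
# current directory.
ROOT="${ROOT:-.}"

for d in \
    attributes/attrib-gene-list \
    attributes/gene-attrib-list \
    functions \
    identifiers/descriptions \
    identifiers/mixed_table \
    identifiers/symbols \
    networks/direct \
    networks/profile \
    networks/sharedneighbour
do
    # -p creates missing parent folders and does not fail if the
    # folder already exists, so the script is safe to re-run.
    mkdir -p "$ROOT/data/$d"
done
```

Because mkdir -p leaves existing folders untouched, running this over a populated data/ folder should not disturb input files already in place.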

Execution

Once data files are in place, simply run:

snakemake -j 4

By default, everything is built, but intermediate targets can be specified as well. The -j 4 option instructs the workflow engine to use 4 processing cores instead of the default of 1. Snakemake provides many other useful build options.
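
One of those options is a dry run (-n), which lists the jobs Snakemake would execute without running them. The sketch below wraps a dry run followed by a real build in a small shell function. The names build_all and SNAKEMAKE are hypothetical and exist only for this example; they are not part of the pipeline.

```shell
#!/bin/sh
# Sketch: preview the build with a dry run, then run it for real.
# SNAKEMAKE and build_all are hypothetical names used only in this example.
SNAKEMAKE="${SNAKEMAKE:-snakemake}"

build_all() {
    # Number of cores to use; falls back to 1 when no argument is given.
    cores="${1:-1}"

    # -n: dry run, print the jobs that would be executed without running them.
    # Stop here if the workflow cannot be resolved.
    "$SNAKEMAKE" -n || return 1

    # -j N: allow up to N jobs to run in parallel.
    "$SNAKEMAKE" -j "$cores"
}
```

Calling build_all 4 from the top-level folder would then correspond to the snakemake -j 4 command above, preceded by a dry run.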
