Home
- Code: Data processing scripts are under builder/, Java libraries under lib/, and pipeline workflow rules under snakefiles/.
- Data: An empty input data folder hierarchy is organized under data/. Users can drop text files containing genes and interactions into the appropriate folder for processing, as described in detail elsewhere [TODO: link]. The workflow rules do not change the contents of this folder, so it is suitable for backup and/or versioning of input data.
- Configuration: The config/ folder contains general data and configuration files required for the build. Organism-level configuration is stored in data/organism.cfg, and arguments controlling the parameters of the various programs that make up the pipeline are in the workflow files.
- Output: Upon successful execution of the data processing pipeline, the output files required for deployment on a website appear in the results/ folder. These include results/lucene_index, results/network_cache, and the intermediate text format of all the processed data in results/generic_db.
- Intermediate data: During processing, various intermediate files are produced and stored in the work/ folder. To save disk space once a build has been completed and validated, this folder can be removed.
.
├── builder
├── config
├── data
│   ├── attributes
│   │   ├── attrib-gene-list
│   │   └── gene-attrib-list
│   ├── functions
│   ├── identifiers
│   │   ├── descriptions
│   │   ├── mixed_table
│   │   └── symbols
│   └── networks
│       ├── direct
│       ├── profile
│       └── sharedneighbour
├── lib
├── result
│   ├── generic_db
│   ├── lucene_index
│   └── network_cache
├── snakefiles
└── work
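Before starting a build, it can help to confirm that input files have actually been dropped into the expected folders. The following is a minimal Python sketch of such a check, not part of the pipeline itself. It assumes the input folder names match the tree above and that it is run from the repository root; the function name `count_input_files` is hypothetical, and skipping hidden files is an assumption about how placeholder files might keep the empty hierarchy in version control.

```python
from pathlib import Path

# Leaf input folders taken from the tree above, relative to data/.
INPUT_FOLDERS = [
    "attributes/attrib-gene-list",
    "attributes/gene-attrib-list",
    "functions",
    "identifiers/descriptions",
    "identifiers/mixed_table",
    "identifiers/symbols",
    "networks/direct",
    "networks/profile",
    "networks/sharedneighbour",
]


def count_input_files(data_dir="data"):
    """Map each input folder to its file count, or None if the folder is missing."""
    counts = {}
    for rel in INPUT_FOLDERS:
        folder = Path(data_dir) / rel
        if folder.is_dir():
            # Assumption: hidden files (names starting with ".") are
            # placeholders, not input data, so they are not counted.
            counts[rel] = sum(
                1
                for p in folder.iterdir()
                if p.is_file() and not p.name.startswith(".")
            )
        else:
            counts[rel] = None
    return counts


if __name__ == "__main__":
    for rel, n in count_input_files().items():
        status = "missing" if n is None else f"{n} file(s)"
        print(f"data/{rel}: {status}")
```

A folder reported with 0 files is not necessarily an error; it only means nothing has been placed there yet.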
Once data files are in place, simply run:
snakemake -j 4
The default target builds everything, but intermediate targets can be specified as well. The -j 4 option instructs the workflow engine to use 4 processing cores instead of the default 1. Snakemake provides many other useful build options.
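For scripted builds, the invocation above can be assembled in a few lines of Python. This is only a sketch: the helper `snakemake_command` is hypothetical, and the example target `results/generic_db` is an assumption based on the output folders described earlier, not a confirmed rule or target name in the workflow. The `-n` flag is Snakemake's dry-run option, which lists the jobs that would run without executing them.

```python
import shlex


def snakemake_command(cores=4, target=None, dry_run=False):
    """Build the argument list for a snakemake call (hypothetical helper)."""
    cmd = ["snakemake", "-j", str(cores)]
    if dry_run:
        cmd.append("-n")  # dry run: show planned jobs, execute nothing
    if target is not None:
        cmd.append(target)  # build this target instead of the default
    return cmd


if __name__ == "__main__":
    # Print the commands rather than running them here.
    print(shlex.join(snakemake_command(dry_run=True)))
    # Assumed target name, for illustration only.
    print(shlex.join(snakemake_command(target="results/generic_db")))
```

To execute a command instead of printing it, the returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Previewing with a dry run first is a cheap way to see which steps a given target would trigger.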