Home
- Code: Data processing scripts are under builder/, Java libraries under lib/, and pipeline workflow rules under snakefiles/.
- Data: An empty input data folder hierarchy is provided under data/. Users can drop text files containing genes and interactions into the appropriate folder for processing, as described in DataLayout. The workflow rules never modify the contents of this folder, so it is suitable for backup and/or versioning of input data.
- Configuration: The config/ folder contains general data and configuration files required for the build. Organism-level configuration is stored in data/organism.cfg, and the arguments that control the parameters of the programs making up the pipeline are set in the workflow files.
- Output: Upon successful execution of the data processing pipeline, the output files required for deployment on a website appear in the results/ folder. These include results/lucene_index and results/network_cache, as well as the intermediate text format of all the processed data in results/generic_db.
- Intermediate data: Various intermediate files produced during processing are stored in the work/ folder. Once a build has been completed and validated, this folder can be removed to save disk space.
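Removing the work/ folder after a validated build can be scripted. The following is a minimal sketch, assuming it is run from the repository root; the function name clean_work_dir is hypothetical and not part of the repository. It reports how much space the intermediate files occupy and then deletes them.

```shell
# Sketch only: clean_work_dir is a hypothetical helper, not shipped with the repository.
# Source this file, then call clean_work_dir from the repository root
# (or pass another path to the intermediate-data folder).
clean_work_dir() {
    work_dir="${1:-work}"
    if [ ! -d "$work_dir" ]; then
        echo "nothing to remove: $work_dir does not exist"
        return 0
    fi
    # Show the disk space held by intermediate files before deleting them.
    du -sh "$work_dir"
    rm -rf -- "$work_dir"
}
```

Only run it once the contents of results/ have been checked, since the intermediate files are deleted irreversibly.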
.
├── builder
├── config
├── data
│ ├── attributes
│ │ ├── attrib-gene-list
│ │ └── gene-attrib-list
│ ├── functions
│ ├── identifiers
│ │ ├── descriptions
│ │ ├── mixed_table
│ │ └── symbols
│ └── networks
│ ├── direct
│ ├── profile
│ └── sharedneighbour
├── lib
├── result
│ ├── generic_db
│ ├── lucene_index
│ └── network_cache
├── snakefiles
└── work
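Before starting a build, it can help to confirm that the input folders from the tree above are present. This is a minimal sketch: the folder list is copied from the tree, the output folders are left out because the workflow creates them, and the function name check_data_layout is hypothetical.

```shell
# Sketch only: check_data_layout is a hypothetical helper, not part of the repository.
# Prints each missing input folder and returns non-zero if any are absent.
check_data_layout() {
    root="${1:-.}"
    missing=0
    for d in \
        data/attributes/attrib-gene-list \
        data/attributes/gene-attrib-list \
        data/functions \
        data/identifiers/descriptions \
        data/identifiers/mixed_table \
        data/identifiers/symbols \
        data/networks/direct \
        data/networks/profile \
        data/networks/sharedneighbour
    do
        if [ ! -d "$root/$d" ]; then
            echo "missing: $d"
            missing=$((missing + 1))
        fi
    done
    # Exit status 0 only when every input folder exists, so it can gate a build.
    [ "$missing" -eq 0 ]
}
```

For example, check_data_layout && snakemake -j 4 starts the build only when the hierarchy is intact.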
Once data files are in place, simply run:
snakemake -j 4
The default build target builds everything, but intermediate targets can also be specified. The -j 4 option instructs the workflow engine to use 4 processing cores instead of the default 1. Snakemake provides many other useful build options; run snakemake --help for the full list.
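A dry run is a cheap way to preview which jobs Snakemake would execute before committing to a long build. The sketch below wraps a dry run followed by the real build in a hypothetical helper, build_target. The -n (dry run) and -j (cores) options are standard Snakemake flags, while the helper name and the CORES variable are assumptions.

```shell
# Sketch only: build_target and CORES are hypothetical, not part of the repository.
# Any arguments are passed to Snakemake as targets (output files or rule names);
# with no arguments Snakemake builds the default target.
build_target() {
    cores="${CORES:-4}"
    # Dry run first: lists the jobs that would run and stops on workflow errors.
    snakemake -n -j "$cores" "$@" || return 1
    # Real build with the same targets and core count.
    snakemake -j "$cores" "$@"
}
```

For example, build_target alone runs the full default build on 4 cores, while CORES=8 build_target <target> builds only the named target on 8 cores. snakemake --list prints the rules defined in the workflow, which helps when choosing a target.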