
Overview

This guide summarizes the proposed workflow for integrating our work. The goal is to maximize parallelization and minimize redundant work.

Workflow

This image summarizes the current directory structure: [directory structure image]

The main Therapeutic task (Task 1T) consists of producing these three deliverables. The master pipeline for running everything and generating these outputs lives at main_pipeline/MAIN_PIPELINE.ipynb. The task has been broken down into subtasks in the above spreadsheet, and each subtask (Subtask 1T.*) has a corresponding directory. Each of these four directories has an identical structure: a directory for notebooks and a script for helper functions.
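
As a rough sketch only (the subtask directory names below are placeholders, not the actual names in the repo), the layout described above looks something like this:

```
main_pipeline/
    MAIN_PIPELINE.ipynb        # master pipeline that generates the deliverables
subtask_1T_1/                  # placeholder name for one Subtask 1T.* directory
    notebooks/                 # mini-pipeline notebooks for this subtask
    helper_functions.py        # shared functions/classes used by the notebooks
subtask_1T_2/
    notebooks/
    helper_functions.py
...
```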

We will be following these principles:

  • Define functions and classes outside of your notebooks, and put them in logical places where they can be shared with others (see the sketch after this list).
  • Write test functions for each function you write (more on that in the next notebook).
  • Each notebook you create should be a mini-pipeline: it has inputs and outputs. The notebook format has the benefit of nice inline visualizations and is a great place to conduct analyses. Again, think of a notebook as a pipeline and only call functions defined somewhere else, so that your data "flows" through it with the corresponding analyses/visualizations.
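
As a minimal sketch of the first two principles (the function, file paths, and data below are hypothetical, not part of the actual repo), a shared function would live in a subtask's helper_functions.py with a matching test:

```python
# subtask_1T_1/helper_functions.py  (hypothetical path)
import pandas as pd

def filter_significant_hits(df: pd.DataFrame, p_col: str = "p_value", alpha: float = 0.05) -> pd.DataFrame:
    """Keep only the rows whose p-value is below the significance threshold."""
    return df[df[p_col] < alpha].reset_index(drop=True)


# subtask_1T_1/test_helper_functions.py  (hypothetical path; run with pytest)
def test_filter_significant_hits():
    df = pd.DataFrame({"p_value": [0.01, 0.20, 0.04]})
    result = filter_significant_hits(df)
    assert list(result["p_value"]) == [0.01, 0.04]
```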

The most important thing here is to define your functions/classes outside of your notebooks, and put them in logical places like helper_functions.py or other scripts within each Subtask directory.
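
Inside a notebook, that shared function would then be imported and called rather than redefined (again a sketch: the input/output file names are made up, and it assumes the subtask directory is on the Python path):

```python
import pandas as pd
from helper_functions import filter_significant_hits  # logic lives outside the notebook

raw_hits = pd.read_csv("inputs/associations.csv")                 # notebook input (hypothetical file)
significant = filter_significant_hits(raw_hits)                   # call shared code, don't redefine it here
significant.to_csv("outputs/significant_hits.csv", index=False)   # notebook output (hypothetical file)
```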
