Reproducibility and package management techniques: workflow languages (CWL, Snakemake) and Conda.
This course introduces approaches to package management and shows how to create reproducible workflows, or pipelines.
This session seeks to impart the following competencies:
- Knowledge and skills: Bioinformatics tools and their usage.
- Knowledge and skills: Command-line and scripting-based computing skills appropriate to the discipline.
By the end of this session, and the projects that follow, the learner should be able to:
- Select the best workflow and package managers based on the task at hand
- Implement a genomic pipeline in at least one workflow manager
- Set up a reproducible analysis environment
- Introduce the high-level concept of workflows and high throughput data analysis
- Hands-on activities for setting up the packages
- Introduce package management and how we can use conda to increase reproducibility with workflows
- Introduce the theory of workflows, with emphasis on one language (for example, Snakemake)
- Hands-on activities of developing workflows
- Using Bioconda to streamline software installation for bioinformatics
- Workflows and Pipelines
- Package management | resource management | reproducibility
- Package Management with conda
- Workflows with Snakemake: a quick introduction, followed by a deeper dive using the Reproducible Research tutorial. See also this tutorial.
- Nextflow and Singularity tutorial
- Docker Tutorial
- Common Workflow Language tutorial. We will not cover this in the session, but we provide links to useful tutorials so you can explore and learn further. Also see this and this (https://andrewjesaitis.com/2017/02/common-workflow-language---a-tutorial-on-making-bioinformatics-repeatable/) walkthroughs.
- Resource management on HPC
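
To make the idea of a workflow concrete: workflow managers such as Snakemake treat a pipeline as a directed acyclic graph (DAG) of steps and work out a valid execution order from the dependencies between them. The snippet below is only a minimal sketch of that idea in plain Python, not how any particular workflow manager is implemented. The step names (`trim_reads`, `align_reads`, and so on) and the `execution_order` helper are hypothetical and chosen purely for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical genomic pipeline: each step maps to the set of steps
# that must finish before it can start.
pipeline = {
    "trim_reads": set(),
    "align_reads": {"trim_reads"},
    "sort_alignments": {"align_reads"},
    "call_variants": {"sort_alignments"},
    "qc_report": {"trim_reads"},
}


def execution_order(steps):
    """Return one valid order in which the steps can be run."""
    return list(TopologicalSorter(steps).static_order())


order = execution_order(pipeline)
print(order)
```

Every step appears after the steps it depends on, which is the guarantee a workflow manager relies on when scheduling jobs. Steps with no dependency between them (here `qc_report` and `align_reads`) may come out in either order, and a real workflow manager could run them in parallel.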
Some resources and articles you can make use of in this course:
- Awesome pipelines: A curated list of pipelines and workflow languages
- Existing Workflow systems: Computational Data Analysis Workflow Systems
- Papers: