Welcome to BIOS 259 – a Stanford Biosciences mini-course on computational reproducibility
This mini-course equips graduate students and postdocs with essential skills for reproducible computational research. Through practical exercises and interactive sessions, participants learn best practices, tools, and techniques for open and reproducible research. Topics include version control, containerization, data management, workflows, and documentation strategies. The course also examines the primary causes and consequences of irreproducibility in research, so that participants can overcome common reproducibility challenges and strengthen the rigor, credibility, and impact of their computational work. Participants gain practical experience in achieving computational reproducibility across domains, including biology.
This course aims to foster a culture of reproducibility, open science, open source, and collaboration in research, and to provide the tools and skills needed to practice it. Through active engagement and completion of course activities, you will be able to:
- Understand the importance of computational reproducibility and the causes of irreproducibility in research
- Gain proficiency in version control systems (e.g., Git) for collaborative code and data tracking
- Create and share conda environments for software dependency management
- Use containerization tools (e.g., Docker, Singularity) for portable computing environments
- Learn techniques for automating workflows and generating reproducible results
- Develop effective strategies for managing and documenting data and code to ensure reproducibility
- Implement best practices for transparent and reproducible project organization
Version Control with Git: Learn how to track changes, collaborate effectively, and maintain a robust version history of your code and documents.
Environment management with Conda/Bioconda/Mamba: Explore Conda and Mamba, widely used package managers, and learn how to manage software dependencies in your projects efficiently and reproducibly. We also use channels, including Bioconda, for specialized bioinformatics packages.
Containerization with Docker and Singularity: Understand how to encapsulate your computational environment, ensuring consistent and reproducible execution across different systems.
Workflow management using Snakemake/nf-core: Dive into Snakemake, a powerful workflow management system, and gain hands-on experience creating, executing, and managing complex computational workflows. We will also explore the basics of Nextflow and get hands-on experience with nf-core pipelines.
Documentation, code, and data sharing: Learn best practices for organizing, sharing, and documenting code and data to facilitate reproducibility and enable others to build upon your work.
- Basic understanding and familiarity with programming (e.g., Python, R)
- Basic understanding of Unix/Linux Bash
Date | Day | Topics covered | Time | Location | Material |
---|---|---|---|---|---|
02.26.2024 | Mon | Introduction to reproducibility and setting up | 10:00-13:00 | M218A | Slides, Setup instructions |
02.28.2024 | Wed | Version Control (Git/GitHub) | 10:00-13:00 | M218A | Slides, Git cheat sheet |
03.01.2024 | Fri | Environment management (Conda, Bioconda, Mamba) | 10:00-13:00 | M218A | Slides, Conda cheat sheet |
03.04.2024 | Mon | Containerization (Docker, Singularity) | 10:00-13:00 | M218A | Slides, Docker cheat sheet, Exercise |
03.06.2024 | Wed | Workflows (Snakemake, Nextflow/nf-core) | 10:00-13:00 | M218A | Exercise |
03.08.2024 | Fri | Document and share (Notebooks, FAIR data, and open code) and wrap-up | 10:00-13:00 | LK208 | Slides, Exercise |
We will add further learning material here. In the meantime, you can find a list of resources at the end of each day's slides.