diff --git a/BOSC2015-ReprSciCurr.html b/BOSC2015-ReprSciCurr.html
new file mode 100644
index 0000000..bf286fe
--- /dev/null
+++ b/BOSC2015-ReprSciCurr.html
@@ -0,0 +1,1604 @@
+A curriculum for teaching Reproducible Computational Science bootcamps
+
+ +
+

A curriculum for teaching Reproducible Computational Science bootcamps

+ +
+

+Hilmar Lapp, Duke University
+Participants of the Reproducible Science Curriculum Hackathon +

+ +

BOSC 2015, Dublin, Ireland +
+ + + CC0 + +

+ +
+ +
+
+

Reproducibility crisis

+
+
    +
  • Only 6 of 56 landmark oncology papers confirmed
  • +
  • 43 of 67 drug target validation studies failed to reproduce
  • +
  • Effect size overestimation is common
  • +
+ + +
+ +
+ + +

Nature Special Issue on Challenges in Irreproducible Research

+ +
+ +
+
+

Reproducibility matters

+
+

Lack of reproducibility in science causes significant issues

+ +
    +
  • For science as an enterprise
  • +
  • For other researchers in the community
  • +
  • For public policy
  • +
+ +
+ +
+
+

Science retracts gay marriage paper

+
+
    +
  • Science retracted (without lead author's consent) a study of how +canvassers can sway people's opinions about gay marriage

  • +
  • Original survey data was not made available for independent +reproduction of results (and survey incentives misrepresented, and +sponsorship statement false)

  • +
  • Two Berkeley grad students attempted to replicate the study and +discovered that the data must have been faked.

  • +
+ +

Source: +http://news.sciencemag.org/policy/2015/05/science-retracts-gay-marriage-paper-without-lead-author-s-consent

+ +
+ +
+
+

Reproducibility matters

+
+

Lack of reproducibility in science causes significant issues

+ +
    +
  • For science as an enterprise
  • +
  • For other researchers in the community
  • +
  • For public policy
  • +
  • For patients
  • +
+ +
+ +
+
+

Seizure study retracted after authors realize data got "terribly mixed"

+
+

From the authors of Low Dose Lidocaine for Refractory Seizures in +Preterm Neonates (doi:10.1007/s12098-010-0331-7):

+ +
+

The article has been retracted at the request of the authors. After carefully re-examining the data presented in the article, they identified that data of two different hospitals got terribly mixed. The published results cannot be reproduced in accordance with scientific and clinical correctness.

+
+ +

Source: Retraction Watch

+ +
+ +
+
+

Reproducibility matters

+
+

Lack of reproducibility in science causes significant issues

+ +
    +
  • For science as an enterprise
  • +
  • For other researchers in the community
  • +
  • For policy making
  • +
  • For patients
  • +
  • For oneself as a researcher
  • +
+ +
+ +
+
+

Reproducibility = Accelerating science

+
+
    +
  • If my research is difficult to reproduce, it impedes my lab and my +future self.
  • +
+ +
+

Any work you do to make your analysis more reproducible pays dividends for colleagues and your future self.

+
+ +

Jeremy Leipzig

+ + +
+ + + +
+
+

Reproducible computational research is challenging

+
+

+ +
    +
  • Most software has many dependencies, any one of which can fail to install.
  • +
  • Gaps and errors in docs may be harmless for experts, but are often +fatal for “method novices”.
  • +
  • Software evolution means that parameters that worked a year ago may +now throw an error.
  • +
  • Dependency hell: baseline software and installed packages differ from one system to another.
  • +
+ + +
+ +
+ + +

NESCent Informatics experiment on reproducing reproducible computational research
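
One common mitigation for the version drift described above is to record the software environment next to the results. The following is a minimal sketch in Python, not part of the original slides; the function name `snapshot_environment` and the package name `pip` are illustrative assumptions.

```python
import json
import platform
from importlib import metadata


def snapshot_environment(packages):
    """Return a dict describing the interpreter, OS, and selected package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # "pip" is only an illustrative name; list the packages your analysis imports.
    print(json.dumps(snapshot_environment(["pip"]), indent=2, sort_keys=True))
```

Saving this output alongside each set of results gives a later reader a starting point for rebuilding the environment; it does not by itself guarantee that the same versions can still be installed.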

+ +
+ +
+
+

Bewildering technology soup

+
+

+ +
    +
  • Distributed version control
  • +
  • Git, Mercurial, Subversion
  • +
  • Provenance
  • +
  • SHA256
  • +
  • Docker, Docker Hub
  • +
  • Continuous Integration
  • +
+ + +
+ +
+ + +

+ +
    +
  • Literate programming
  • +
  • RMarkdown, Knitr
  • +
  • DataCite DOIs
  • +
  • Dryad, Zenodo, Figshare
  • +
  • HIPAA, PHI
  • +
+ +
These are all about technology, not scientific discovery.
+ +
+ +
+
+

Reproducible Science Curriculum Workshop & Hackathon

+
+

To develop an open source curriculum for a two-day workshop on reproducibility for computational research

+ +

Reproducible Science Curriculum logo

+ +
+ +
+
+

Reproducible Science Curriculum Workshop & Hackathon

+
+

+Reproducible Science Curriculum Workshop & Hackathon Participants

+ + +
+ +
+ + +

+ +
    +
  • Held December 11-14 at NESCent in Durham, NC
  • +
  • Two days brainstorming / unconference, followed by two days +curriculum development
  • +
  • 21 participants comprising statisticians, biologists, +bioinformaticians, open-science activists, programmers, graduate +students, postdocs, untenured and tenured faculty
  • +
+ +
+ +
+
+

Reproducible Science Curriculum Workshop & Hackathon

+
+

+Reproducible Science Curriculum Workshop & Hackathon Participants

+ +

Mission: To train researchers in the best practices and approaches of reproducible research

+ + +
+ +
+ + +

+Goals:

+ +
    +
  • Modular: allows multiple formats and languages to be incorporated
  • +
  • No restrictions or barriers to reuse
  • +
  • No reliance on tools that don't already exist
  • +
+ +
+ +
+
+

Key concepts underlying the curriculum

+
+ +
+ +
+
+

1. Benefits first accrue to researcher

+
+

B. Oliviera, Geeks and repetitive tasks +Bruno Oliviera

+ + +
+ +
+ + +
    +
  • Making your research reproducible for your future self.
  • +
  • Faster reporting of results despite updating data, tools, parameters, etc.
  • +
  • Faster resumption of research by others.
  • +
+ +

More generally, accelerating scientific progress through reproducible science.

+ +
+ +
+
+

2. Good - Better - Best

+
+


+
+Peng, R. D. (2011) Reproducible Research in Computational Science +Peng, R. D. “Reproducible Research in Computational Science” Science 334, no. 6060 (2011): 1226–1227

+ +
+ +
+
+

3. Literate Programming

+
+

Provenance with results pasted into manuscript: +Figure copy&pasted into MS Word

+ +
    +
  • Which code?
  • +
  • Which data?
  • +
  • Which context?
  • +
+ + +
+ +
+ + +

vs. Provenance of figures with Rmarkdown reports: +Figure-generating code followed by generated figure
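
The three questions above (which code, which data, which context) can also be answered outside a literate programming report by writing a small provenance record next to each figure. This is a hedged sketch in Python rather than the Rmarkdown approach the slides show; the function names and the `.provenance.json` sidecar convention are assumptions made for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(data_path, script_path, parameters):
    """Describe which data, which code, and which context produced a result."""
    return {
        "data_file": str(data_path),
        "data_sha256": sha256_of_file(data_path),
        "script_file": str(script_path),
        "script_sha256": sha256_of_file(script_path),
        "parameters": parameters,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


def write_sidecar(figure_path, record):
    """Write the record as JSON next to the figure (hypothetical naming convention)."""
    sidecar = Path(str(figure_path) + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

A literate programming report keeps code and figure together by construction; a sidecar file like this is a weaker substitute for figures that must be produced separately.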

+ +
+ +
+
+

4. One dataset throughout

+
+ + +
+ +
+
+

5. Two-pronged approach

+
+


+ +
    +
  • For scientists just starting with data workflows, train them so that +reproducible workflows are the only ones they know.
  • +
  • For scientists who already have their workflows, convince and train +them to adopt more reproducible ones.
  • +
+ +

Curriculum is suitable for both.

+ +
+ +
+
+

Inaugural workshop: May 14-15, Duke

+
+

+ +
+ +
+
+

Syllabus v1.0

+
+

Day 1, morning:

+ +
    +
  • Introduction to Reproducible Research
  • +
  • Rotation-based exercise
  • +
  • Introduction of R, RStudio, Rmarkdown, and Knitr
  • +
+ +

Day 1, afternoon:

+ +
    +
  • Organizing Files to Facilitate Reproducible Research
  • +
  • Literate Programming
  • +
  • Literate programming to clean and unit-test data
  • +
+ + +
+ +
+ + +

Day 2, morning:

+ +
    +
  • Why automate?
  • +
  • Transforming R scripts into R functions
  • +
  • Automated testing and integration testing
  • +
+ +
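
The Day 2 morning topics (turning scripts into functions, then testing them automatically) can be sketched as follows. The workshop itself taught these in R; this example uses Python, and the function names and sample values are hypothetical.

```python
import math


def clean_measurements(values):
    """Drop missing or non-finite entries and return the rest as floats."""
    cleaned = []
    for value in values:
        if value is None:
            continue
        number = float(value)
        if math.isfinite(number):
            cleaned.append(number)
    return cleaned


def mean(values):
    """Arithmetic mean; fails loudly on empty input instead of returning NaN."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


def test_clean_measurements_drops_missing_and_nan():
    assert clean_measurements([1, None, float("nan"), "2.5"]) == [1.0, 2.5]


def test_mean_of_cleaned_values():
    assert mean(clean_measurements([2, 4, None])) == 3.0


if __name__ == "__main__":
    test_clean_measurements_drops_missing_and_nan()
    test_mean_of_cleaned_values()
    print("all tests passed")
```

Once the cleaning step is a function instead of a block of script, the tests can be rerun whenever the data, tools, or parameters change.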

Day 2, afternoon:

+ +
    +
  • Sharing and publishing for data and code
  • +
  • Archiving for perpetuity
  • +
  • Licensing
  • +
+ +

By popular demand: Github

+ +
+ +
+
+

Second workshop: June 1-2, iDigBio

+
+ + +
+ +
+
+

Syllabus v1.1

+
+


+ +
    +
  • Earlier introduction of literate programming
  • +
  • More exercise time on using literate programming reports for +executable documentation
  • +
  • Started Day 2 with 90 minutes on version control and collaboration +using Git and Github
  • +
+ +
+ +
+
+

Observations, feedback and lessons learned

+
+ +
+ +
+
+

Unexpectedly strong demand and interest

+
+


+

+ +
    +
  • For first workshop held at Duke, 30 seats filled up in less than 24 hours.
  • +
  • Ended up with >20 on waiting list, and several inquiries about when +the workshop will be repeated.
  • +
  • Second workshop (held at U. Florida) filled up, too.
  • +
+ +
+ +
+
+

Tools and practices coming into the workshop cover a wide spectrum

+
+

+How are you recording your own research?

+ +
    +
  • Local digital documents
  • +
  • R scripts
  • +
  • Github
  • +
  • Rmarkdown, markdown text
  • +
  • Matlab
  • +
  • Lab notebooks
  • +
  • MS Word, Dropbox
  • +
+ + +
+ +
+ + +

+ +
    +
  • MS Powerpoint
  • +
  • Google drive
  • +
  • Script logging
  • +
  • Folder organization
  • +
  • Evernote
  • +
  • iPython
  • +
  • Perl scripts
  • +
  • Saving R sessions
  • +
  • Code comments
  • +
  • Wiki
  • +
  • Scratch paper
  • +
+ +
+ +
+
+

Don't be afraid of version control

+
+


+ +
    +
  • Some students had heard about version control. Almost none had practiced it.
  • +
  • Nonetheless, it was the topic that students most consistently and most +frequently asked to hear about in feedback cards.
  • +
  • If you don't teach it, it will be asked for. But it is challenging +to teach effectively to novices in 90 minutes or less.
  • +
+ +
+ +
+
+

Need lessons using iPython

+
+


+We chose R because it is

+ +
    +
  • popularly used for data analysis;
  • +
  • used widely across disciplines;
  • +
  • free and open-source;
  • +
  • supported on Windows, Mac OS X, and Linux;
  • +
  • and well-suited to teach the key concepts.
  • +
+ +

However, those not familiar with R struggle with R's syntax and get frustrated.

+ +
+ +
+
+

Future plans

+
+
    +
  • Material development sprint & workshop: + +
      +
    • Tweaking design of exercises to be shorter, and have more +emphasis on skill development than realizing what the problems +are.
    • +
    • Refining Git/Github lesson for the short time allotted to it.
    • +
  • +
  • Teaching more workshops
  • +
  • Expanding instructor pool
  • +
  • Sustaining and growing the effort + +
      +
    • Current funding ends in 2015.
    • +
    • “Adoption” under the Data Carpentry umbrella possible.
    • +
  • +
+ +
+ +
+
+

Acknowledgements

+
+
    +
  • Organizing committee members and curriculum instructors + +
      +
    • Mine Çetinkaya-Rundel (I)
    • +
    • Karen Cranston (O,I)
    • +
    • Ciera Martinez (O,I)
    • +
    • François Michonneau (O,I)
    • +
    • Matt Pennell (O)
    • +
    • Tracy Teal (O)
    • +
  • +
+ + +
+ +
+ + +
    +
  • Inspiration: Sophie (Kershaw) Kay - Rotation Based Learning
  • +
  • Funding and support: + +
      +
    • US National Science Foundation (NSF)
    • +
    • National Evolutionary Synthesis Center (NESCent)
    • +
    • Center for Genomic & Computational Biology (GCB), Duke University
    • +
  • +
+ +
+ +
+
+

Eating our dogfood: text formats, version control, sharing

+
+

+This slideshow was generated as HTML from Markdown using RStudio.

+ +

The Markdown sources, and the HTML, are hosted on Github: +https://github.com/Reproducible-Science-Curriculum/bosc2015

+ +


+The repository is archived on Zenodo: http://dx.doi.org/10.5281/zenodo.17844

+ + +
+ +
+ + +

+RStudio screenshot of this presentation +RStudio Logo

+ +
+ +
+ + +
+
+ + + + + + + + + +