Broadly defined, Open Science aims to make the products of scholarly investigation accessible to as many people as possible. In the empirical sciences, one of these products is data analysis. How can you build your analysis pipeline so that it can be easily inspected and its results reproduced by novices and experts alike? In this workshop you will learn some of the functionality of the tidyverse, a collection of R packages for data manipulation and plotting whose code is easy to read for both machines and humans. You will load a preselected dataset, manipulate it in various ways (e.g., renaming, filtering, and recoding variables), visualize it to better understand its hidden relationships, and perform simple statistical analyses. Importantly, these operations will be run in an environment that ensures computational reproducibility: if you have the initial dataset, you will be able to reproduce the final results with a single mouse click.
- Download this GitHub repository by clicking the green **Clone or download** button and choosing the ZIP download
- Unzip the folder somewhere on your hard disk
- Double-click on `ERIM2020_intro_tidyverse.Rproj` to open the self-contained RStudio project
- From the Files panel in RStudio, click on `ERIM2020_intro_tidyverse.Rmd` (in the `doc` subfolder) for the code we will discuss during the workshop
- In the same subfolder you will also find the complete output as well as the solutions to the exercises (source, html)
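
To give a flavour of the workflow described above, here is a minimal sketch of a tidyverse pipeline. It is not the workshop code: it assumes the `dplyr` and `ggplot2` packages are installed and uses R's built-in `mtcars` data as a stand-in for the workshop dataset. The object names `cars` and `fit` and the new column names are made up for illustration.

```r
library(dplyr)    # data manipulation verbs and the %>% pipe
library(ggplot2)  # plotting

# Stand-in data: mtcars ships with R (the workshop uses its own dataset)
cars <- mtcars %>%
  rename(horsepower = hp, weight = wt) %>%   # variable renaming
  filter(cyl != 6) %>%                       # filtering: keep 4- and 8-cylinder cars
  mutate(transmission = if_else(am == 1, "manual", "automatic"))  # recoding 0/1

# Visualize the relationship between weight and fuel efficiency
p <- ggplot(cars, aes(x = weight, y = mpg, colour = transmission)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
print(p)

# A simple statistical analysis: linear regression of mpg on weight
fit <- lm(mpg ~ weight, data = cars)
summary(fit)
```

Each step reads top to bottom as a sentence ("take the data, then rename, then filter, then recode"), which is what makes such a pipeline easy to inspect. When a chunk like this lives in an R Markdown file, knitting the document re-runs every step from the raw data.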