-
Notifications
You must be signed in to change notification settings - Fork 0
/
am-05-data-management.Rmd
117 lines (83 loc) · 5.64 KB
/
am-05-data-management.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
---
title: "Week 5 - Data Management"
---
<script defer data-domain="rbtl-fs22.github.io/website" src="https://plausible.io/js/plausible.js"></script>
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Introduction
Research Data Management has become increasingly important and is a signifcant aspect of open science and reproducible research. Most funders require a data management plan today (e.g. Swiss National Science Foundation). There are many elements that are a part of it and we will focus our efforts on documentation and metadata. You will receive templates that you can use for your own future data management practices.
## Prerequisites
We assume that you have:
1. A working account for Select Survey
2. Access to a spreadsheet based software (if not, consider [LibreOffice](https://www.libreoffice.org/))
## Learning Objectives
1. Learners can use a tool for survey design for 12 to 15 questions ~~using at least two elements of survey logic~~
2. Learners can apply 12 principles for data organisation in spreadsheets in the layout of a collected dataset
3. Learners understand the importance of documentation and metadata for (research) data management
## Terminology {#terminology}
# Exercises
## Exercise 0 - Feedback on Assignment 4
1. Open your Assignment 4 on GitHub.
2. Find the issue where you received feedback on Assignment 4.
3. Incorporate this feedback into your submission for this Assignment 5.
## Exercise 1 - Clone the repo
1. Open the rbtl organisation on GitHub and locate your repository for this assignment (i.e. ae-05-data-management-GITHUB-USERNAME).
2. Open the rbtl workspace on rstudio.cloud and clone the your repository into the workspace.
3. Open the project
4. Let git know who your are (we will have to do this every time now):
- Edit the details in the following code to your name and email. This is the name and email that your commits will be associated. with.
- usethis::use_git_config(user.name = "Jane", user.email = "[email protected]")
- copy the line of code to your clipboard (Ctrl + C).
- paste the line of code in the Console and hit ther Return (Enter) key.
## Exercise 2 - Read this paper
1. Read: [Data Organization in Spreadsheets](https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989)
2. Summarise what you have learned from reading [Data Organization in Spreadsheets](https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989). Add your text to the R Markdown file in the repository you have cloned to your workspace.
3. knit, ommit, add, and push your changes to GitHub.
<!-- You can write free text or use bullet points -->
## Exercise 3 - Add your data template
**If you are using a survey as your data collection tool:**
1. Prepare the questionnaire on Select Survey.
2. Answer the questionnaire at least twice on your own.
3. Export the responses data as a CSV file.
4. Save the exported `UserResponses.csv` file on your computer.
5. Open the rbtl workspace on rstudio.cloud and open your project for this assignment.
6. Upload the exported `UserResponses.csv` file into the `/data` directory (see screenshots below)
7. commit, add, and push your changes to GitHub.
```{r, echo=FALSE}
knitr::include_graphics("img/rscloud-upload-file.png")
```
```{r, echo=FALSE}
knitr::include_graphics("img/rscloud-upload-file-2.png")
```
**If you using experimental data as your data collection tool:**
1. Use a spreadsheet based software of your choice and open a new spreadsheet.
2. Write your variables in the first row of your spreadsheet, starting with the first column (cell A1).
3. Save the file as a CSV file on your computer and apply file naming conventions shared during the lecture.
4. Open the rbtl workspace on rstudio.cloud and open your project for this assignment.
5. Upload your created file into the `/data` directory (see screenshots above).
6. commit, add, and push your changes to GitHub.
## Exercise 4 - Add your documentation (codebook)
Independent of whether you are using a survey or experimental data as your data collection tool:
1. Use a spreadsheet based software of your choice and open a new spreadsheet.
2. Copy the variables of the `attributes.csv` file to your first row (also displayed as an attributes table below)
3. Save the file as a CSV file on your computer and and name it `attributes.csv`.
4. Open the rbtl workspace on rstudio.cloud and open your project for this assignment.
5. Upload your created file into the `/data` directory and overwrite the existing `attributes.csv` (see screenshots above).
6. commit, add, and push your changes to GitHub.
| variableName | description |
|--------------|---------------------------------------------------------|
| fileName | the name of the input data file(s) |
| variableName | the name of the measured variable |
| description | a written description of what that measured variable is |
| unitText | the units the variable was measured in |
| variableType | the variable data type |
## Exercise 5 - Add your metadata
1. Open the rbtl workspace on rstudio.cloud and open your project for this assignment.
2. Navigate to the `/data` directory in the file manager (bottom right window) and click on the directory.
3. Click on the `DATASETNAME-readme.md` file to open it in your code editor (top left window).
4. Add all information that is applicable and save the file.
5. commit, add, and push your changes to GitHub.
## Exercise 6 - Complete the assignment
1. knit, commit, add, and push your changes to GitHub.
2. Open an issue on the repo to let us know you have completed the assignment.