-
Notifications
You must be signed in to change notification settings - Fork 5
/
paths_and_projects.qmd
215 lines (126 loc) · 11.4 KB
/
paths_and_projects.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
# Paths & Projects {.unnumbered #sec-pathproj}
_You step onto the road, and if you don't keep your feet, there's no knowing where you might be swept off to_
![](images/paths.png)
In context of `R`, a path is a way to tell `R`, where to look for a file. First, let us familiarise us a bit with paths in RStudio.
## Getting Familiar with Paths in RStudio
Log on to the RStudio Cloud Server and in case you do so for the first time, your `files`-pane should look something like:
![](images/paths_01.png)
Here, ![](images/paths_home.png) `Home` defines your home, that is where you "live" on the server. Now, click the button ![](images/paths_new_folder.png) `New Folder` and create a new folder called `projects`. Hereafter, your `files`-pane should look something like:
![](images/paths_02.png)
Now, click the projects folder you created and then you should see:
![](images/paths_03.png)
Note how the `Home` now is extended to ![](images/paths_home.png) `Home > projects`. This signifies, that you are looking at the projects folder in your home.
Try to click on the 2 dots in ![](images/paths_dir_up.png)`..` (the green arrow won't do it, so hit those dots!), this will take you one level up, so you again see:
![](images/paths_02.png)
Note, that you are now back in ![](images/paths_home.png) `Home`
## Intermezzo: Creating a Project
Ok, so far so good. Now again click into `projects` and click ![](images/paths_new_proj.png) `Project: (None)` and from here select ![](images/paths_new_proj_plus.png) `New Project...`. Now you will se a dialogue window open, i.e. RStudio requires input from you:
![](images/paths_04.png)
Click ![](images/paths_new_proj.png) `New Directory` and you should see:
![](images/paths_05.png)
Click ![](images/paths_new_proj.png) `New Project`:
![](images/paths_06.png)
In the `Directory name:`, enter e.g. `r_for_bio_data_science` and note how we are creating the folder (In this case `Directory name`, which is equivalent) as a sub-folder of `projects`. The funny wavy symbol followed by a slash `~/` is simply a short hand for "in this users home folder". So let us proceed:
![](images/paths_07.png)
and then click `Create Project`. Now, depending on your choice of directory name, you should see:
![](images/paths_08.png)
Briefly, the created `r_for_bio_data_science.Rproj` file contains the settings for your project. You can verify this, by clicking on the file and you should see:
![](images/paths_18.png)
We will leave this as is for now, so go ahead and click `OK`.
Now, back to folders and paths - note how you now see ![](images/paths_home.png) `Home > projects > r_for_bio_data_science`, this means that you are now in the `r_for_bio_data_science` folder, which is inside the `projects` folder which in turn is inside the `Home` folder. If you take a look at the ![](images/paths_home.png) `Home > projects > r_for_bio_data_science`, you can in fact also click directly on e.g. `projects` - Try it!
_But why? Don't worry, we will return to why RStudio Projects are indispensable when working with reproducible Bio Data Science_
## Locating Data
Let's move on. Now, again click the ![](images/paths_new_folder.png) `New Folder`-button and create a `data`-folder, make sure it ends up in the `r_for_bio_data_science`-folder. This should result in:
![](images/paths_09.png)
Note here how we now have ![](images/paths_folder.png) `data` and ![](images/paths_new_proj.png) `r_for_bio_data_science.Rproj`. These are different, one is a folder, `data`, and the other is a file, `r_for_bio_data_science.Rproj`, containing settings for your RStudio Project.
Let us try to put som data into the `data`-folder. In the console window, run the following command
```{r}
#| eval: false
#| echo: true
write.table(x = datasets::Puromycin, file = "data/puromycin.tsv", sep = "\t")
```
This should look like so:
![](images/paths_10.png)
This command write a table containing the `Puromycin` from an R-package named `datasets`, this is done using the `x`-parameter. The next paramter is `file` and we set that to indicate, that the file should go into the `data`-folder we created and that we would like the file to be named `puromycin.tsv`, where `tsv` is an abbreviation for tab-separated-values and then the last parameter `sep` is set to `"\t"` indicating, that we want the values to be separated by a `tab`, as indicated, when we named the file.
Once you have run the command, click the `data`-folder and you should now see:
![](images/paths_11.png)
Again, click ![](images/paths_dir_up.png)`..`, this will take you one level up, so you again see:
![](images/paths_09.png)
Now, we have a data file called `puromycin.tsv`. Let us read that file into `R`. We can do that like so:
```{r}
#| eval: false
#| echo: true
my_data <- read.table(file = "puromycin.tsv")
```
Enter the command into the console and run it like we did before. You will now see the following:
![](images/paths_12.png)
So, what happend? The blue writing is your command and the red is an error message from `R`. Always read error messages carefully, they will inform you what went wrong. In this case, we can see that `cannot open file 'puromycin.tsv': No such file or directory`.
This happened because we forgot to specify **where** the `puromycin.tsv`-file is located. `R` is **very** picky here, you **have** to specify **exactly** where `R` should find things. Recall, that we decided to create a `data`-folder and that we placed the `puromycin.tsv`-file into that folder. This we **have** to specify, when we use the `file` parameter in the `read.table()`-function. So, let us fix that:
```{r}
#| eval: false
#| echo: true
my_data <- read.table(file = "data/puromycin.tsv")
```
This `data/puromycin.tsv` is the **path** to the file and now, that we have got that correct, you will see no error message and furthermore, you will see in the environment-pane, that we now have an object called `my_data`, containing `23 obs. of 3 varibles`, i.e. a data set with 23 rows and 3 columns.
![](images/paths_13.png)
## Absolute versus Relative Paths
Let us get back to why we **have** to work using RStudio Projects, recall we created the `r_for_bio_data_science.Rproj`-file, defining out project. You can verify, that we are indeed working in that project, by looking in the upper right corner of the RStudio IDE and you should see ![](images/paths_new_proj.png) `r_for_bio_data_science`.
Good, now in the console, enter the command:
```{r}
#| eval: false
#| echo: true
getwd()
```
You should see something along the lines of:
```{r}
#| eval: false
#| echo: true
"/net/pupilx/home/people/student_id/projects/r_for_bio_data_science"
```
So, when we read the `puromycin.tsv`-file using the **path** `data/puromycin.tsv`, we specify, that `R` should look for the file `puromycin.tsv` in the `data` folder. So why did we not have to specify `/net/pupilx/home/people/student_id/projects/r_for_bio_data_science`? Well indeed, we could have specified the full location of the `puromycin.tsv`-file, which would be:
```{r}
#| eval: false
#| echo: true
"/net/pupilx/home/people/student_id/projects/r_for_bio_data_science/data/puromycin.tsv"
```
This is called the **absolute path** and here you should note, that it begins with a `/`. But let us say, that we had indeed in our code stated:
```{r}
#| eval: false
#| echo: true
my_data <- read.table(file = "/net/pupilx/home/people/student_id/projects/r_for_bio_data_science/data/puromycin.tsv")
```
Then that would work... On **OUR** computer. If we were to share our code to a colleague or a collaborator, then that would not work, because that person would have a **different path**, e.g. a different `student_id`. The code would break! Imagine that you have thousands of line of code with hundreds of **absolute paths** - You would spend hours-and-hours on fixing all the **absolute paths**, so they matched that particular computer. Then every time we would want to share the analysis project, we would have to redo this tedious proces!
This is why we work in RStudio Projects! RStudio Projects allows us to specifiy where **everything** is located relative to where the `.Rproj`-file is. So in our case, the `r_for_bio_data_science.Rproj`-file is located in the same place as the `data`-folder, namely in the folder containing our **entire** project, the `r_for_bio_data_science`-folder, which in turn is located in the `projects`-folder.
This means, that **all** paths in the analysis project, can be stated relative to the location of the `.Rproj`-file and hence we have **relative paths**, meaning that anyone can receive the project and run it straight-out-of-the-box!
## Working in Multiple Projects
Now, we did add that plural `s`, when we created the `projects`-folder. When you have completed this course, perhaps you want to attend the "Introduction to Systems Biology"-course. In that case, we would setup a new project, so use the `Files`-pane to navigate to the `projects`-folder:
![](images/paths_14.png)
Then, we simply repeat the proces: Click ![](images/paths_new_proj.png) `r_for_bio_data_science` in the upper right corner and from here select ![](images/paths_new_proj_plus.png) `New Project...`. Now you will se a dialogue window open, i.e. RStudio requires input from you:
![](images/paths_04.png)
Click ![](images/paths_new_proj.png) `New Directory` and you should see:
![](images/paths_05.png)
Click ![](images/paths_new_proj.png) `New Project`:
![](images/paths_06.png)
In the `Directory name:`, enter e.g. `introduction_to_systems_biology`:
![](images/paths_15.png)
and then click `Create Project`. Now, you should see:
![](images/paths_16.png)
Now, note how you now see ![](images/paths_home.png) `Home > projects > introduction_to_systems_biology`, meaning that you are now in your `Home` and then in your folder containing `projects`, one of which is your `introduction_to_systems_biology` project.
Click ![](images/paths_dir_up.png)`..` and you will see:
![](images/paths_17.png)
This is now your two project folders and you can add others, such as yet another course or e.g. `special_course`, `bachelor_thesis` or `master_thesis`.
Now, we can easily switch between different projects. In the upper right corner you will see, that you are currently in the `introduction_to_systems_biology` project, meaning that `R` will look for all files relative to the `introduction_to_systems_biology.Rproj`-file. Naturally, we would want to switch back to the project, we created for the "R for Bio Data Science"-course. To do this, we simply click ![](images/paths_new_proj.png) `introduction_to_systems_biology` and if you look at the drop-down menu, you should see `r_for_bio_data_science` - Click it! Notice how `R` automatically restarts and you are moved to the correct folder for this project.
**Make sure to change your active project back to `r_for_bio_data_Science` before doing further lab exercises!**
## Learning Objectives
_If you made it this far, you should now be able to:_
- Navigate the RStudio IDE in context of folders and projects
- Create a new project
- Create a new folder
- Read and write data files from relative paths
- Work in and switch between different projects
- Explain the difference between a folder, a data file and a Rproj-file
- Explain the difference between absolute and relative paths
- Explain why RStudio Projects are an essential part of doing reproducible bio data science
## Epilogue
That's all folks! I hope this cleared up some things - Please feel free to revisit this chapter, as needed!
Remember - Have fun! No one ever got really good at something they didn't think was fun!