# Intro to CI/CD with Github Actions
<div style="text-align:center;">
```{r, echo = F}
knitr::include_graphics("img/octopus.png")
```
</div>
What you’ll have learned by the end of the chapter: very basic knowledge of Github Actions,
but enough to run your RAP in the cloud.
## Introduction
We are almost at the end; actually, we could have stopped at the end of the previous chapter. We
have reached our goal; we are able to run our pipeline in a 100% reproducible way. However, this
requires some manual steps. And maybe that's not a problem; if your image is done, and users only
need to pull it and run the container, that's not really a big problem. But you should keep in mind
that manual steps don't scale. Let's imagine another context; let's suppose that you are part of a
company and that you are part of a team that needs to quickly ship products to clients. Maybe
several people contribute to the product using an internal version control solution (like a Gitlab
instance that is deployed on the premises of your company). Maybe you even need to work on several
products in the same day; you (and your teammates) should only be focusing on writing code (and
`Dockerfiles`)... your time and resources cannot get clogged by building images (which, depending on
what you're working on, can take quite some time). So ideally, we would want to automate this step.
That is what we are going to learn in this chapter.
This chapter will introduce you to the basic ideas of CI/CD (Continuous Integration and Continuous
Deployment/Delivery) and DevOps with Github Actions. Because we’re using Git to trigger all the events
and automate the whole pipeline, this can also be referred to as GitOps.
What's Dev(Git)Ops? I think that the [Atlassian](https://www.atlassian.com/devops) page on DevOps does
a good job of explaining it. The bottom line is that DevOps makes it easy for developers to focus on
coding, and makes it easy for them to ship data products. The core IT team provides the required
infrastructure and tools to make this possible. GitOps is a variant of DevOps where the definition
of the infrastructure is versioned, and can be changed by editing simple text files. Through events,
such as pushing to the repository, new images can be built, or containers executed. Data products
can then also be redeployed automatically. All the steps we've been doing manually, with one simple push!
It's also possible, in the context of package development, to execute unit tests when code gets pushed
to the repo, or get documentation and vignettes compiled. This also means that you could be developing
on a very thin client with only a text editor and git installed. Pushing to Github would then
execute everything needed to have a package ready for sharing.
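As an illustration of the package development case, here is a minimal sketch of a workflow that checks an R package (which includes running its unit tests) on every push. It assumes that the root of the repository contains the package, and it relies on the `setup-r`, `setup-r-dependencies` and `check-r-package` actions from the r-lib/actions repository; the workflow and job names are placeholders. The syntax is explained step by step later in this chapter.

```yaml
name: Check package   # hypothetical name
on: [push]

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      # Make the package sources available on the runner
      - uses: actions/checkout@v3
      # Install R
      - uses: r-lib/actions/setup-r@v2
      # Install the dependencies listed in DESCRIPTION, plus rcmdcheck
      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::rcmdcheck
          needs: check
      # Run R CMD check, which also runs the tests of the package
      - uses: r-lib/actions/check-r-package@v2
```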
So our goal here is, in short, to do exactly the same as what we have been doing on our computer
(so build an image, run a container, and get back 3 plots), but on Github.
## Getting your repo ready for Github Actions
You should see an "Actions" tab on top of any Github repo:
<div style="text-align:center;">
```{r, echo = F}
knitr::include_graphics("img/ga_1.png")
```
</div>
Clicking on it opens a new view where you can choose among many ready-to-use actions. Shop around for
a bit, and choose the right one (depending on what you want to do). You should know that there is a
very nice repository with many [actions for R](https://github.com/r-lib/actions). Once you're done
choosing an action, a new view in which you can edit a file will open. This file will have the name of
the chosen action, and have the `.yml` extension. This file will be automatically added to your repository,
in the following path: `.github/workflows`.
Let's take a look at such a workflow file:
```
name: Hello world
on: [push]
jobs:
  say-hello:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Hello from Github Actions!"
      - run: echo "This command is running from an Ubuntu VM each time you push."
```
Let's study this workflow definition line by line:
```
name: Hello world
```
Simply gives a name to the workflow.
```
on: [push]
```
When should this workflow be triggered? Here, whenever something gets pushed.
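`push` is only one of several events. As a rough sketch, the `on` key can also be written as a mapping to restrict pushes to one branch, to allow manual runs from the "Actions" tab (`workflow_dispatch`), or to run on a schedule (`schedule`, using cron syntax). The branch name and the cron expression below are placeholders:

```yaml
on:
  push:
    branches: [ "main" ]   # only pushes to main trigger the workflow
  workflow_dispatch:       # adds a manual "Run workflow" button
  schedule:
    - cron: "0 6 * * 1"    # every Monday at 06:00 UTC
```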
```
jobs:
```
What are the actual things that should happen? This defines the jobs that make up the workflow.
```
say-hello:
```
This defines the `say-hello` job.
```
runs-on: ubuntu-latest
```
This job should run on an Ubuntu VM. You can also run jobs on Windows or macOS VMs, but
this uses more compute minutes than a Linux VM (you have 2000 compute minutes for free per month).
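If you need the same job on several operating systems, one option is a build matrix. This is a sketch only, and is not needed for the rest of the chapter; keep in mind that every entry in the matrix starts its own VM and so uses its own compute minutes:

```yaml
jobs:
  say-hello:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    # The job runs once per entry in the matrix
    runs-on: ${{ matrix.os }}
    steps:
      - run: echo "Hello from ${{ matrix.os }}!"
```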
```
steps:
```
What are the different steps of the job?
```
- run: echo "Hello from Github Actions!"
```
First, run the command `echo "Hello from Github Actions!"`. This command runs inside the VM.
Then, run this next command:
```
- run: echo "This command is running from an Ubuntu VM each time you push."
```
Let's push, and see what happens on github.com:
<div style="text-align:center;">
<video width="640" height="480" controls>
<source src="img/ga_1.mp4" type="video/mp4">
</video>
</div>
If we take a look at the commit we just pushed, we see this yellow dot next to the commit name.
This means that an action is running. We can then take a look at the output of the job, and
see that our commands, defined with the `run` statements in the workflow file, succeeded and echoed
what we asked them to.
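Two small variations are worth knowing before moving on, shown here as a sketch: a step can be given a `name`, which is what appears in the log of the job on Github, and `run` can hold several commands by using a YAML block (`|`). The step names below are made up for illustration:

```yaml
jobs:
  say-hello:
    runs-on: ubuntu-latest
    steps:
      - name: Say hello
        run: echo "Hello from Github Actions!"
      - name: Show where we are
        # The commands run one after the other in the same shell
        run: |
          pwd
          ls -la
```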
So, the next step is running our Docker image and getting back our plots. This is what our
workflow file looks like:
```
name: Reproducible pipeline

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:

  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3
    - name: Build the Docker image
      run: docker build -t my-image-name .
    - name: Docker Run Action
      run: docker run --rm --name my_pipeline_container -v /github/workspace/fig/:/home/graphs/:rw my-image-name
    - uses: actions/upload-artifact@v3
      with:
        name: my-figures
        path: /github/workspace/fig/
```
For now, let's focus on the `run` statements, because these should be familiar:
```
run: docker build -t my-image-name .
```
and:
```
run: docker run --rm --name my_pipeline_container -v /github/workspace/fig/:/home/graphs/:rw my-image-name
```
The only new thing here is that the path has been changed to `/github/workspace/`. This is the
home directory of your repository, so to speak. Now there's the `uses` keyword that's new:
```
uses: actions/checkout@v3
```
This action checks out your repository inside the VM, so the files in the repo are available inside the VM.
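The workflow above hard-codes the path of the mounted folder. As a variant, and only as a sketch, the `github.workspace` context can be used instead: it expands to the directory on the runner where `actions/checkout` placed the repository, so the figures end up in a `fig/` folder next to the checked-out files. The image name and the path inside the container are the ones used above:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Build the Docker image
      run: docker build -t my-image-name .
    - name: Docker Run Action
      # The expression is replaced by the checkout directory before the command runs
      run: docker run --rm -v ${{ github.workspace }}/fig/:/home/graphs/:rw my-image-name
    - uses: actions/upload-artifact@v3
      with:
        name: my-figures
        # A relative path is resolved from the working directory (the workspace by default)
        path: fig/
```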
Then, there's this action here:
```
- uses: actions/upload-artifact@v3
  with:
    name: my-figures
    path: /github/workspace/fig/
```
This action takes what's inside `/github/workspace/fig/` (which will be the output of our pipeline)
and makes the contents available as so-called "artifacts". Artifacts are the outputs of your
workflow. In our case, as stated, the output of the pipeline. So let's run this by pushing a
change, and let's take a look at these artifacts!
<div style="text-align:center;">
<video width="640" height="480" controls>
<source src="img/ga_2.mp4" type="video/mp4">
</video>
</div>
As you can see from the video above, a zip file is now available and can be downloaded. This
zip contains our plots! It is thus possible to rerun our workflow in the cloud. This has the
advantage that we can now focus on simply changing the code, and not have to bother with
boring manual steps. For example, let's change this target in the `_targets.R` file:
```
tar_target(
    commune_data,
    clean_unemp(unemp_data,
                place_name_of_interest = c("Luxembourg", "Dippach",
                                           "Wiltz", "Esch/Alzette",
                                           "Mersch", "Dudelange"),
                col_of_interest = active_population)
)
```
I've added "Dudelange" to the list of communes to plot. Let me push this change to the repo now,
and let's take a look at the artifacts. The video below summarises the process:
<div style="text-align:center;">
<video width="640" height="480" controls>
<source src="img/ga_3.mp4" type="video/mp4">
</video>
</div>
As you can see in the video, the `_targets.R` script was changed, and the changes pushed to Github.
This triggered the action we've defined before. The plots (artifacts) get refreshed, and we can
download them. We see then that Dudelange was added in the `communes.png` plot!
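The upload step accepts a few more inputs that can be handy here. As a sketch: `if-no-files-found: error` makes the workflow fail if the pipeline produced no plots (by default this is only a warning), and `retention-days` controls how long the zip file stays downloadable. The value of 7 days is an arbitrary choice:

```yaml
- uses: actions/upload-artifact@v3
  with:
    name: my-figures
    path: /github/workspace/fig/
    # Fail the job instead of warning when no files are found
    if-no-files-found: error
    # Delete the artifact after a week (arbitrary value)
    retention-days: 7
```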
It is also possible to "deploy" the plots directly to another branch, and do much, much more. I just wanted
to give you a little taste of Github Actions (and more generally GitOps). The possibilities are virtually
limitless, and I still can't get over the fact that Github Actions is free (well, up to
[2000 compute minutes and 500MB storage per month](https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions)).
## Building a Docker image and pushing it to a registry
It is also possible to build a Docker image and have it made available on an image registry.
You can see how this works on this [repository](https://github.com/b-rodrigues/ga_demo).
These images can then be used as a base for other RAPs, as in this [repository](https://github.com/b-rodrigues/ga_demo_rap/tree/main).
Why do this? Well, because of "separation of concerns". You could have a repository which builds an image
containing your development environment: this could be an image with a specific version of R and R packages. And then
have as many repositories as projects that run RAPs using that development environment image as a basis. Simply add the
project-specific packages that you need for each project.
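To give an idea of what such a workflow could look like, here is a minimal sketch that builds an image and pushes it to Docker Hub. It is not necessarily what the linked repositories do. The image name `my-user/my-dev-env` and the secrets `DOCKERHUB_USERNAME` and `DOCKERHUB_TOKEN` are hypothetical; the secrets would need to be defined in the settings of the repository:

```yaml
name: Build and push image   # hypothetical name
on:
  push:
    branches: [ "main" ]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Log in to Docker Hub
        # The token is passed on stdin, as --password-stdin expects
        run: echo "${{ secrets.DOCKERHUB_TOKEN }}" | docker login -u "${{ secrets.DOCKERHUB_USERNAME }}" --password-stdin
      - name: Build the Docker image
        run: docker build -t my-user/my-dev-env:latest .
      - name: Push the Docker image
        run: docker push my-user/my-dev-env:latest
```

A project repository could then start its own `Dockerfile` with `FROM my-user/my-dev-env:latest` and only add the project-specific packages.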
## Further reading
- http://haines-lab.com/post/2022-01-23-automating-computational-reproducibility-with-r-using-renv-docker-and-github-actions/
- https://orchid00.github.io/actions_sandbox/
- https://www.petefreitag.com/item/903.cfm
- https://dev.to/mihinduranasinghe/using-docker-containers-in-jobs-github-actions-3eof