---
title: "Learning Docker"
output: github_document
---
```{r setup, include=FALSE}
Sys.setenv(PATH=paste0(Sys.getenv("PATH"), ":", getwd()))
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(error = TRUE)
```
## Introduction
![](https://github.com/davetang/learning_docker/actions/workflows/create_readme.yml/badge.svg)
Docker is an open source project that allows one to pack, ship, and run any
application as a lightweight container. An apt analogy for Docker containers is
shipping containers, which provide a standard and consistent way of shipping
just about anything. A container includes everything an application needs to
run, including the code, system tools, and the necessary dependencies. If you
want to test an application, all you need to do is download the Docker image
and run it in a new container. No more compiling and installing missing
dependencies!
The [overview](https://docs.docker.com/get-started/overview/) at
https://docs.docker.com/ provides more information. For a more hands-on
approach, check out [Know Enough Docker to be
Dangerous](https://docs.docker.com/) and [this short
workshop](https://davetang.github.io/reproducible_bioinformatics/docker.html)
that I prepared for BioC Asia 2019.
This README was generated by GitHub Actions using the R Markdown file
`readme.Rmd`, which was executed via the `create_readme.sh` script.
## Installing the Docker Engine
To get started, you will need to install the Docker Engine; check out [this
guide](https://docs.docker.com/engine/install/).
## Checking your installation
To see if everything is working, try to obtain the Docker version.
```{bash engine.opts='-l'}
docker --version
```
And run the `hello-world` image. (The `--rm` parameter is used to automatically
remove the container when it exits.)
```{bash engine.opts='-l'}
docker run --rm hello-world
```
## Docker information
Get more version information.
```{bash engine.opts='-l'}
docker version
```
Even more information.
```{bash engine.opts='-l'}
docker info
```
## Basics
The two guides linked in the introduction section provide some information on
the basic commands but I'll include some here as well. One of the main reasons
I use Docker is for building tools. For this purpose, I use Docker like a
virtual machine, where I can install whatever I want. This is important because
I can do my testing in an isolated environment and not worry about affecting
the main server. I like to use Ubuntu because it's a popular Linux distribution
and therefore whenever I run into a problem, chances are higher that someone
else has had the same problem, asked a question on a forum, and received a
solution.
Before we can run Ubuntu using Docker, we need an image. We can obtain an
Ubuntu image from the [official Ubuntu image
repository](https://hub.docker.com/_/ubuntu/) from Docker Hub by running
`docker pull`.
```{bash engine.opts='-l'}
docker pull ubuntu:18.04
```
To run Ubuntu using Docker, we use `docker run`.
```{bash engine.opts='-l'}
docker run --rm ubuntu:18.04 cat /etc/os-release
```
You can work interactively with the Ubuntu image by specifying the `-it`
option.
```console
docker run --rm -it ubuntu:18.04 /bin/bash
```
You may have noticed that I keep using the `--rm` option, which removes the
container once you quit. If you don't use this option, the container is saved
up until the point that you exit; all changes you made, files you created, etc.
are saved. Why am I deleting all my changes? Because there is a better (and
more reproducible) way to make changes to the system and that is by using a
Dockerfile.
## Start containers automatically
When hosting a service using Docker (such as running [RStudio
Server](https://davetang.org/muse/2021/04/24/running-rstudio-server-with-docker/)),
it would be nice if the container automatically started up again when the server
(and Docker) restarts. If you use the `--restart` flag with `docker run`, Docker
will [restart your
container](https://docs.docker.com/config/containers/start-containers-automatically/)
when your container has exited or when Docker restarts. The value of the
`--restart` flag can be the following:
* `no` - do not automatically restart (default)
* `on-failure[:max-retries]` - restarts if it exits due to an error (non-zero
exit code) and the number of attempts is limited using the `max-retries`
option
* `always` - always restarts the container; if it is manually stopped, it is
restarted only when the Docker daemon restarts (or when the container is
manually restarted)
* `unless-stopped` - similar to `always` but when the container is stopped, it
is not restarted even after the Docker daemon restarts.
```console
docker run -d \
--restart always \
-p 8888:8787 \
-e PASSWORD=password \
-e USERID=$(id -u) \
-e GROUPID=$(id -g) \
rocker/rstudio:4.1.2
```
## Dockerfile
A Dockerfile is a text file that contains instructions for building Docker
images. A Dockerfile adheres to a specific format and set of instructions,
which you can find at [Dockerfile
reference](https://docs.docker.com/engine/reference/builder/). There is also a
[Best practices
guide](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/)
for writing Dockerfiles.
A Docker image is made up of layers, which act like snapshots. A new layer, or
intermediate image, is created each time an instruction in the Dockerfile is
executed. Each layer is assigned a unique hash and layers are cached by
default. This means that you do not need to rebuild a layer from scratch if it
has not changed. Keep this in mind when creating a Dockerfile.
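To sketch how instruction order interacts with the layer cache (the base image, package, and paths below are only illustrative), place instructions that rarely change before those that change often, so the early layers can be reused:

```
FROM ubuntu:22.04
# stable layer: reused from the cache until the package list changes
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential && \
    rm -rf /var/lib/apt/lists/*
# frequently changing layer last, so the layers above stay cached
COPY src/ /usr/src/app/
```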
Some commonly used instructions include:
* `FROM` - Specifies the parent or base image to use for building an image and
must be the first command in the file.
* `COPY` - Copies files from the current directory (of where the Dockerfile is)
to the image filesystem.
* `RUN` - Executes a command inside the image.
* `ADD` - Adds new files or directories from a source or URL to the image
filesystem.
* `ENTRYPOINT` - Makes the container run like an executable.
* `CMD` - The default command or parameter/s for the container and can be used
with `ENTRYPOINT`.
* `WORKDIR` - Sets the working directory for the image. Any `CMD`, `RUN`,
`COPY`, or `ENTRYPOINT` instruction after the `WORKDIR` declaration will be
executed in the context of the working directory.
* `USER` - Sets the user (and optionally the group) to use for subsequent
  instructions and when running the container.
I have an example Dockerfile that uses the Ubuntu 18.04 image to build
[BWA](https://github.com/lh3/bwa), a popular short read alignment tool used in
bioinformatics.
```{bash engine.opts='-l'}
cat Dockerfile
```
### ARG
To define variables in your Dockerfile use `ARG name=value`. For example, you
can use `ARG` to create a new variable that stores a version number of a
program. When a new version of the program is released, you can simply change
the `ARG` and re-build your Dockerfile.
```
ARG star_ver=2.7.10a
RUN cd /usr/src && \
wget https://github.com/alexdobin/STAR/archive/refs/tags/${star_ver}.tar.gz && \
tar xzf ${star_ver}.tar.gz && \
rm ${star_ver}.tar.gz && \
cd STAR-${star_ver}/source && \
make STAR && \
cd /usr/local/bin && \
ln -s /usr/src/STAR-${star_ver}/source/STAR .
```
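Assuming an image built from a Dockerfile containing the `ARG` above, the variable can also be overridden at build time with `--build-arg`, without editing the Dockerfile (the image tag below is hypothetical):

```console
# override the default star_ver defined by ARG
docker build --build-arg star_ver=2.7.11b -t davetang/star:2.7.11b .
```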
### CMD
The [CMD](https://docs.docker.com/engine/reference/builder/#cmd) instruction in
a Dockerfile does not execute anything at build time but specifies the intended
command for the image; there can only be one CMD instruction in a Dockerfile
and if you list more than one CMD then only the last CMD will take effect. The
main purpose of a CMD is to provide defaults for an executing container.
### COPY
The [COPY](https://docs.docker.com/engine/reference/builder/#copy) instruction
copies new files or directories from `<src>` and adds them to the filesystem of
the container at the path `<dest>`. It has two forms:
```
COPY [--chown=<user>:<group>] [--chmod=<perms>] <src>... <dest>
COPY [--chown=<user>:<group>] [--chmod=<perms>] ["<src>",... "<dest>"]
```
Note the `--chown` parameter, which can be used to set the ownership of the
copied files/directories. If this is not specified, the default ownership is
`root`, which can be a problem.
For example in the RStudio Server
[Dockerfile](https://github.com/davetang/learning_docker/blob/main/rstudio/Dockerfile),
there are two `COPY` instructions that set the ownership to the `rstudio` user.
```
COPY --chown=rstudio:rstudio rstudio/rstudio-prefs.json /home/rstudio/.config/rstudio
COPY --chown=rstudio:rstudio rstudio/.Rprofile /home/rstudio/
```
The two files that are copied are config files and therefore need to be
writable by `rstudio` if settings are changed in RStudio Server.
Usually the root path of `<src>` is set to the directory where the Dockerfile
exists. The example above is different because the RStudio Server image is
built by GitHub Actions, and the root path of `<src>` is the GitHub repository.
### ENTRYPOINT
An [ENTRYPOINT](https://docs.docker.com/engine/reference/builder/#entrypoint)
allows you to configure a container that will run as an executable. ENTRYPOINT
has two forms:
* `ENTRYPOINT ["executable", "param1", "param2"]` (exec form, preferred)
* `ENTRYPOINT command param1 param2` (shell form)
```console
FROM ubuntu
ENTRYPOINT ["top", "-b"]
CMD ["-c"]
```
Use `--entrypoint` with `docker run` to override the ENTRYPOINT instruction;
for the example above, this runs `date` instead of `top -b` (assuming the image
was tagged `mytop`):
```console
docker run --rm --entrypoint date mytop
```
## Building an image
Use the `build` subcommand to build Docker images and use the `-f` parameter if
your Dockerfile is named something else; otherwise, Docker will look for a file
named `Dockerfile`. The period at the end tells Docker to look in the current
directory.
```{bash engine.opts='-l'}
cat build.sh
```
You can push the built image to [Docker Hub](https://hub.docker.com/) if you
have an account. I have used my Docker Hub account name to name my Docker
image.
```console
# use -f to specify the Dockerfile to use
# the period indicates that the Dockerfile is in the current directory
docker build -f Dockerfile.base -t davetang/base .
# log into Docker Hub
docker login
# push to Docker Hub
docker push davetang/base
```
## Renaming an image
The `docker image tag` command creates a new tag, i.e. a new image name, that
refers to an existing image. It is not quite renaming, but it can serve as
renaming since your image gets a new name.
The usage is:
```console
Usage: docker image tag SOURCE_IMAGE[:TAG] TARGET_IMAGE[:TAG]
```
For example I have created a new tag for my RStudio Server image, so that I can
easily push it to Quay.io.
```console
docker image tag davetang/rstudio:4.2.2 quay.io/davetang31/rstudio:4.2.2
```
The original image `davetang/rstudio:4.2.2` still exists, which is why tagging
is not quite renaming.
## Running an image
[Docker run documentation](https://docs.docker.com/engine/reference/run/).
```{bash engine.opts='-l'}
docker run --rm davetang/bwa:0.7.17
```
## Setting environment variables
Create a new environment variable (ENV) using `--env`.
```{bash engine.opts='-l'}
docker run --rm --env YEAR=1984 busybox env
```
Two ENVs.
```{bash engine.opts='-l'}
docker run --rm --env YEAR=1984 --env SEED=2049 busybox env
```
Or `-e` for less typing.
```{bash engine.opts='-l'}
docker run --rm -e YEAR=1984 -e SEED=2049 busybox env
```
## Resource usage
To [restrict](https://docs.docker.com/config/containers/resource_constraints/)
CPU usage use `--cpus=n` and use `--memory=` to restrict the maximum amount of
memory the container can use.
We can confirm the limited CPU usage by running an endless while loop and using
`docker stats` to confirm the CPU usage. *Remember to use `docker stop` to stop
the container after confirming the usage!*
Restrict to 1 CPU.
```console
# run in detached mode
docker run --rm -d --cpus=1 davetang/bwa:0.7.17 perl -le 'while(1){ }'
# check stats and use control+c to exit
docker stats
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
8cc20bcfa4f4 vigorous_khorana 100.59% 572KiB / 1.941GiB 0.03% 736B / 0B 0B / 0B 1
docker stop 8cc20bcfa4f4
```
Restrict to 1/2 CPU.
```console
# run in detached mode
docker run --rm -d --cpus=0.5 davetang/bwa:0.7.17 perl -le 'while(1){ }'
# check stats and use control+c to exit
docker stats
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
af6e812a94da unruffled_liskov 50.49% 584KiB / 1.941GiB 0.03% 736B / 0B 0B / 0B 1
docker stop af6e812a94da
```
## Copying files between host and container
Use `docker cp` but I recommend mounting a volume to a Docker container (see
next section).
```console
docker cp --help
Usage: docker cp [OPTIONS] CONTAINER:SRC_PATH DEST_PATH|-
docker cp [OPTIONS] SRC_PATH|- CONTAINER:DEST_PATH
Copy files/folders between a container and the local filesystem
Options:
-L, --follow-link Always follow symbol link in SRC_PATH
--help Print usage
# find container name
docker ps -a
# create file to transfer
echo hi > hi.txt
docker cp hi.txt fee424ef6bf0:/root/
# start container
docker start -ai fee424ef6bf0
# inside container
cat /root/hi.txt
hi
# create file inside container
echo bye > /root/bye.txt
exit
# transfer file from container to host
docker cp fee424ef6bf0:/root/bye.txt .
cat bye.txt
bye
```
## Sharing between host and container
Use the `-v` flag to mount directories to a container so that you can share
files between the host and container.
In the example below, I am mounting `data` from the current directory (using
the Unix command `pwd`) to `/work` in the container. I am working from the root
directory of this GitHub repository, which contains the `data` directory.
```{bash engine.opts='-l'}
ls data
```
Any output written to `/work` inside the container will be accessible inside
`data` on the host. The command below will create BWA index files for
`data/chrI.fa.gz`.
```{bash engine.opts='-l'}
docker run --rm -v $(pwd)/data:/work davetang/bwa:0.7.17 bwa index chrI.fa.gz
```
We can see the newly created index files.
```{bash engine.opts='-l'}
ls -lrt data
```
However, note that the generated files are owned by `root`, which is slightly
annoying because unless we have root access, we need to start a Docker
container with the volume re-mounted to alter/delete the files.
### File permissions
As seen above, files generated inside the container on a mounted volume are
owned by `root`. This is because the default user inside a Docker container is
`root`. In Linux, there is typically a `root` user with the UID and GID of 0;
this user exists in the host Linux environment (where the Docker engine is
running) as well as inside the Docker container.
In the example below, the mounted volume is owned by UID 1211 and GID 1211 (in
the host environment). This UID and GID do not exist in the Docker container,
thus the UID and GID are shown instead of a name like `root`. This is important
to understand because to circumvent this file permission issue, we need to
create a user that matches the UID and GID in the host environment.
```console
ls -lrt
# total 2816
# -rw-r--r-- 1 1211 1211 1000015 Apr 27 02:00 ref.fa
# -rw-r--r-- 1 1211 1211 21478 Apr 27 02:00 l100_n100_d400_31_2.fq
# -rw-r--r-- 1 1211 1211 21478 Apr 27 02:00 l100_n100_d400_31_1.fq
# -rw-r--r-- 1 1211 1211 119 Apr 27 02:01 run.sh
# -rw-r--r-- 1 root root 1000072 Apr 27 02:03 ref.fa.bwt
# -rw-r--r-- 1 root root 250002 Apr 27 02:03 ref.fa.pac
# -rw-r--r-- 1 root root 40 Apr 27 02:03 ref.fa.ann
# -rw-r--r-- 1 root root 12 Apr 27 02:03 ref.fa.amb
# -rw-r--r-- 1 root root 500056 Apr 27 02:03 ref.fa.sa
# -rw-r--r-- 1 root root 56824 Apr 27 02:04 aln.sam
```
As mentioned already, having `root` ownership is problematic because when we
are back in the host environment, we can't modify these files. To circumvent
this, we can create a user that matches the host user by passing three
environment variables from the host to the container.
```console
docker run -it \
-v ~/my_data:/data \
-e MYUID=$(id -u) \
-e MYGID=$(id -g) \
-e ME=$(whoami) \
bwa /bin/bash
```
We use the environment variables and the following steps to create an identical
user inside the container.
```console
adduser --quiet --home /home/san/$ME --no-create-home --gecos "" --shell /bin/bash --disabled-password $ME
# optional: give yourself admin privileges
echo "%$ME ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
# update the IDs to those passed into Docker via environment variable
sed -i -e "s/1000:1000/$MYUID:$MYGID/g" /etc/passwd
sed -i -e "s/$ME:x:1000/$ME:x:$MYGID/" /etc/group
# su - as the user
exec su - $ME
# run BWA again, after you have deleted the old files as root
bwa index ref.fa
bwa mem ref.fa l100_n100_d400_31_1.fq l100_n100_d400_31_2.fq > aln.sam
# check output
ls -lrt
# total 2816
# -rw-r--r-- 1 dtang dtang 1000015 Apr 27 02:00 ref.fa
# -rw-r--r-- 1 dtang dtang 21478 Apr 27 02:00 l100_n100_d400_31_2.fq
# -rw-r--r-- 1 dtang dtang 21478 Apr 27 02:00 l100_n100_d400_31_1.fq
# -rw-r--r-- 1 dtang dtang 119 Apr 27 02:01 run.sh
# -rw-rw-r-- 1 dtang dtang 1000072 Apr 27 02:12 ref.fa.bwt
# -rw-rw-r-- 1 dtang dtang 250002 Apr 27 02:12 ref.fa.pac
# -rw-rw-r-- 1 dtang dtang 40 Apr 27 02:12 ref.fa.ann
# -rw-rw-r-- 1 dtang dtang 12 Apr 27 02:12 ref.fa.amb
# -rw-rw-r-- 1 dtang dtang 500056 Apr 27 02:12 ref.fa.sa
# -rw-rw-r-- 1 dtang dtang 56824 Apr 27 02:12 aln.sam
# exit container
exit
```
This time when you check the file permissions in the host environment, they
should match your username.
```console
ls -lrt ~/my_data
# total 2816
# -rw-r--r-- 1 dtang dtang 1000015 Apr 27 10:00 ref.fa
# -rw-r--r-- 1 dtang dtang 21478 Apr 27 10:00 l100_n100_d400_31_2.fq
# -rw-r--r-- 1 dtang dtang 21478 Apr 27 10:00 l100_n100_d400_31_1.fq
# -rw-r--r-- 1 dtang dtang 119 Apr 27 10:01 run.sh
# -rw-rw-r-- 1 dtang dtang 1000072 Apr 27 10:12 ref.fa.bwt
# -rw-rw-r-- 1 dtang dtang 250002 Apr 27 10:12 ref.fa.pac
# -rw-rw-r-- 1 dtang dtang 40 Apr 27 10:12 ref.fa.ann
# -rw-rw-r-- 1 dtang dtang 12 Apr 27 10:12 ref.fa.amb
# -rw-rw-r-- 1 dtang dtang 500056 Apr 27 10:12 ref.fa.sa
# -rw-rw-r-- 1 dtang dtang 56824 Apr 27 10:12 aln.sam
```
### File permissions 2
There is a `-u` or `--user` parameter that can be used with `docker run` to run
a container using a specific user. This is easier than creating a new user.
In this example we run the `touch` command as `root`.
```{bash engine.opts='-l'}
docker run -v $(pwd):/$(pwd) ubuntu:22.10 touch $(pwd)/test_root.txt
ls -lrt $(pwd)/test_root.txt
```
In this example, we run the command as a user with the same UID and GID; the
`stat` command is used to get the UID and GID.
```{bash engine.opts='-l'}
docker run -v $(pwd):/$(pwd) -u $(stat -c "%u:%g" $HOME) ubuntu:22.10 touch $(pwd)/test_mine.txt
ls -lrt $(pwd)/test_mine.txt
```
One issue with this method is that you may encounter the following warning (if
running interactively):
```
groups: cannot find name for group ID 1000
I have no name!@ed9e8b6b7622:/$
```
This is because the user in your host environment does not exist in the
container environment. As far as I am aware, this is not a problem; we just
want to create files/directories with matching user and group IDs.
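As a quick sanity check on the host (a sketch assuming GNU coreutils `stat`; a temporary file is used only so that we `stat` something we definitely own), the `stat` and `id` commands report the same `UID:GID` string, so either can be passed to `docker run -u`:

```shell
# compare the UID:GID reported by stat and id (GNU coreutils stat)
tmp=$(mktemp)                         # a file we definitely own
from_stat=$(stat -c "%u:%g" "$tmp")   # owner UID:GID of the file
from_id="$(id -u):$(id -g)"           # current user's UID:GID
rm -f "$tmp"
echo "$from_stat"
echo "$from_id"
# the two lines printed are identical
```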
### Read only
To mount a volume but with read-only permissions, append `:ro` at the end.
```{bash engine.opts='-l'}
docker run --rm -v $(pwd):/work:ro davetang/bwa:0.7.17 touch test.txt
```
## Removing the image
Use `docker rmi` to remove an image. You will need to remove any stopped
containers first before you can remove an image. Use `docker ps -a` to find
stopped containers and `docker rm` to remove these containers.
Let's pull the `busybox` image.
```{bash engine.opts='-l'}
docker pull busybox
```
Check out `busybox`.
```{bash engine.opts='-l'}
docker images busybox
```
Remove `busybox`.
```{bash engine.opts='-l'}
docker rmi busybox
```
## Committing changes
Generally, it is better to use a Dockerfile to manage your images in a
documented and maintainable way but if you still want to [commit
changes](https://docs.docker.com/engine/reference/commandline/commit/) to your
container (like you would for Git), read on.
When you log out of a container, the changes made are still stored; type
`docker ps -a` to see all containers and the latest changes. Use `docker
commit` to commit your changes.
```console
docker ps -a
# git style commit
# -a, --author= Author (e.g., "John Hannibal Smith <[email protected]>")
# -m, --message= Commit message
docker commit -m 'Made change to blah' -a 'Dave Tang' <CONTAINER ID> <image>
# use docker history <image> to check history
docker history <image>
```
## Access running container
To access a container that is already running, perhaps in the background (using
detached mode: `docker run` with `-d`) use `docker ps` to find the name of the
container and then use `docker exec`.
In the example below, my container name is `rstudio_dtang`.
```console
docker exec -it rstudio_dtang /bin/bash
```
## Cleaning up exited containers
I typically use the `--rm` flag with `docker run` so that containers are
automatically removed after I exit them. However, if you don't use `--rm`, by
default a container's file system persists even after the container exits. For
example:
```{bash engine.opts='-l'}
docker run hello-world
```
Show all containers.
```{bash engine.opts='-l'}
docker ps -a
```
We can use a sub-shell to get all (`-a`) container IDs (`-q`) that have exited
(`-f status=exited`) and then remove them (`docker rm -v`).
```{bash engine.opts='-l'}
docker rm -v $(docker ps -a -q -f status=exited)
```
Check to see if the container still exists.
```{bash engine.opts='-l'}
docker ps -a
```
We can set this up as a Bash script so that we can easily remove exited
containers. In the Bash script, `-z` returns true if `$exited` is empty, i.e. no
exited containers, so the removal command only runs when `$exited` is not empty.
```{bash engine.opts='-l'}
cat clean_up_docker.sh
```
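The core of the script can be sketched as follows, with `$exited` faked as an empty string here so the sketch runs without a Docker daemon (the real script would populate it with `docker ps -a -q -f status=exited`):

```shell
# in the real script: exited=$(docker ps -a -q -f status=exited)
exited=""
if [ -z "$exited" ]; then
    # -z is true when the string is empty, i.e. no exited containers
    echo "no exited containers to remove"
else
    docker rm -v $exited
fi
```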
As I have mentioned, you can use the
[--rm](https://docs.docker.com/engine/reference/run/#clean-up---rm) parameter
to automatically clean up the container and remove the file system when the
container exits.
```{bash engine.opts='-l'}
docker run --rm hello-world
```
No containers.
```{bash engine.opts='-l'}
docker ps -a
```
## Installing Perl modules
Use `cpanminus`.
```console
apt-get install -y cpanminus
# install some Perl modules
cpanm Archive::Extract Archive::Zip DBD::mysql
```
## Creating a data container
This [guide on working with Docker data
volumes](https://www.digitalocean.com/community/tutorials/how-to-work-with-docker-data-volumes-on-ubuntu-14-04)
provides a really nice introduction. Use `docker create` to create a data
container; the `-v` indicates the directory for the data container; the `--name
data_container` indicates the name of the data container; and `ubuntu` is the
image to be used for the container.
```console
docker create -v /tmp --name data_container ubuntu
```
If we run a new Ubuntu container with the `--volumes-from` flag, output written
to the `/tmp` directory will be saved to the `/tmp` directory of the
`data_container` container.
```console
docker run -it --volumes-from data_container ubuntu /bin/bash
```
## R
Use images from [The Rocker Project](https://www.rocker-project.org/), for
example `rocker/r-ver:4.3.0`.
```{bash engine.opts='-l'}
docker run --rm rocker/r-ver:4.3.0
```
## Saving and transferring a Docker image
You should just share the Dockerfile used to create your image but if you need
another way to save and share an image, see [this
post](http://stackoverflow.com/questions/23935141/how-to-copy-docker-images-from-one-host-to-another-without-via-repository)
on Stack Overflow.
```console
docker save -o <save image to path> <image name>
docker load -i <path to image tar file>
```
Here's an example.
```console
# save on Unix server
docker save -o davebox.tar davebox
# copy file to MacBook Pro
scp [email protected]:/home/davetang/davebox.tar .
docker load -i davebox.tar
93c22f563196: Loading layer [==================================================>] 134.6 MB/134.6 MB
...
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
davebox latest d38f27446445 10 days ago 3.46 GB
docker run davebox samtools
Program: samtools (Tools for alignments in the SAM format)
Version: 1.3 (using htslib 1.3)
Usage: samtools <command> [options]
...
```
## Sharing your image
### Docker Hub
Create an account on [Docker Hub](https://hub.docker.com/); my account is
`davetang`. Use `docker login` to login and use `docker push` to push to Docker
Hub (run `docker tag` first if you didn't name your image in the format of
`yourhubusername/newrepo`).
```console
docker login
# create repo on Docker Hub then tag your image
docker tag bb38976d03cf yourhubusername/newrepo
# push
docker push yourhubusername/newrepo
```
### Quay.io
Create an account on [Quay.io](https://quay.io/); you can use Quay.io for free as
stated in their [plans](https://quay.io/plans/):
> Can I use Quay for free?
> Yes! We offer unlimited storage and serving of public repositories. We
> strongly believe in the open source community and will do what we can to
> help!
Use `docker login` to [login](https://docs.quay.io/guides/login.html) and use
the credentials you set up when you created an account on Quay.io.
```console
docker login quay.io
```
Quay.io images are prefixed with `quay.io`, so I used `docker image tag` to create
a new tag of my RStudio Server image. (Unfortunately, the username `davetang`
was taken on RedHat [possibly by me a long time ago], so I have to use
`davetang31` on Quay.io.)
```console
docker image tag davetang/rstudio:4.2.2 quay.io/davetang31/rstudio:4.2.2
```
Push to Quay.io.
```console
docker push quay.io/davetang31/rstudio:4.2.2
```
### GitHub Actions
[login-action](https://github.com/docker/login-action) is used to automatically
login to [Docker Hub](https://github.com/docker/login-action#docker-hub) when
using GitHub Actions. This allows images to be automatically built and pushed
to Docker Hub. There is also support for
[Quay.io](https://github.com/docker/login-action#quayio).
## Tips
Tip from https://support.pawsey.org.au/documentation/display/US/Containers:
each RUN, COPY, and ADD command in a Dockerfile generates another layer in the
image, thus increasing its size; use multi-line commands and clean up package
manager caches to minimise image size:
```console
RUN apt-get update \
&& apt-get install -y \
autoconf \
automake \
gcc \
g++ \
python \
python-dev \
&& apt-get clean all \
&& rm -rf /var/lib/apt/lists/*
```
I have found it handy to mount my current directory to the same path inside a
Docker container and to [set it as the working
directory](https://docs.docker.com/engine/reference/commandline/run/#set-working-directory--w);
the directory will be automatically created inside the container if it does not
already exist. When the container starts up, I will conveniently be in my
current directory. In the command below I have also added the `-u` option,
which sets the user to `<name|uid>[:<group|gid>]`.
```console
docker run --rm -it -u $(stat -c "%u:%g" ${HOME}) -v $(pwd):$(pwd) -w $(pwd) davetang/build:1.1 /bin/bash
```
If you do not want to preface `docker` with `sudo`, create a Unix group called
`docker` and add users to it. On some Linux distributions, the system
automatically creates this group when installing Docker Engine using a package
manager. In that case, there is no need for you to manually create the group.
Check `/etc/group` to see if the `docker` group exists.
```console
cat /etc/group | grep docker
```
If the `docker` group does not exist, create the group:
```console
sudo groupadd docker
```
Add users to the group.
```console
sudo usermod -aG docker $USER
```
The user will need to log out and log back in before the changes take effect.
On Linux, Docker stores its data in `/var/lib/docker` by default.
```console
docker info -f '{{ .DockerRootDir }}'
# /var/lib/docker
```
This may not be ideal depending on your partitioning. To change the default
root directory, update the daemon configuration file; the default location on
Linux is `/etc/docker/daemon.json`. This file may not exist, so you may need to
create it.
The example below makes `/home/docker` the Docker root directory; you can use
any directory you want but just make sure it exists.
```console
cat /etc/docker/daemon.json
```
```
{
"data-root": "/home/docker"
}
```
Restart the Docker server (this will take a little time, since all the files
will be copied to the new location) and then check the Docker root directory.
```console
sudo systemctl restart docker
docker info -f '{{ .DockerRootDir}}'
```
```
/home/docker
```
Check out the new home!
```console
sudo ls -1 /home/docker
```
```
buildkit
containers
engine-id
image
network
overlay2
plugins
runtimes
swarm
tmp
volumes
```
Use `--progress=plain` to show container output, which is useful for debugging!
```console
docker build --progress=plain -t davetang/scanpy:3.11 .
```
For Apple laptops using the M[123] chips, use `--platform linux/amd64` if that's the architecture of the image.
```console
docker run --rm --platform linux/amd64 -p 8787:8787 rocker/verse:4.4.1
```
## Useful links
* [Post installation steps](https://docs.docker.com/engine/install/linux-postinstall/)
* [A quick introduction to
Docker](http://blog.scottlowe.org/2014/03/11/a-quick-introduction-to-docker/)
* [The BioDocker project](https://github.com/BioDocker/biodocker); check out
their [Wiki](https://github.com/BioDocker/biodocker/wiki), which has a lot of
useful information
* [The impact of Docker containers on the performance of genomic
pipelines](http://www.ncbi.nlm.nih.gov/pubmed/26421241)
* [Learn enough Docker to be
useful](https://towardsdatascience.com/learn-enough-docker-to-be-useful-b0b44222eef5)
* [10 things to avoid in Docker
containers](http://developers.redhat.com/blog/2016/02/24/10-things-to-avoid-in-docker-containers/)
* The [Play with Docker classroom](https://training.play-with-docker.com/)
brings you labs and tutorials that help you get hands-on experience using
Docker
* [Shifter](https://github.com/NERSC/shifter) enables container images for HPC
* http://biocworkshops2019.bioconductor.org.s3-website-us-east-1.amazonaws.com/page/BioconductorOnContainers__Bioconductor_Containers_Workshop/
* Run the Docker daemon as a non-root user ([Rootless mode](https://docs.docker.com/engine/security/rootless/))