Docker Tips

Tips for working with Docker images and containers

Docker architecture

Preliminaries: terms and principles

Caveat: The following rules of thumb apply to running Docker in a Linux environment. On macOS and Windows, Docker behaves somewhat differently, due to differences in the underlying OS architecture on which Docker depends.

Containers

Containers are not virtual machines: they share the host's kernel. Rather, containers "contain" by isolating system resources.
This isolation happens through the use of dedicated namespaces: each container gets its own set (for processes, networking, the filesystem, and so on). This fact explains the mappings we use in the docker-compose.yml file. For example, the port mapping 8984:8080 tells Docker to forward requests made to port 8984 outside the container to port 8080 inside the container. Likewise, /opt/scholarspace/scholarspace-derivatives:/opt/scholarspace/scholarspace-derivatives instructs Docker to make a directory on the host available inside the container at the same path.
Because container namespaces are isolated by default, without this mapping, processes running outside the container cannot access resources or communicate with processes running inside the container.
Stopping a container (docker stop [CONTAINER-NAME]) terminates any processes running inside the container, but it preserves the container's own filesystem (e.g., files saved to disk inside the container).
Removing a container (docker rm [CONTAINER-NAME]) removes the container's namespaces. As a result, it also deletes the container's own filesystem, including any files written inside it (freeing up those resources for use by the rest of the system). Data kept in bind mounts or Docker volumes, described below, is not deleted.
A stopped container can be restarted (docker start [CONTAINER-NAME]), but a removed container cannot.
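
To make the two mappings above concrete, here is a minimal sketch of how they might appear in a docker-compose.yml file. The service names and the Fedora image name are placeholders rather than values taken from the project's actual file, and attaching the derivatives directory to the app-server service is an assumption. Only the port numbers and the path come from the text above.

```yaml
services:
  fedora:                            # hypothetical service name
    image: example/fedora:latest     # placeholder image name, not from the source
    ports:
      # HOST:CONTAINER -- requests to port 8984 on the host reach port 8080 in the container
      - "8984:8080"
  app-server:                        # hypothetical service name
    image: scholarspace-app
    volumes:
      # HOST PATH:CONTAINER PATH -- the same directory is visible on both sides
      - /opt/scholarspace/scholarspace-derivatives:/opt/scholarspace/scholarspace-derivatives
```

In both cases the left-hand side names something on the host and the right-hand side names its counterpart inside the container's namespaces.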

Images

A Docker image is similar to a disk image (as used in backing up/restoring your computer, for example).
As an inexact analogy, I picture the Docker image as a frozen, prepackaged dinner bought at the supermarket, and the Docker container made from the image as the result of heating up the frozen dinner in the microwave.
Inexact, because thanks to Docker's clever way of structuring resources (called "copy-on-write"), the same image can be used to create multiple containers without creating significant overhead on disk. (So maybe it's like a magical frozen dinner that can be eaten any number of times...)
The Dockerfile is just a recipe for creating an image. Most images are built on top of other images (base images). In fact, each instruction in a Dockerfile that modifies the filesystem (COPY, ADD, RUN) creates a new layer. The resulting Docker image is a stack of such layers.
When rebuilding an image, only the layers whose contents have changed -- and those layers that come after them in the order of the Dockerfile -- will be recreated. This fact leads to the following principle: aspects of the image that are liable to change more frequently should, where possible, be included after those that change less frequently.
- For reference, in the ScholarSpace Dockerfile, the most expensive layers (in terms of time to build) are those that install ImageMagick and its dependencies, and those that install the Ruby gems for our application.
The last command in a Dockerfile is typically CMD or ENTRYPOINT. The actions performed here -- usually in a separate script for convenience -- technically do not affect the image; rather they are run inside the container as it starts up. (These can be superseded by commands included in a docker run command or in a docker-compose.yml file.) The logic in such scripts can be used to customize a container's runtime environment, depending on certain conditions.
- For instance, the same Dockerfile/image is used to create both the Hyrax app-server container and the sidekiq container, but whereas the former is started with commands that launch Nginx/Passenger, the latter is started with a command that launches Sidekiq. Otherwise, the containers are identical.
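
As a rough illustration of the point above, two services can share one image and differ only in their startup command. This sketch assumes that the image's default CMD/ENTRYPOINT launches Nginx/Passenger and that Sidekiq is started with `bundle exec sidekiq`. Both are assumptions for illustration, not details confirmed by the project's files.

```yaml
services:
  app-server:
    image: scholarspace-app
    # No command given: the container runs the image's default CMD/ENTRYPOINT
    # (assumed here to start Nginx/Passenger).
  sidekiq:
    image: scholarspace-app          # same image as app-server
    # Overrides the image's default CMD for this container only (assumed command).
    command: bundle exec sidekiq
```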

Volumes

Changes to the filesystem or system configuration made while a container is running do not outlive the container. This fact leads to the following principles:
- Immutable and invariant files and settings -- those that do not change between releases of the application, and which do not depend on the local environment of installation -- should comprise the Docker image. This category includes application code and application dependencies, as well as some system settings.
- Runtime settings, if implemented at container startup, can be stored using the container's ephemeral storage.
- Data created by the application should generally be stored using one of two methods, described below.
Docker offers the following methods of persistence (in addition to images):
- Bind mounts: Paths (directories and files) inside the container are mapped to paths outside the container. In this arrangement, Docker will allow processes inside the container to read and write to locations in the host filesystem.
- Docker volumes: Containers write to dedicated, persistent storage managed by the Docker daemon itself.

The following table summarizes key differences between these approaches.

| | Bind Mounts | Docker Volumes |
|---|---|---|
| Access | Managed by the host system. Container processes not running as root will need permission, granted to their users outside the container. | Managed by Docker. No direct access outside of a running container. |
| Users | Container users/groups must match those on the host system. Because Docker containers have their own namespaces, a `scholarspace` or `solr` user created inside the container will not be the same as a `scholarspace` or `solr` user created outside the container. Therefore, it is necessary to either a) assign privileges outside the container using the in-container user id (`uid`) and group id (`gid`), or b) create users inside the container matching a `uid` and `gid` known outside the container. | Permissions depend on how the resources were created, in isolation from the host. |
| Most useful for | Files that "live" outside the application, such as data that may be migrated; files that need to be modified by users outside of the application, such as application code in a development environment. | Application data coupled closely with the application, such as SQL database files. |
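
The two approaches look similar in a docker-compose.yml file. The following sketch shows one of each. The service names and the volume name `db-data` are hypothetical, and which service uses which kind of storage is an assumption. The Postgres image tag and the derivatives path appear elsewhere in this document, and `/var/lib/postgresql/data` is the data directory used by the official postgres image.

```yaml
services:
  app-server:                        # hypothetical service name
    image: scholarspace-app
    volumes:
      # Bind mount: the left side is a path on the host, so it must be writable
      # by the uid/gid that the container process runs as.
      - /opt/scholarspace/scholarspace-derivatives:/opt/scholarspace/scholarspace-derivatives
  db:                                # hypothetical service name
    image: postgres:9.5.25-alpine
    volumes:
      # Docker volume: the left side is a volume name, not a host path.
      - db-data:/var/lib/postgresql/data

volumes:
  db-data:                           # hypothetical volume name; storage is managed by Docker
```

A left-hand side that begins with `/` (or `.`) is treated as a host path, while a bare name refers to a Docker volume declared under the top-level `volumes:` key.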

Docker volume caveats

The relationship between volumes and images is a little counterintuitive. Docker copies the image's files at the mount path into a Docker volume only when the volume is new and empty. After that, changes made to those files in the image (e.g., when the image is rebuilt) will not automatically propagate to the Docker volume. When such a change needs to reach containers that use a Docker volume, it's necessary to remove those containers, then delete the Docker volume (docker volume rm [VOLUME-NAME]) as well as the image (docker image rm [IMAGE-NAME]) before rebuilding.
As the foregoing demonstrates, it is alarmingly easy to delete a Docker volume. Care is warranted when dealing with volumes that persist important data (e.g., database files).

Networking

As with storage, Docker offers two ways of managing network access (communication via ports, etc.):
In port mapping, ports within the container will be mapped onto ports outside the container.
- Example: In our application, the Fedora Jetty server runs on port 8080 inside its container. This port is mapped to 8984 outside the container, so that from the host, doing curl localhost:8984 connects you to the Fedora server.
- Likewise, the Hyrax app-server container maps 443 to the same port outside the container, allowing HTTPS requests to be routed to the Nginx instance running inside that container.
Using a Docker network driver, we let Docker manage connections between containers.
- Our docker-compose.yml file defines a couple of networks: hyrax and fedora.
- Each container connects to one or both networks.
- Each container that receives connections from others has its own hostname. For example, the Solr container is attached to the hyrax network with the hostname solr-hyrax. This allows the app-server and sidekiq containers to reach the Solr server at an address like http://solr-hyrax:8983. (Comparable to http://localhost:8983 on a non-Dockerized setup.)
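
A hedged sketch of how such a network setup might be declared follows. The network names and the `solr-hyrax` hostname come from the text above. The service names, the Solr image name, the use of a network alias to provide the hostname, and the `SOLR_URL` variable are all assumptions made for illustration, and the project's actual file may achieve the same result differently.

```yaml
services:
  solr:                              # hypothetical service name
    image: solr                      # placeholder image name
    networks:
      hyrax:
        aliases:
          - solr-hyrax               # other containers on "hyrax" can resolve this name
  app-server:                        # hypothetical service name
    image: scholarspace-app
    networks:                        # attached to both networks
      - hyrax
      - fedora
    environment:
      # Hypothetical variable name; the point is that the URL uses the
      # Docker-managed hostname rather than localhost.
      SOLR_URL: http://solr-hyrax:8983

networks:
  hyrax:
  fedora:
```

Note that no `ports:` entry is needed for container-to-container traffic on a shared network. Port mapping is only required when the host itself must reach the service.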

ScholarSpace architecture

Production environment

Diagram showing ScholarSpace Docker containers and volumes with connections

Development environment

Diagram showing ScholarSpace Docker containers and volumes with connections

Docker Compose

A docker-compose.yml file is a set of instructions for launching one or more Docker containers.
Containers may be built locally (useful for dev environments) or created from hosted images (best for production).
- To use a hosted image, we use the image directive: image: postgres:9.5.25-alpine
- To build locally, we use a build directive in conjunction with the image directive. The directive below builds an image from the local context and names it scholarspace-app.
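```
image: scholarspace-app   # name given to the locally built image
build:
  context: .              # build context: the directory containing docker-compose.yml
```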
The build context is the directory relative to which the Dockerfile's COPY or ADD instructions are carried out. Usually, this context will be ., referring to the root directory of the repository.
Other Docker Compose directives used in running our application include the following:
- volumes: associates bind-mounts and/or Docker volumes with a given container
- networks: associates Docker networks with a given container
- ports: assigns port mapping (used for connecting to the container from the outside, i.e., from the host machine)
- environment: sets environment variables inside the container. Values can be drawn from a .env file in the same directory as docker-compose.yml. Alternatively, the env_file directive can be used to pass in the complete contents of a .env file.
- command: starts a container with a particular command (if this is different from the CMD or ENTRYPOINT commands in the Dockerfile/image).
At the bottom of the docker-compose.yml file, we also define Docker volumes (by simply naming them) and Docker networks. (Configuration for the latter is, for our app, fairly boilerplate).
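
The difference between environment and env_file can be sketched as follows. The variable name `RAILS_ENV` and the service name are hypothetical. The port mapping 443:443 is the one described in the Networking section.

```yaml
services:
  app-server:                        # hypothetical service name
    image: scholarspace-app
    ports:
      - "443:443"
    env_file:
      - .env                         # passes every variable defined in .env into the container
    environment:
      # Passes a single variable; ${...} is filled in by Docker Compose from
      # the .env file (or the shell) when the file is read.
      RAILS_ENV: ${RAILS_ENV}        # hypothetical variable name
```

In practice a service would usually need only one of the two. They are shown together here for comparison.
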
At the command line, docker compose up -d starts all the containers as background processes. docker compose down gracefully shuts down all containers and deletes them (equivalent to running docker stop [CONTAINER-NAME] and docker rm [CONTAINER-NAME] for each container). Note that unlike the equivalent Docker CLI commands, docker compose looks for the docker-compose.yml file in the current directory by default, so it should be run from the directory that houses that file (or be pointed at the file with the -f option).

Docker commands

The following is a non-exhaustive list of commands useful for interacting with ScholarSpace containers, images, and volumes.

| Task | Command | Notes |
|---|---|---|
| List running containers | `docker ps` | |
| List running & stopped containers | `docker ps -a` | |
| Show the logs from a container | `docker logs [CONTAINER-NAME]` | |
| Follow the logs from a container | `docker logs [CONTAINER-NAME] -f` | |
| Skip to the last 100 lines of a container's logs | `docker logs --tail 100 [CONTAINER-NAME]` | |
| Search a container's logs | `docker logs [CONTAINER-NAME] 2>&1 \| grep "[SEARCH-STRING]"` | |
| Stop a container | `docker stop [CONTAINER-NAME]` | |
| Restart a container | `docker start [CONTAINER-NAME]` | |
| Delete a stopped container | `docker rm [CONTAINER-NAME]` | |
| Recreate containers after deleting | `docker compose up -d` | Recreates any deleted containers. |
| Open a Bash shell in a container (non-Hyrax) | `docker exec -it [CONTAINER-NAME] /bin/bash` | |
| Open a Bash shell in a Hyrax container | `docker exec -it --user scholarspace [CONTAINER-NAME] bash -l` | The slightly different syntax is due to the nature of the underlying image. |
| Run a command in a Hyrax container (e.g., a Rake task) | `docker exec -it --user scholarspace [CONTAINER-NAME] bash -lc "[COMMAND]"` | |
| Show all images | `docker image ls` | |
| Delete an image | `docker image rm [IMAGE-NAME]` | Deleting a container and then its image will force the image to be rebuilt or redownloaded. |
| Show all Docker volumes | `docker volume ls` | |
| Delete a Docker volume | `docker volume rm [VOLUME-NAME]` | Use carefully! This will irrevocably delete all volume contents, leading to permanent data loss. |
