Docker Fundamentals

Siddharth Rawat edited this page Dec 14, 2023 · 3 revisions

Linux Containers

Containers are not virtual machines. They are ephemeral (temporary): they come and go far more readily than a traditional virtual machine. Virtual machines are stand-ins for real hardware and are often long-lived, since they abstract a real server.

Containers are small and lightweight since they are just processes plus a reference to a layered file-system image (and configuration metadata). No copy of the image data is allocated to a container. The container processes run directly on the Linux kernel and share it with the other containerized processes on the host. The Docker daemon is one of many user-space tools/libraries that talk to the kernel to set up containers.

What makes a process a container?

  • Each containerized process is isolated from other processes running on the same Linux host, using kernel namespaces. Kernel namespaces provide a virtualized world for the container processes to run in.
  • Resources consumed by each containerized process (memory, CPU, I/O, etc.) are confined to specified limits using Linux control groups (cgroups). This helps eliminate noisy-neighbor problems (by keeping one container from over-consuming Linux host resources and starving other containers).
  • The ability to isolate containerized processes and confine the resources they consume is what enables multiple application containers to run more securely on a shared Linux host. The combination of isolation and resource confinement is what makes a Linux process a Linux container.

Control Groups (cgroups)

  • Resource metering and limiting
    • memory, CPU, block I/O, network and device node (/dev/) access control
  • Each subsystem (memory, CPU, etc) has a hierarchy (tree)
  • Each process belongs to exactly 1 node in each hierarchy
  • Each hierarchy starts with 1 node (root)
  • Each node = group of processes (sharing the same resource)
  • When a process is created, it is placed in the same groups as its parent.

Namespaces

  • Provide processes with their own view of the system
  • cgroups limit how much of a resource you can use, whereas namespaces limit what you can see (and therefore what you can use).
  • Each process is in one namespace of each type. Processes within a PID namespace only see processes in the same PID namespace.
  • Each PID namespace has its own numbering (starting with 1). When PID 1 goes away, the whole namespace is killed.
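
On a Linux host, the namespace and cgroup membership described above can be inspected through /proc. The following is a minimal Python sketch rather than anything Docker itself runs; the helper names namespaces and cgroup_membership are made up for illustration, and both return empty results where /proc is not available (for example on macOS).

```python
import os


def namespaces(pid="self"):
    """Return {namespace_type: identifier} for a process, or {} if unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return {}
    for name in names:
        try:
            # Each entry is a symlink whose target looks like "pid:[<inode>]".
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return result


def cgroup_membership(pid="self"):
    """Return the lines of /proc/<pid>/cgroup, or [] if unavailable."""
    try:
        with open(f"/proc/{pid}/cgroup") as handle:
            return [line.strip() for line in handle if line.strip()]
    except OSError:
        return []


for ns_type, identifier in namespaces().items():
    print(f"{ns_type:10} {identifier}")
for line in cgroup_membership():
    print(line)
```

Two processes that print the same identifier for a namespace type share that namespace. Running the script on the host and again inside a container will typically show different identifiers for types such as pid and mnt.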

Motivation for containerization

  • Ease of deployment: Write Once Run Anywhere.
  • Isolation: Reduced interference between applications.
  • Efficiency: Better resource utilization.
  • Better packaging: Bundling application software and required OS file-systems together in a single standardized image format.
  • Using packaged artifacts to test and deliver the exact same artifact to all systems in the environment.
  • Abstract software applications from the hardware without sacrificing resources.

Docker

Docker is a platform that enables developers to automate the deployment of applications inside lightweight, portable containers. Containers are a form of virtualization that allows applications and their dependencies to be packaged together in a consistent and reproducible way. Docker provides a standardized way to package, distribute, and run software in containers.

The docker container engine manages the configuration of Linux kernel namespaces, additional security features and cgroups. Docker introduced a layered packaging format for content that runs inside containers. This made it easy for developers to run containers on their local machines and create immutable images that would run consistently across other machines and in different environments. The runtime for these containers isn't Docker, it's Linux.

Features

Key features of Docker include:

  • Containerization: Docker uses container technology to encapsulate applications and their dependencies. Containers are isolated from each other and from the underlying system, ensuring consistency across different environments.
  • Portability: Docker containers can run on any system that supports Docker, whether it's a developer's laptop, a test environment, or a production server. This portability eliminates the "it works on my machine" problem, making it easier to move applications between different environments.
  • Versioning and Image Registry: Docker uses images, which are lightweight, standalone, and executable packages that include everything needed to run a piece of software, including the code, runtime, libraries, and system tools. These images can be versioned and stored in a registry, making it easy to share and distribute applications.
  • Dockerfile: Docker images are created using a script called a Dockerfile. The Dockerfile contains instructions for building the image, specifying the base image, adding dependencies, and configuring the environment.
  • Orchestration: Docker can be used in conjunction with orchestration tools like Docker Compose and Kubernetes to manage the deployment, scaling, and networking of containerized applications in a clustered environment.
  • Resource Efficiency: Containers share the host system's kernel and do not require a full operating system, resulting in efficient resource utilization and faster startup times compared to traditional virtual machines.

Every container is based on an image. The image is the underlying definition of a running container. A Docker container image is a standard TAR file that combines a rootfs (the container's root file system) and a JSON file (the container configuration). Docker "tars up" the rootfs and the JSON file to create the base image. We can then install additional content on the rootfs, create a new JSON file, and tar up only the difference from the original rootfs together with the updated JSON file. The result is a layered image.

The definition of a container image was eventually standardized by the Open Container Initiative (OCI) standards body as the OCI Image Specification.
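
The idea of an image as a stack of TAR archives plus a JSON configuration can be sketched with Python's standard library. This is an illustration only: the file names, the config keys and the helpers make_layer and apply_layers are assumptions, and the sketch does not follow the actual OCI on-disk layout (manifests, digests, whiteout files).

```python
from __future__ import annotations

import io
import json
import tarfile


def make_layer(files: dict[str, bytes]) -> bytes:
    """Pack a {path: content} mapping into an in-memory tar archive (one layer)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for path, data in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def apply_layers(layers: list[bytes]) -> dict[str, bytes]:
    """Unpack layers in order; files in later layers replace earlier ones."""
    rootfs: dict[str, bytes] = {}
    for blob in layers:
        with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
            for member in tar.getmembers():
                if member.isfile():
                    rootfs[member.name] = tar.extractfile(member).read()
    return rootfs


# Base layer: the original rootfs. Diff layer: only what changed on top of it.
base = make_layer({"etc/os-release": b"NAME=demo\n", "app/version": b"1\n"})
diff = make_layer({"app/version": b"2\n", "app/server.js": b"console.log('hi')\n"})
config = json.dumps({"Cmd": ["node", "app/server.js"], "WorkingDir": "/app"})

rootfs = apply_layers([base, diff])
print(sorted(rootfs))         # ['app/server.js', 'app/version', 'etc/os-release']
print(rootfs["app/version"])  # b'2\n'
```

The second archive contains only the two files that differ from the base, which is why a layered image can share its base layer with other images.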

Installation

To use Docker, we need Docker Desktop installed on our machine. We can download the latest version from the official source at https://www.docker.com/products/docker-desktop/.

Working with Docker

To dockerize an application, we need to create a Dockerfile that tells Docker how to containerize our application. Docker can build images automatically by reading the instructions from a Dockerfile. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.

Each instruction in a Dockerfile creates a new image layer that is stored by Docker (strictly, only instructions that change the file system, such as RUN, COPY and ADD, add content; the others record metadata). A layer contains all the changes that result from its instruction. This means that when we build new images, Docker only needs to rebuild the layers that deviate from our previous builds and can reuse all the layers that haven't changed.

Once we build the Docker image, we can push this image to a container registry, which is a web-service from which we can pull these images. Container engines are programs that can pull these images and launch container runtime(s).

Container storage is usually a copy-on-write (COW) layered filesystem. When we pull a container image from a container registry, we first need to untar the rootfs and place it on the disk. If we have multiple layers that make up our image, each layer is downloaded and stored on a different layer on the COW filesystem. This COW filesystem allows each layer to be stored separately, which maximizes sharing for layered images.
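
The sharing that a copy-on-write layered filesystem provides can be illustrated with a toy model. This sketch uses Python's collections.ChainMap as a stand-in for the union filesystem; the layer contents and the new_container helper are hypothetical, and real storage drivers operate on files and directories rather than dictionaries.

```python
from collections import ChainMap

# Read-only image layers, lowest first (hypothetical contents).
base_layer = {"/etc/os-release": "NAME=demo", "/app/version": "1"}
deps_layer = {"/app/node_modules/.keep": ""}


def new_container(*image_layers):
    """Stack a fresh writable layer on top of shared, read-only image layers."""
    # ChainMap searches its maps left to right and writes only to the first one.
    return ChainMap({}, *reversed(image_layers))


first = new_container(base_layer, deps_layer)
second = new_container(base_layer, deps_layer)

first["/app/version"] = "2"    # the write lands in first's own top layer

print(first["/app/version"])   # 2
print(second["/app/version"])  # 1
print(base_layer["/app/version"])  # 1 (the shared layer is untouched)
```

Both containers read from the same two image layers, and only the modified file is stored per container.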

NOTE: The order of instructions in a Dockerfile can have a very significant impact on ongoing build times: once one layer changes, every layer after it must be rebuilt as well.
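
Why ordering matters can be shown with a simplified model of the build cache. In this Python sketch each layer's cache key is a hash of its parent key, its instruction and the files it reads; this chaining is an assumption made for illustration and is not how Docker computes its real cache keys. The helpers layer_keys, reused, deps_first and copy_all_first are hypothetical.

```python
import hashlib


def layer_keys(steps):
    """Return one chained cache key per (instruction, input_bytes) step."""
    keys, parent = [], ""
    for instruction, inputs in steps:
        digest = hashlib.sha256(parent.encode() + instruction.encode() + inputs)
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def reused(old_keys, new_keys):
    """Count the leading layers whose keys are unchanged between two builds."""
    count = 0
    for old, new in zip(old_keys, new_keys):
        if old != new:
            break
        count += 1
    return count


def deps_first(manifest, source):
    """Dependencies are installed before the application source is copied."""
    return [
        ("FROM node:lts-alpine", b""),
        ("COPY package*.json ./", manifest),
        ("RUN npm install", b""),
        ("COPY . .", manifest + source),
    ]


def copy_all_first(manifest, source):
    """Everything is copied first, so any source edit precedes npm install."""
    return [
        ("FROM node:lts-alpine", b""),
        ("COPY . .", manifest + source),
        ("RUN npm install", b""),
    ]


manifest = b'{"name": "api"}'
before, after = b"console.log(1)", b"console.log(2)"  # only the source changes

good = reused(layer_keys(deps_first(manifest, before)),
              layer_keys(deps_first(manifest, after)))
bad = reused(layer_keys(copy_all_first(manifest, before)),
             layer_keys(copy_all_first(manifest, after)))
print(good, bad)  # 3 1
```

With dependencies copied and installed first, a source-only change reuses three layers, including the npm install step. With everything copied up front, only the base image layer is reused.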

Here is an example of a Dockerfile that will create a docker image of an application:

# which Node.js base image to use for your container
FROM node:lts-alpine

# the directory where the application will be copied to in your container
WORKDIR /usr/src/app

# copy the files required that define the application dependencies to the container WORKDIR
COPY package*.json ./

# commands to run to install dependencies
# The RUN instruction will execute any commands in a new layer on top of the current image and
# commit the results. The resulting committed image will be used for the next step in the Dockerfile
RUN npm install

# copy remaining contents of the base(app) directory to the container WORKDIR
COPY . .

# port on which the application listens inside the container
# EXPOSE specifies which network port the container will listen on at runtime. It does not actually
# publish the port. It is a type of documentation about which ports are intended to be published.
# NOTE: we will need to use the -p flag when running a container with this image to publish ports
EXPOSE 3000

# command to start the application in the container
# CMD provides defaults for an executing container. Only one CMD instruction takes effect in
# a Dockerfile: if more than one is listed, only the last CMD is used.
CMD ["yarn", "start"]

# ENTRYPOINT allows us to configure a container that will run as an executable.
# ENTRYPOINT ["npm", "run", "start"] # OR ENTRYPOINT npm run start

Both CMD and ENTRYPOINT instructions define what command gets executed when running a container. A Dockerfile should specify at least one of CMD or ENTRYPOINT (unless the base image already provides one). CMD should be used as a way of defining default arguments for an ENTRYPOINT command, or for executing an ad-hoc command in a container. CMD is overridden when the container is run with alternative arguments.
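
The interaction between the two instructions can be summarized in a few lines of Python. This sketch models only the exec (JSON array) forms and ignores the shell forms and the --entrypoint flag; container_argv is a hypothetical helper, not a Docker API.

```python
def container_argv(entrypoint, cmd, run_args=()):
    """Combine exec-form ENTRYPOINT and CMD with `docker run <image> [args...]`.

    Arguments given on the command line replace CMD; ENTRYPOINT is always kept.
    """
    args = list(run_args) if run_args else list(cmd or [])
    return list(entrypoint or []) + args


# CMD only: it is the whole command, and run arguments replace it entirely.
print(container_argv(None, ["yarn", "start"]))          # ['yarn', 'start']
print(container_argv(None, ["yarn", "start"], ["sh"]))  # ['sh']

# ENTRYPOINT plus CMD: CMD supplies default arguments for the entrypoint.
print(container_argv(["npm", "run"], ["start"]))            # ['npm', 'run', 'start']
print(container_argv(["npm", "run"], ["start"], ["test"]))  # ['npm', 'run', 'test']
```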

To keep files from being copied into the container WORKDIR, we can use a special file called .dockerignore. Files and directories matching its patterns are excluded from the build context, so the COPY instruction never sees them during a build.

Example:

# https://docs.docker.com/build/building/context/#dockerignore-files/
.env
node_modules/
*.log
logs/
k8s/
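
To get a feel for which paths the patterns above exclude, here is a deliberately simplified matcher in Python. It is an approximation built on fnmatch, not Docker's implementation: it compares patterns to paths one component at a time from the root of the build context, treats a match on a parent directory as excluding everything beneath it, and does not support `**` or `!` exception rules. The function is_ignored is hypothetical.

```python
from fnmatch import fnmatchcase

PATTERNS = """
# https://docs.docker.com/build/building/context/#dockerignore-files/
.env
node_modules/
*.log
logs/
k8s/
""".splitlines()


def is_ignored(path, patterns=PATTERNS):
    """Return True if path (relative to the build context) matches a pattern."""
    parts = path.strip("/").split("/")
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # skip blank lines and comments
        pattern_parts = pattern.strip("/").split("/")
        # The pattern may match the path itself or any of its parent directories.
        if len(pattern_parts) <= len(parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(parts, pattern_parts)
        ):
            return True
    return False


for candidate in [".env", "node_modules/express/index.js", "server.log",
                  "src/debug.log", "src/index.js", "k8s/deploy.yaml"]:
    print(f"{candidate:32} {'ignored' if is_ignored(candidate) else 'copied'}")
```

In this sketch `*.log` only matches at the top level of the context, so src/debug.log is still copied; a pattern such as `**/*.log` would be needed to cover nested files, which this simplified matcher does not handle.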

For more information on the instructions supported in Dockerfiles, refer to the official documentation.

Building from this Dockerfile produces a Docker image identified by a DIGEST and built for one OS/ARCH (by default, the architecture of your workstation). To build images for other hardware architectures, use Docker buildx (see below).

Building images & running containers

  • To create a simple docker image, we can use the docker build command:

    # -t tags the image as <name>:<version>
    # if a version is not provided, the default tag is 'latest'
    # -f (optional) points to a Dockerfile with a non-default name
    # the '.' at the end sets the build context to the current working directory
    docker build -t <image-name>:<image-version> -f <optional-dockerfile-name> .

    Example:

    docker build -t api:v2 -f Dockerfile.dev .
  • To verify your image is built and ready:

    # lists all images that docker has built or pulled from docker hub
    docker image ls -a
  • To create and run a container:

    # -ti runs the container with an interactive terminal
    # --name names your container
    # -p publishes a container port on the local machine where the container is running
    docker run -ti --name <container-name> \
    -p <local-machine-port>:<container-port> \
    <image-name>:<image-version>
  • To stop a running container:

    docker container stop <container-name>
  • To restart a stopped container:

    docker container start <container-name>
  • A few things to remember when building docker images:

    • The digest stays the same as long as the image content is the same (immutable)
    • Tags are mutable: the same tag can be moved to point at a different image
    • After running a container in detached mode (-d), we can use docker logs to observe its logs
    • Images are immutable (read-only); a container only adds a thin writable layer on top of its image
    • When re-building a Docker image, the cache from the previous build is used. Hence it is important to pay attention to the order of the build steps in the Dockerfile
    • Pass --no-cache to docker build to ignore the cache when building an image

Debugging containers

To debug a container, we need shell access to it. We can open a shell inside a running container using the following command:

docker exec -ti <container-id> /bin/bash
# OR
docker exec -ti <container-name> /bin/sh

Once inside the container, we have access to all resources within the container. We mostly use this in order to check logs and resolve errors in case there are any.

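We can refer to the container by its name or its ID; either works. Note that docker exec is not SSH: it starts a new process inside the running container. The same choice applies to the shell: we can use bash or sh, as long as it is installed in the image (Alpine-based images, for example, ship sh but not bash by default).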

We can also use the docker logs command to get the logs for a currently running container:

# print the logs collected so far
docker logs <container-id>
# -f keeps following the log output
docker logs -f <container-id>

Bind Mounts

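A bind mount maps a directory on the local machine into the container, so the application files in that directory are visible inside the container without rebuilding the image. We create one with the -v flag, giving the local directory and the container path separated by a colon: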

docker run --name <container-name> \
-p <local-machine-port>:<container-port> \
-v <local-machine-dir>:<container-WORKDIR-PATH> \
<image-name>:<image-version>

Example:

docker run --name rest-api-container-2 \
-p 3000:1337 \
-v $(pwd):/usr/src/app \
api:v1

NOTE: If you are on a Windows computer, $(pwd) will not work in Command Prompt. Use %cd% there instead, or ${PWD} in PowerShell.

Anonymous Volume

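A bind mount over the WORKDIR hides everything the image placed there, including the node_modules directory installed during the build. To keep the container's own node_modules (and other such directories in your application) separate from the local machine directory, we mount an anonymous volume at that path.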

docker run --name <container-name> \
-p <local-machine-port>:<container-port> \
-v <local-machine-dir>:<container-WORKDIR-PATH> \
-v <container-WORKDIR/node_modules> \
<image-name>:<image-version>

Example:

docker run --name rest-api-container-3 \
-p 3000:1337 \
-v $(pwd):/usr/src/app \
-v /usr/src/app/node_modules \
api:v1

Since there is no colon (:) linking this path to a local machine directory, Docker treats it as an anonymous volume.
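
The way the shape of a -v argument determines the kind of mount can be sketched as a small classifier. This is a simplified reading of the rule, assuming Linux-style paths and absolute host directories; classify_volume is a hypothetical helper, and Windows drive letters, relative paths and mount options are not handled.

```python
def classify_volume(spec):
    """Return (kind, source, target) for a Linux-style -v argument."""
    parts = spec.split(":")
    if len(parts) == 1:
        # No colon: only a container path, so the volume has no name or host path.
        return ("anonymous", None, parts[0])
    source, target = parts[0], parts[1]  # a third part (e.g. "ro") is ignored here
    kind = "bind" if source.startswith("/") else "named"
    return (kind, source, target)


print(classify_volume("/home/me/api:/usr/src/app"))
# ('bind', '/home/me/api', '/usr/src/app')
print(classify_volume("/usr/src/app/node_modules"))
# ('anonymous', None, '/usr/src/app/node_modules')
print(classify_volume("api-data:/var/lib/data:ro"))
# ('named', 'api-data', '/var/lib/data')
```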

Stats & Resource Limits

  • To check the stats of a running container:

    docker stats <container-id>
  • By default, a container has no resource constraints and can use as much of the host's resources as the kernel allows. To restrict the resources a container uses, we can set constraints on memory and CPU at launch time:

    # -m (memory) takes a number with a unit suffix:
    # `b`,`k`,`m`,`g` indicate bytes, kilobytes, megabytes or gigabytes
    # --cpus takes a (fractional) number of CPUs: 0.1 means at most 10% of one CPU
    docker run --cpus="0.1" -m="8m" <image-name>:<image-version>
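
To make the units concrete, the following Python sketch converts the two flag values into the numbers the kernel ultimately works with. It assumes binary units (1k = 1024 bytes) and the default CPU period of 100000 microseconds, under which Docker documents --cpus="1.5" as equivalent to --cpu-period="100000" --cpu-quota="150000". The helpers memory_bytes and cpu_quota are hypothetical and accept only whole-number memory values.

```python
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def memory_bytes(value):
    """Convert a memory limit such as '8m' or '2g' into bytes."""
    text = value.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)  # no suffix: the value is already in bytes


def cpu_quota(cpus, period_us=100_000):
    """Convert a --cpus value into a CPU quota in microseconds per period."""
    return round(cpus * period_us)


print(memory_bytes("8m"))  # 8388608
print(cpu_quota(0.1))      # 10000
print(cpu_quota(1.5))      # 150000
```

So the example above allows the container about 8 MiB of memory and 10 ms of CPU time in every 100 ms scheduling period.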

Multi-Platform Builds

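Create a new builder instance (see the docker buildx create reference), then use it to build for several platforms:

# create a builder named custom_builder and switch to it
docker buildx create --name=custom_builder --use
# list the available builders
docker buildx ls
# build for amd64 and arm64; add the --push flag to also push the image to the registry
docker buildx build --platform=linux/amd64,linux/arm64 -f Dockerfile.app --no-cache -t sydrawat01/api:latest -t sydrawat01/api:0.0.2 .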