Containerization Continued

 

Volumes

The file system of a Docker container is designed to be ephemeral: changes made inside a container are lost when that container is removed and a new one is created from the image. (Stopping and restarting the same container keeps its changes, but containers are routinely removed and recreated.) This is by design. Containers are meant to be disposable, while images are the permanent artifact.

Docker supports volumes, which store data persistently. A volume exists separately from any container or image. The physical computing analogy is an external hard drive, which can be plugged into a computer and which keeps its contents after the computer is shut down.

The general layout of how this works is:

For example, if we were running a WordPress website as a Docker container, we would have an image containing the LAMP stack (Linux, Apache, MySQL, and PHP binaries) along with the PHP code and configuration files for WordPress. We would set up a volume to hold the WordPress database. That way, when the container is running and we edit the site (making posts or changing settings), the changes persist in the database. Every other part of the file system can be treated as read-only.

The benefit of this approach is that it forces you to decide which data an application actually needs to persist, and lets everything else be ephemeral.

Volumes can be created through Docker Desktop or with the command:

$ docker volume create site-data

Here "site-data" is the name of the volume. When a container is launched, we can mount the volume at a directory inside the container. This can be done through the Docker Desktop GUI or through a command like:

$ docker run -v site-data:/var/www/html wordpress
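From the application's side, a mounted volume is just a directory. A minimal sketch of how a small Python application might keep all of its persistent state under one such directory is shown below. The DATA_DIR variable, the /data default, and the todos.json file name are assumptions made for this illustration and do not come from any particular image.

```python
import json
import os
from pathlib import Path

# Hypothetical setting: the directory where a volume is mounted in the container.
DATA_DIR = Path(os.environ.get("DATA_DIR", "/data"))


def load_todos(data_dir: Path = DATA_DIR) -> list[str]:
    """Return the saved to-do items, or an empty list on first run."""
    path = data_dir / "todos.json"
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def add_todo(item: str, data_dir: Path = DATA_DIR) -> list[str]:
    """Append an item and write the whole list back to the data directory."""
    todos = load_todos(data_dir)
    todos.append(item)
    data_dir.mkdir(parents=True, exist_ok=True)
    (data_dir / "todos.json").write_text(json.dumps(todos), encoding="utf-8")
    return todos
```

If a volume is mounted at that directory, the file outlives any single container, while everything written elsewhere in the container can be thrown away.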

 

Dockerfiles

While Docker Hub contains images for many popular applications, it's also common to create custom images. This is done by writing a Dockerfile, a text file that defines how an image is built.

Below is an example for a custom image for a web application:


FROM python:3.12-alpine

WORKDIR /todo-app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY todo-app.py .

EXPOSE 8080

CMD ["python3", "todo-app.py"]

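The pieces of this file are:

FROM sets the base image, here the official Python 3.12 image built on Alpine Linux.

WORKDIR sets the working directory inside the image, creating it if needed. Later instructions run relative to it.

COPY copies files from the build directory on the host into the image. The requirements file is copied before the application code so that the dependency layer can be reused from cache when only the code changes.

RUN executes a command while the image is being built, here installing the Python dependencies.

EXPOSE documents that the application listens on port 8080. It does not publish the port by itself; that is done when the container is run.

CMD gives the default command that runs when a container is started from the image.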
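The todo-app.py file that CMD starts is not shown in these notes. As a rough idea of what such a file could contain, here is a minimal sketch using only the Python standard library. The real application presumably uses packages listed in requirements.txt, so the handler, the TODOS list, and the main function here are all hypothetical.

```python
from wsgiref.simple_server import make_server

# In-memory list for illustration only; it is lost when the container is removed.
TODOS = ["write a Dockerfile", "build the image"]


def app(environ, start_response):
    """Minimal WSGI app: a request for / returns the to-do items, one per line."""
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if environ.get("PATH_INFO", "/") != "/":
        start_response("404 Not Found", headers)
        return [b"not found\n"]
    body = "".join(f"{item}\n" for item in TODOS).encode("utf-8")
    start_response("200 OK", headers + [("Content-Length", str(len(body)))])
    return [body]


def main():
    # Bind to 0.0.0.0 so the server is reachable from outside the container;
    # the port matches the EXPOSE 8080 line in the Dockerfile.
    with make_server("0.0.0.0", 8080, app) as httpd:
        httpd.serve_forever()
```

Ending the file with a call to main() would let the CMD line start the server. Binding to 0.0.0.0 rather than 127.0.0.1 matters in a container, because a server bound only to the loopback address cannot be reached through a published port.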


 

Docker Compose

Dockerfiles like the one above can be used to create custom images. We then can run them as containers to implement applications. However, it's very common to implement an application with multiple containers working together.

Consider a web application for an online store. It would have a web front end, a database for storage, and code for tracking inventory and handling checkout and payment. We could install the software for all of these things in one container, but the recommended solution is to split them up into separate containers. Reasons for doing this include:

This idea is called "microservices". We break a large application into smaller building blocks which can be run in separate containers. This is in contrast to a "monolithic architecture", where all of the software runs together.

Docker has a subcommand called "compose" which reads a compose file. This file is written in YAML syntax (like Ansible) and specifies which containers should be launched and how they interact. Here is a compose file taken from the WordPress page on Docker Hub:


services:

  wordpress:
    image: wordpress
    restart: always
    ports:
      - 8080:80
    environment:
      WORDPRESS_DB_HOST: db
      WORDPRESS_DB_USER: exampleuser
      WORDPRESS_DB_PASSWORD: examplepass
      WORDPRESS_DB_NAME: exampledb
    volumes:
      - wordpress:/var/www/html

  db:
    image: mysql:8.0
    restart: always
    environment:
      MYSQL_DATABASE: exampledb
      MYSQL_USER: exampleuser
      MYSQL_PASSWORD: examplepass
      MYSQL_RANDOM_ROOT_PASSWORD: '1'
    volumes:
      - db:/var/lib/mysql

volumes:
  wordpress:
  db:

This specifies two containers which should be run: one for a web server containing the WordPress application and a second for the database. They are both based on official images (but could be custom ones built from Dockerfiles).

This compose file uses environment variables to give the WordPress container the host name, user, password, and database name it needs to connect to the database, with matching values set on the database container. It also uses restart: always to restart the containers if they stop for any reason.

This could be launched by running the docker compose up command in the directory that contains this compose.yaml file. Docker will make sure the images are downloaded, and launch containers for them.

It also creates a virtual network for the two containers to communicate on. The names of the services, "wordpress" and "db" here, are the hostnames of the containers on this network. This allows them to find each other by name. In this case, the database queries from the WordPress container are sent to the database container over this network, which is why WORDPRESS_DB_HOST is set to db.
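Because service names act as hostnames, code in one container can reach another container simply by name. The sketch below shows how a custom Python service might look up the database address from environment variables and wait until the database accepts connections. The DB_HOST and DB_PORT variable names and both function names are assumptions for this illustration; the WordPress image uses its own variables and startup logic.

```python
import os
import socket
import time


def database_address() -> tuple[str, int]:
    """Read the database host and port from the environment."""
    # DB_HOST and DB_PORT are hypothetical variable names for this sketch.
    # The default host "db" is the service name from the compose file.
    host = os.environ.get("DB_HOST", "db")
    port = int(os.environ.get("DB_PORT", "3306"))  # 3306 is MySQL's default port
    return host, port


def wait_for_service(host: str, port: int, attempts: int = 30, delay: float = 1.0) -> bool:
    """Try to open a TCP connection until it succeeds or attempts run out."""
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Name not resolvable yet or port not open yet; try again shortly.
            time.sleep(delay)
    return False
```

A service could call wait_for_service(*database_address()) at startup. Retrying is useful because the containers in a compose file start at about the same time, so the database may not be ready when the first connection is attempted.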


 

Orchestration

For relatively small applications running a handful of containers, this is all that is needed. However, in environments with many physical machines running many containers at a time, there are additional concerns:

Doing these things with containers is called orchestration. Docker has an orchestration tool called "Docker Swarm" which is built in and uses the same YAML syntax as Docker Compose.

However, the most popular orchestration tool in industry is Kubernetes, which was initially developed by Google. The main idea behind Kubernetes is that you specify your hardware (whether physical machines on a network or virtual machines) and a set of services. Kubernetes is then in charge of scheduling the services across your hardware.
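To make the idea of scheduling concrete, here is a toy sketch that places service replicas on whichever machine has the most free capacity. It is not how Kubernetes works internally; its real scheduler weighs many more factors. The schedule function and the notion of a fixed number of "slots" per node are inventions for this example.

```python
def schedule(nodes: dict[str, int], services: dict[str, int]) -> dict[str, list[str]]:
    """Place each service replica on the node with the most free slots.

    nodes maps a node name to the number of container slots it can hold.
    services maps a service name to the number of replicas wanted.
    """
    free = dict(nodes)
    placement: dict[str, list[str]] = {name: [] for name in nodes}
    for service, replicas in services.items():
        for _ in range(replicas):
            # Pick the node with the most remaining capacity
            # (ties go to the alphabetically first node name).
            node = max(sorted(free), key=lambda n: free[n])
            if free[node] == 0:
                raise RuntimeError(f"no capacity left for {service}")
            placement[node].append(service)
            free[node] -= 1
    return placement
```

For example, schedule({"node-a": 2, "node-b": 2}, {"web": 2, "db": 1}) returns {"node-a": ["web", "db"], "node-b": ["web"]}, spreading the two web replicas across both machines.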