Containerization Continued
Volumes
The file system of a Docker container is designed to be ephemeral: changes made to it do not persist once the container is removed and a new one is launched from the image. This is intentional, as containers are meant to be treated as disposable, while images are the permanent artifact.
Docker does support volumes, which can store data permanently. A volume is separate from both containers and images. The physical-computing analogy is an external hard drive: it can be plugged into a computer and persists after the computer is shut down.
The general layout of how this works is:
- Images store the read-only data needed for an application: the underlying Linux libraries and binaries, configuration files, and the code and data files for whatever application we are running in the container.
- Containers are launched from images and then executed. Writes to the container's file system do not persist.
- Volumes are created explicitly to store the data that the application needs to keep. When writes go to the volume, the data is safe.
For example, if we were making a WordPress website as a Docker container, we would have an image containing the LAMP stack (Linux, Apache, MySQL, and PHP binary files) along with the PHP code and configuration files for WordPress. We would set up a volume to contain the WordPress database. That way, when the container is running and we edit the site (making posts or changing settings), those changes persist in the database. Every other aspect of the file system can be treated as read-only.
The benefit of this is that it makes you think about what data an application actually needs to persist, while everything else can remain ephemeral.
Volumes can be created through Docker Desktop or with the command:
$ docker volume create site-data
Here "site-data" is the name of the volume. When a container is launched, we can then mount a volume to a Linux directory. This can be done through the Docker Desktop GUI or through a command like:
$ docker run -v site-data:/var/www/html wordpress
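As a sketch of how this fits together (assuming Docker is installed, and using a throwaway alpine container and the hypothetical volume name "site-data" from above), the following session shows data in a volume surviving the removal of the container that wrote it:

```shell
# Create a named volume
docker volume create site-data

# Write a file into the volume from a disposable container
# (--rm removes the container as soon as the command finishes)
docker run --rm -v site-data:/data alpine sh -c 'echo hello > /data/greeting.txt'

# That container is gone, but a brand-new container sees the data
docker run --rm -v site-data:/data alpine cat /data/greeting.txt
```

The second `docker run` prints the contents written by the first, even though no container from the first command still exists.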
Dockerfiles
While Docker Hub contains images for many popular applications, it's also common to create custom images. This is done by writing a Dockerfile, a text file that defines how to build an image.
Below is an example for a custom image for a web application:
FROM python:3.12-alpine
WORKDIR /todo-app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY todo-app.py .
EXPOSE 8080
CMD ["python3", "todo-app.py"]
The pieces of this file are:
- The FROM command defines the base image, which in this case is an image containing a Python 3 interpreter on top of Alpine Linux.
- The WORKDIR command sets the present working directory for subsequent commands, creating it if it does not exist.
- The COPY command copies files from the host machine into the image. The first argument is the path on the host and the second is the path in the image.
- The RUN command executes a command while the image is being built and creates a file system layer with the result. In this case, it installs the Python libraries our application needs.
- The EXPOSE command documents the port that the application will communicate on. We still need to publish this port when launching the container.
- The CMD command specifies what command will be run when a container is launched from this image.
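To turn a Dockerfile like the one above into an image and run it, the usual commands look roughly like this (assuming the file is saved as Dockerfile in a directory alongside requirements.txt and todo-app.py, and using "todo-app" as an image tag of our choosing):

```shell
# Build an image from the Dockerfile in the current directory,
# tagging it "todo-app" so we can refer to it by name
docker build -t todo-app .

# Launch a container from the image, publishing the exposed
# port 8080 in the container as port 8080 on the host
docker run -p 8080:8080 todo-app
```

Note that EXPOSE alone does not publish the port; the `-p` flag at launch time is what actually maps it to the host.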
Docker Compose
Dockerfiles like the one above can be used to create custom images. We can then run them as containers to implement applications. However, it's very common to implement an application with multiple containers working together.
Consider a web application for an online store. It would have a web front end, a database for storage, and code for tracking inventory and handling checkout and payment. We could install the software for all of these things in one container, but the recommended solution is to split them up into separate containers. Reasons for doing this include:
- It provides better encapsulation of logic. The code for different parts of an application does not need to be tightly coupled, and running the parts separately enforces this.
- We can update one part of the application without needing to interact with the others.
- We can use official containers for server and database systems, such as those published on Docker Hub. For example, rather than installing MySQL into one large image, we can just grab the official MySQL database image from Docker Hub, which is maintained for us.
- We can scale the different parts of our application separately. If the web application is taking a lot of resources, but the database is not, we can launch more containers running instances of the web application, but just keep one running the database. That wouldn't be possible if they were in one container.
This idea is called "microservices": we break a large application into smaller building blocks which can be run in separate containers. This is as opposed to a "monolithic architecture," where the software all runs together.
Docker has a command called "compose" which takes a compose file. This file is written in YAML syntax (like Ansible) and specifies what containers should be launched and how they interact. Here is a compose file taken from the WordPress page on Docker Hub:
services:
  wordpress:
    image: wordpress
    restart: always
    ports:
      - 8080:80
    environment:
      WORDPRESS_DB_HOST: db
      WORDPRESS_DB_USER: exampleuser
      WORDPRESS_DB_PASSWORD: examplepass
      WORDPRESS_DB_NAME: exampledb
    volumes:
      - wordpress:/var/www/html
  db:
    image: mysql:8.0
    restart: always
    environment:
      MYSQL_DATABASE: exampledb
      MYSQL_USER: exampleuser
      MYSQL_PASSWORD: examplepass
      MYSQL_RANDOM_ROOT_PASSWORD: '1'
    volumes:
      - db:/var/lib/mysql

volumes:
  wordpress:
  db:
This specifies two containers which should be run: one for a web server containing the WordPress application and a second for the database. They are both based on official images (but could be custom ones built from Dockerfiles).
This compose file uses environment variables to make sure the database connection is configured correctly. It also uses restart: always to restart the containers if they stop for any reason.
This could be launched using the docker compose up command in the same directory where this compose.yaml file exists. Docker will make sure the images are downloaded and launch containers for them.
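As a sketch of the typical lifecycle (assuming Docker with the Compose plugin is installed and we are in the directory containing compose.yaml), the common commands are:

```shell
# Start all services in the background ("detached" mode)
docker compose up -d

# List the running services and inspect the logs of one of them
docker compose ps
docker compose logs db

# Stop and remove the containers; the named volumes
# (wordpress and db) are kept unless --volumes is added
docker compose down
```

Because the volumes survive `docker compose down`, a later `docker compose up` picks up the existing WordPress database rather than starting from scratch.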
It also creates a virtual network for the two containers to communicate on. The service names, "wordpress" and "db" here, are the hostnames of the containers on this network, which allows them to communicate with each other. In this case, the database transactions from the WordPress container are sent to the database container over this network.
Orchestration
For relatively small applications running a handful of containers, this is all that is needed. However, in environments with many physical machines running many containers at a time, there are additional concerns:
- Balancing the load of containers so that physical resources are used efficiently
- Scaling up certain services relative to others
- Updating services gradually to avoid downtime
Doing these things with containers is called orchestration. Docker has an orchestration tool called "Docker Swarm" which is built-in and uses the same YAML syntax as Docker Compose.
However, the most popular orchestration tool in industry is Kubernetes, which was initially developed by Google. The main idea behind Kubernetes is that you specify your hardware (whether physical machines on a network or virtual machines) and a set of services, and it is in charge of scheduling the services across that hardware.