Or: How to Move Away From Vagrant


Docker is a new generation of virtualisation (around 3 years old) that makes building complex software stacks much easier, and much better isolated, than previous approaches. When I say previous approaches I'm talking about Vagrant, Chef, Puppet, etc. These all work on the basis of creating a base image (containing the OS and some basic software) and using scripts to install whatever other software and configuration is required. The result is an entire image that can be snapshotted and deployed easily and reliably... or so I thought...

A Little Backstory

We have been using PuPHPet at work for at least a year now. PuPHPet uses a YAML configuration file with heaps of middleware scripts to build the machine. Making changes to the machine means you only have to change the YAML configuration and run vagrant provision again. Originally it was heaven to be able to use their scripts to do all the nasty Bash/Puppet work that would otherwise have had to be written by hand. Initially the only real downside was the huge number of files that had to be committed with the project; a fair trade.

As time went on, the changes to PuPHPet came rapidly. Changes were made to the format of the YAML configuration itself, and different combinations of packages were no longer supported for no better reason than that the project only wanted to move forward and concentrate on the latest versions. Worst of all was the underlying subsystem it was based on: Puppet and Ruby had ever-changing dependencies, which meant that when a new developer went to build a new machine it was almost always broken, despite having worked just a month before.

In the space of a year we (I) had to completely rebuild the base configuration several times, which always caused massive disruption to the team. Finally, the combination of OS (CentOS-flavoured Linux) and PHP (we had a requirement for PHP 5.4) failed because packages in the default repositories were simply not there anymore.

Enough was enough, we needed something more reliable.

Introducing Docker

The Docker documentation sucks; it's some of the worst I've seen. For me it just added to the confusion. It is written with the assumption that you are already very familiar with container-based infrastructure, and it simply doesn't explain anything or make it easy to find what you're actually looking for.

I want to highlight some of the "obvious" things that are very simple to understand once explained, but that aren't easy to discover without spending time on trial and error. Hopefully this will make moving to, and understanding, Docker a lot easier for you.

1. Containers

Docker uses a very different kind of virtualisation called containers. A container can be thought of as a completely self-contained machine: for all intents and purposes it has its own OS, its own file system and anything else you would expect to find in a virtualised machine. The catch is that a container only runs one program.† For example, you may have a MySQL server running in one container and Redis running in a separate container.

Even though each container works as a self-contained OS, it does not require the same resources as a dedicated virtualised OS. Many containers can share the same physical resources (like CPU and memory) on a single host.
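
To make this concrete, here is a minimal sketch of what that looks like with the plain docker CLI; the container names, image tags and password are arbitrary placeholders:

# Start a MySQL server in its own container ("secret" is a placeholder password)
docker run -d --name some-mysql -e MYSQL_ROOT_PASSWORD=secret mysql:5.7

# Start Redis in a second, completely separate container
docker run -d --name some-redis redis

# List the running containers; each one runs exactly one service
docker ps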

† A container can actually run more than one process. However you probably wouldn't do this unless you had a special reason to. In almost all cases it's best to run one process or service in a single container.


2. Images

Images are a snapshot of the file system; however, they are always based on another image. For example, if we took an image of a container and it was 200MB, then installed 10MB worth of software and took another image, that new image would only be 10MB, because it only contains the changes since the previous base image.

The image does not contain the kernel, so it's not uncommon for images to be just a few megabytes.

Images are cached which makes rebuilding containers very, very fast.
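
You can see this layering for yourself with docker history, which lists every layer in an image and the size each one added (the mysql:5.7 tag here is just an example):

# Show each layer of the image and how much space that layer added
docker history mysql:5.7

# List the images on the machine along with their sizes
docker images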


3. Stateless

Each container can have directories (zero or more) mounted to it from the host. For example if you were running an Apache web server container you would not load the source files onto the container itself. Rather you would mount a directory of the host operating system (containing the files for the web server) to a directory of the container, like:

/Users/elliot/Development/mywebsite -> /var/www/html

This makes the containers (and images) stateless. Containers can be restarted and images can be destroyed without affecting the application. It also makes the images much smaller and more reusable. Another advantage is that several containers can share the same mounted directory, for example if you had several web servers serving the same files.
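
As a rough sketch, that mount is expressed with the -v (volume) flag when starting the container; the image tag and container name here are just examples:

# Mount the source directory from the host into the container's web root
docker run -d \
    -v /Users/elliot/Development/mywebsite:/var/www/html \
    --name mywebsite \
    php:7.0-apache

Destroying the container (docker rm -f mywebsite) leaves the files on the host untouched.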

4. The Dockerfile

A Dockerfile is a text file (usually kept in the root of your project) that contains the steps required to build an image. It is akin to the bash script that you would previously have used to install software or set up environment variables. A Dockerfile looks like this:

FROM php:7.0-apache
COPY config/php.ini /usr/local/etc/php/
ENV APP_ENV dev

Each line is a command. The first line is always a FROM command, which specifies the base image we build upon. Each step creates a new image, but each image only contains the changes since the last snapshot (the previous command).

If your containers are stateless then you should be able to change the Dockerfile and rebuild the containers very quickly and easily.
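
Building and running an image from that Dockerfile is only a couple of commands; the myapp tag is an arbitrary example:

# Build an image from the Dockerfile in the current directory and tag it "myapp"
docker build -t myapp .

# Start a container from the image we just built
docker run -d --name myapp-web myapp

Because of the image caching mentioned above, rebuilding after a change to the Dockerfile only re-runs the steps from the changed line onwards.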


5. Multiple Containers

It is unlikely that your application will only require a single container. Usually you will have several containers for other services like a database, web service, background tasks, etc. For this we use the docker-compose command.

docker-compose uses a very simple YAML file to build multiple containers. Each container can have its own Dockerfile that customises the individual container but docker-compose will build all the containers and put them into the same virtual network.

Here is an example of a docker-compose.yml (usually in the root directory of the project) that builds a WordPress application:

version: '2'
services:
  db:
    image: mysql:5.7
    volumes:
      - "./.data/db:/var/lib/mysql"
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: wordpress
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wordpress
      MYSQL_PASSWORD: wordpress

  wordpress:
    depends_on:
      - db
    image: wordpress:latest
    links:
      - db
    ports:
      - "8000:80"
    restart: always
    environment:
      WORDPRESS_DB_HOST: db:3306
      WORDPRESS_DB_PASSWORD: wordpress

Running docker-compose up will create two containers called db and wordpress. They will be put into the same virtual network.
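
The day-to-day workflow around that file is a handful of docker-compose subcommands:

# Build (if necessary) and start both containers in the background
docker-compose up -d

# Show the state of the containers and their port mappings
docker-compose ps

# Show the logs of a single service
docker-compose logs wordpress

# Stop and remove the containers (the mounted ./.data/db directory survives on the host)
docker-compose down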


6. Container Networking

Containers that are built with docker-compose are put into the same virtual network. This can be configured however you like (from the YAML) but there are some things to understand:

  1. Containers (if permitted) use the name of the container as the hostname. For example if the wordpress container wanted to connect to the db container it would use db as the hostname.
  2. By default containers are not able to connect to other containers unless configured to do so. To allow the wordpress service to access the db service it has to be explicitly stated in the links property.
  3. By default the host operating system cannot access ports on a container. Since wordpress is a web server we need to map the container's internal port 80 to a port on the host, in this case 8000. This means when we put localhost:8000 into the browser on the host operating system we can see our WordPress application (a quick way to verify this is sketched below).
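
Both points are easy to check from the command line once the containers from the docker-compose.yml above are running. This assumes a reasonably recent docker-compose (for the exec subcommand) and that getent is available in the image, which it is in the Debian-based official images:

# From the host: the mapped port 8000 answers with the WordPress site
curl -I http://localhost:8000

# From inside the wordpress container: the "db" hostname resolves to the db container
docker-compose exec wordpress getent hosts db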


Summing Up

This was an extremely brief overview. I hope to explore individual topics in more detail in future articles, so stay tuned!

There is a more in-depth article on containers and images.