As a trendy software engineer, I use Docker because it’s a nice way to try software without the environment setup hassle. But as an SRE/DevOps kinda guy, I also create my own images – for CI environments, for experiments and sometimes even for production.
We all know that Docker images are built with Dockerfiles, but in my not so humble opinion, Dockerfiles are silly – they are fragile, they make bloated images and they look like crap. For me, building Docker images was tedious work until I found Ansible. The moment your first Ansible playbook works, you never look back. I was immediately grateful for Ansible’s simple automation tooling, and I started using Ansible to provision Docker containers. Around that time I found the Ansible Container project and tried it, but in 2016 it was not ready for me. Soon after, I found Hashicorp’s Packer, which has Ansible provisioning support, and from that moment I’ve used this powerful combo to build all of my Docker images.
Below, I want to show you an example of how it all works together, but first let’s return to my point about Dockerfiles.
In short, it’s because each line in a Dockerfile creates a new layer. While it’s awesome to see the layered FS and to be able to reuse layers for other images, in reality it’s madness: your image size grows without control, and now you have a 2GB image for a Python app while 90% of your layers are never reused. So, actually, you don’t need all these layers.
To squash layers, you either take additional steps like invoking `docker-squash`, or you issue as few commands as possible. And that’s why in real production Dockerfiles we see way too many `&&`s – chaining `RUN` commands with `&&` creates a single layer.
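To make the difference concrete, here is a minimal, made-up snippet (not from any real image) showing why the chaining matters:

```dockerfile
# Three RUN lines create three layers. The apt lists removed in the
# last step still exist in the earlier layers, so the image doesn't shrink.
RUN apt-get update
RUN apt-get install -y gcc
RUN rm -rf /var/lib/apt/lists/*

# One chained RUN creates a single layer, so the cleanup actually
# reduces the final image size.
RUN apt-get update && \
    apt-get install -y gcc && \
    rm -rf /var/lib/apt/lists/*
```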
To illustrate my point, look at the Dockerfiles for two of the most popular Docker images – Redis and nginx. The main part of these Dockerfiles is a giant chain of commands with newline escaping, in-place config patching with sed and cleanup as the last command. Here is the build step from the Redis Dockerfile:
```dockerfile
RUN set -ex; \
    \
    buildDeps=' \
        wget \
        \
        gcc \
        libc6-dev \
        make \
    '; \
    apt-get update; \
    apt-get install -y $buildDeps --no-install-recommends; \
    rm -rf /var/lib/apt/lists/*; \
    \
    wget -O redis.tar.gz "$REDIS_DOWNLOAD_URL"; \
    echo "$REDIS_DOWNLOAD_SHA *redis.tar.gz" | sha256sum -c -; \
    mkdir -p /usr/src/redis; \
    tar -xzf redis.tar.gz -C /usr/src/redis --strip-components=1; \
    rm redis.tar.gz; \
    \
# disable Redis protected mode [1] as it is unnecessary in context of Docker
# (ports are not automatically exposed when running inside Docker, but rather explicitly by specifying -p / -P)
# [1]: https://github.com/antirez/redis/commit/edd4d555df57dc84265fdfb4ef59a4678832f6da
    grep -q '^#define CONFIG_DEFAULT_PROTECTED_MODE 1$' /usr/src/redis/src/server.h; \
    sed -ri 's!^(#define CONFIG_DEFAULT_PROTECTED_MODE) 1$!\1 0!' /usr/src/redis/src/server.h; \
    grep -q '^#define CONFIG_DEFAULT_PROTECTED_MODE 0$' /usr/src/redis/src/server.h; \
# for future reference, we modify this directly in the source instead of just supplying a default configuration flag because apparently "if you specify any argument to redis-server, [it assumes] you are going to specify everything"
# see also https://github.com/docker-library/redis/issues/4#issuecomment-50780840
# (more exactly, this makes sure the default behavior of "save on SIGTERM" stays functional by default)
    \
    make -C /usr/src/redis -j "$(nproc)"; \
    make -C /usr/src/redis install; \
    \
    rm -r /usr/src/redis; \
    \
    apt-get purge -y --auto-remove $buildDeps
```
All of this madness is for the sake of avoiding layer creation. And that’s where I want to ask a question – is this the best way to do things in 2017? Really? To me, all these Dockerfiles look like a poor man’s bash script. And gosh, I hate bash. But on the other hand, I like containers, so I need a neat way to fight this insanity.
Instead of putting raw bash commands in a Dockerfile, we can write a reusable Ansible role and invoke it from a playbook that is run inside the Docker container to provision it.

This is how I do it:
```dockerfile
FROM debian:9

# Bootstrap Ansible via pip
RUN apt-get update && apt-get install -y wget gcc make python python-dev python-setuptools python-pip libffi-dev libssl-dev libyaml-dev
RUN pip install -U pip
RUN pip install -U ansible

# Prepare Ansible environment
RUN mkdir /ansible
COPY . /ansible
ENV ANSIBLE_ROLES_PATH /ansible/roles
ENV ANSIBLE_VAULT_PASSWORD_FILE /ansible/.vaultpass

# Launch Ansible playbook from inside container
RUN cd /ansible && ansible-playbook -c local -v mycontainer.yml

# Cleanup
RUN rm -rf /ansible
RUN for dep in $(pip show ansible | grep Requires | sed 's/Requires: //g; s/,//g'); do pip uninstall -y $dep; done
RUN apt-get purge -y python-dev python-pip
RUN apt-get autoremove -y && apt-get autoclean -y && apt-get clean -y
RUN rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* /usr/share/doc/*

# Environment setup
ENV HOME /home/test
WORKDIR /
USER test

CMD ["/bin/bash"]
```
Drop this Dockerfile into the root of your Ansible repo and it will build a Docker image using your playbooks, roles, inventory and vault secrets.
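Assuming the playbook is called mycontainer.yml as in the Dockerfile above, the build itself stays the usual one-liner:

```
$ docker build -t mycontainer .
```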
It works and it’s reusable – for example, I have base roles that are applied both to Docker containers and to bare metal machines, and provisioning is easier to maintain in Ansible. But still, it feels awkward.
So I went a step further and started to use Packer. Packer is a tool built specifically for creating machine images. It can build not only container images but also VM images for cloud providers like AWS and GCP.
It immediately hooked me with these lines in the documentation:
> Packer builds Docker containers without the use of Dockerfiles. By not using Dockerfiles, Packer is able to provision containers with portable scripts or configuration management systems that are not tied to Docker in any way. It also has a simple mental model: you provision containers much the same way you provision a normal virtualized or dedicated server.
That’s what I wanted to achieve previously with my Ansiblized Dockerfiles.
So let’s see how we can build a Redis image that is almost identical to the official one.
First, let’s create a playground dir:

```
$ mkdir redis-packer && cd redis-packer
```
Packer is controlled with a declarative configuration in JSON format. Here is ours:
```json
{
  "builders": [{
    "type": "docker",
    "image": "debian:jessie-slim",
    "commit": true,
    "changes": [
      "VOLUME /data",
      "WORKDIR /data",
      "EXPOSE 6379",
      "ENTRYPOINT [\"docker-entrypoint.sh\"]",
      "CMD [\"redis-server\"]"
    ]
  }],

  "provisioners": [{
    "type": "ansible",
    "user": "root",
    "playbook_file": "provision.yml"
  }],

  "post-processors": [[{
    "type": "docker-tag",
    "repository": "docker.io/alexdzyoba/redis-packer",
    "tag": "latest"
  }]]
}
```
Put this in a `redis.json` file and let’s figure out what it all means.

First, we describe our builders – what kind of image we’re going to build. In our case, it’s a Docker image based on `debian:jessie-slim`. `"commit": true` tells Packer that after all the setup we want the changes committed to an image. The other option is exporting to a tar archive with the `export_path` option.
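For illustration, the export variant could look something like this sketch (redis-image.tar is an arbitrary name I made up):

```json
"builders": [{
  "type": "docker",
  "image": "debian:jessie-slim",
  "export_path": "redis-image.tar"
}]
```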
Next, we describe our provisioner, and that’s where Ansible steps into the game. Packer supports Ansible in two modes – local and remote.
Local mode (`"type": "ansible-local"`) means that Ansible will be launched inside the Docker container – just like my previous setup. But Ansible won’t be installed by Packer, so you have to do that yourself with the `shell` provisioner – similar to my Ansible bootstrapping in the Dockerfile.
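A local mode setup could look roughly like this sketch (the pip bootstrap commands are just my assumption for a Debian-based image):

```json
"provisioners": [
  {
    "type": "shell",
    "inline": [
      "apt-get update && apt-get install -y python-pip",
      "pip install ansible"
    ]
  },
  {
    "type": "ansible-local",
    "playbook_file": "provision.yml"
  }
]
```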
Remote mode means that Ansible runs on your build host and connects to the container via SSH, so you don’t need a full-blown Ansible installation in the Docker container – just a Python interpreter.
So, I’m using remote Ansible that will connect as the root user and launch the `provision.yml` playbook.
After provisioning is done, Packer runs post-processing. I’m only tagging the image, but you can also push it to a Docker registry.
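If you want the push as well, a `docker-push` post-processor can be chained after the tagging, roughly like this:

```json
"post-processors": [[
  { "type": "docker-tag", "repository": "docker.io/alexdzyoba/redis-packer", "tag": "latest" },
  { "type": "docker-push" }
]]
```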
Now let’s see the `provision.yml` playbook:
```yaml
---
- name: Provision Python
  hosts: all
  gather_facts: no
  tasks:
    - name: Bootstrap Python
      raw: test -e /usr/bin/python || (apt-get -y update && apt-get install -y python-minimal)

- name: Provision Redis
  hosts: all
  tasks:
    - name: Ensure Redis configured with role
      import_role:
        name: alexdzyoba.redis

    - name: Create workdir
      file:
        path: /data
        state: directory
        owner: root
        group: root
        mode: 0755

    - name: Put runtime programs
      copy:
        src: files/{{ item }}
        dest: /usr/local/bin/{{ item }}
        mode: 0755
        owner: root
        group: root
      with_items:
        - gosu
        - docker-entrypoint.sh

- name: Container cleanup
  hosts: all
  gather_facts: no
  tasks:
    - name: Remove python
      raw: apt-get purge -y python-minimal && apt-get autoremove -y

    - name: Remove apt lists
      raw: rm -rf /var/lib/apt/lists/*
```
The playbook consists of three plays.

To provision a container (or any other host) with Ansible, we need Python installed in it. But how do you install Python via Ansible when Ansible itself needs Python? There is a special Ansible `raw` module for exactly this case – it doesn’t require a Python interpreter because it runs bare shell commands over SSH. We also set `gather_facts: no` to skip fact gathering, which is done in Python.
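The same trick works outside of playbooks too – for example, a hypothetical ad-hoc bootstrap of hosts from some inventory:

```
$ ansible all -i inventory -m raw -a "test -e /usr/bin/python || (apt-get -y update && apt-get install -y python-minimal)"
```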
Redis provisioning is done with my Ansible role that performs exactly the same steps as the official Redis Dockerfile – it creates the `redis` user and group, downloads the source tarball, disables protected mode, compiles Redis and does the after-build cleanup. Check out the details on Github.
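The role is parameterized with the `redis_version` and `redis_download_sha` variables (more on that below), so pinning a specific release is a small playbook change – a sketch with made-up values:

```yaml
- name: Ensure Redis configured with role
  import_role:
    name: alexdzyoba.redis
  vars:
    redis_version: "3.2.10"                       # hypothetical version
    redis_download_sha: "<sha256 of the tarball>" # placeholder, not a real hash
```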
Finally, we do the container cleanup by removing Python and cleaning up package management stuff.
There are only 2 things left – the gosu and docker-entrypoint.sh files. These, along with the Packer config and the Ansible role, are available in my redis-packer Github repo.

Now all that’s left is to launch it:
```
$GOPATH/bin/packer build redis.json
```
You can see example output in this gist.
In the end, we get an image that is even a bit smaller than the official one:
```
$ docker images
REPOSITORY                          TAG      IMAGE ID       CREATED         SIZE
docker.io/alexdzyoba/redis-packer   latest   05c7aebe901b   3 minutes ago   98.9 MB
docker.io/redis                     3.2      d3f696a9f230   4 weeks ago     99.7 MB
```
Of course, my solution has its own drawbacks. First, you have to learn new tools – Packer and Ansible. But I strongly advise learning Ansible, because you’ll need it for other kinds of automation in your projects. And you DO automate your tasks, right?
The second drawback is that container building is now more involved, with the Packer config, Ansible roles, playbooks and all that stuff. Counted in lines of code, there are 174 lines now:
```
$ (find alexdzyoba.redis -type f -name '*.yml' -exec cat {} \; && cat redis.json provision.yml) | wc -l
174
```
While originally it was only 77:
```
$ wc -l Dockerfile
77 Dockerfile
```
And again, I would advise you to go this path because:

- It’s a single `packer build redis.json` command to produce a ready and tagged image.
- Updating the image is just a matter of changing the `redis_version` and `redis_download_sha` variables. No new Dockerfile needed.

So that’s my Docker image building setup for now. It works well for me and I kinda enjoy the process now. I would also like to take another look at Ansible Container, but that will be another post, so stay tuned – this blog has an Atom feed and I also post on Twitter @AlexDzyoba.