Let's code a reusable workflow for building state-of-the-art, multi-platform Docker images with GitHub Actions ✨
I wrote a reusable GitHub Actions workflow for one of my pet projects, rickroller. The workflow itself is very small and uses mostly well-known Actions, but encompasses a lot of complex ideas and logic.
So let's go together through all the steps required to create a robust and reusable workflow to build state-of-the-art, multi-architecture images with GitHub Actions.
The full workflow is available here: github.com/derlin/rickroller/blob/main/.git... Use this link to consult the exact version at the time of writing.
As rickroller is a pretext to play with Google Cloud Run and GitHub Actions, and to try open-source best practices, its README goes into a lot of detail about GitHub Actions, among other topics. Feel free to have a look and leave a ⭐!
NOTE: I assume you have a basic knowledge of GitHub Actions and Docker.
The context and goal
It is 2022, so my Python project is deployed as a Docker image. This image is built (and pushed to a registry) in multiple contexts:
during a PR, so I can test my container (tag like pr-4),
when there is a new push to main, so people can preview new features (pulling the image latest or main-eeffaa2),
and from a release (with tag v1.0.0).
It thus makes sense to create one reusable workflow, that can be called in different CI contexts.
Since I use an M1 Mac myself, it is important that users with either AMD64 or ARM64 processors can run rickroller. The image must therefore be available for multiple architectures.
Speed and security are also important topics: the Dockerfile should be scanned for vulnerabilities, and the workflow should avoid building the same layer over and over when it can actually just build and cache it once.
To sum it up, I want:
a reusable workflow,
that builds a multi-arch image (at least AMD64 and ARM64),
with caching to speed up the process,
running some security scan,
able to push to GitHub Registry,
with meaningful image tags (latest, pr-xxxx, main-ef34221, vX.X.X) and annotations.
Let's get started!
The basics: a reusable workflow
GitHub introduced reusable workflows and composite actions about a year ago as an attempt to enhance reusability and DRY-ness (Don't Repeat Yourself).
In short, composite actions let you create actions that call other actions. They are still actions, which need to be called from within a series of steps.
A reusable workflow (or callable workflow) is instead a complete workflow with its own jobs (with checkout, etc.), each running on its own runner. It is less flexible, as you cannot add a step before or after it (you can, however, define another job that generates its inputs or depends on its output).
The documentation is very well written, so I won't go too much into the details. In short, to create a reusable workflow, you simply set its trigger to on: workflow_call. You can then define inputs, secrets, etc. that can later be referenced using ${{ inputs.some_input }} or ${{ secrets.some_secret }}:
name: Reusable Workflow Example
on:
  workflow_call:
    inputs:
      some_input:
        description: Just an example input
        type: boolean
        required: false
        default: false
    secrets:
      #...
jobs:
  my_job:
    name: Do Something
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # ...
For my reusable workflow, I have 2 inputs:
publish (boolean): whether to publish the image (to GitHub Registry),
version (string): the version being released (empty if not called from a release workflow).
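As a rough sketch, these two inputs could be declared as follows under the workflow_call trigger. The names publish and version come from the description above; the description texts, the required flags and the default values are assumptions made for illustration, and may differ from the actual workflow:

```yaml
on:
  workflow_call:
    inputs:
      publish:
        description: Whether to publish the image to the registry
        type: boolean
        required: false
        default: false   # assumption: never push unless explicitly asked to
      version:
        description: The version being released (X.Y.Z), empty otherwise
        type: string
        required: false
        default: ''      # assumption: empty means "not called from a release"
```

Making both inputs optional keeps the simplest call (a build without a push) free of any with: block.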
The steps
The GitHub community already provides all the Actions I need to perform the painful tasks of setting up the environment, building, and pushing. The only difficulty is finding out which ones to use, and how to configure them.
Extracting Labels and Tags
No need to manually and painfully determine the proper labels and tags, as an Action is already available for this task: docker/metadata-action (v3 at the time of writing).
Labels
Docker image labels are a way to attach (any) key-value metadata to an image. These metadata are not available to the running container but are rather used to share information like where the source code for the image resides, who supports it, or what CI build/codebase version generated it.
It is common to use the OCI (Open Container Initiative) set of standardized labels, all prefixed with org.opencontainers.image.*. The most common are .source, .version, and .revision.
The docker/metadata-action automatically extracts OCI Labels based on the commit and the repository's metadata. Here is what it computes for my rickroller repository:
{
  "org.opencontainers.image.created": "2022-04-15T06:13:02.269Z",
  "org.opencontainers.image.description": "A simple Python app to test GitHub CI",
  "org.opencontainers.image.licenses": "",
  "org.opencontainers.image.revision": "83592e8567ee6bfb01caccae5e791ab38b5ee7e0",
  "org.opencontainers.image.source": "https://github.com/derlin/rickroller",
  "org.opencontainers.image.title": "rickroller",
  "org.opencontainers.image.url": "https://github.com/derlin/rickroller",
  "org.opencontainers.image.version": "main-83592e8"
}
Tags
Before building, we need to determine the ID(s) of the images, for example, ghcr.io/derlin/rickroller:latest. There are three parts: ghcr.io, which matches the name of the registry you push the image to; derlin/rickroller, the owner and name of your project (usually static); and finally the tag or version, in this case, latest.
Ideally, tags should be meaningful. Images from PRs should be versioned following e.g. the format pr-{id}, images from branch main something like main-{sha (short)}, and releases vX.X.X. The version latest is known as a moving tag, as it points to a different image (usually from the latest successful build on the main branch) as time goes on.
That's a lot of combinations and ifs... Fortunately, docker/metadata-action can do everything, provided the right set of inputs.
Here is my configuration for rickroller:
- name: Extract Docker metadata
  id: meta
  uses: docker/metadata-action@v3
  with:
    images: ghcr.io/${{ github.repository }}
    tags: |
      type=semver,pattern={{version}},value=${{ inputs.version }}
      type=semver,pattern={{major}}.{{minor}},value=${{ inputs.version }}
      type=semver,pattern={{major}},value=${{ inputs.version }}
      type=ref,event=branch,suffix=-{{ sha }}
      type=ref,event=pr
      type=raw,value=latest,enable=${{ github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/') }}
    flavor: |
      latest=false
Let's break it down. First, I give an id to the step, so I can refer to its output later in the workflow.
Next, I provide a list of image names, without the tag. Since I only push to GitHub Registry (ghcr.io), I can use ghcr.io/${{ github.repository }}, which will be resolved as ghcr.io/derlin/rickroller.
The magic is in the tags input. The Action supports a list of "conditions" that will be evaluated to generate the right tags depending on things like branches, triggers, etc.
In this specific case, the first lines starting with type=semver are used when a version is passed to the workflow (e.g. from a release). The version input must have the form X.Y.Z, or more accurately {major}.{minor}.{patch} (semantic versioning). With those 3 lines, the Action returns the Docker tags X.Y.Z, X.Y, and X when the value= doesn't evaluate to an empty string (i.e. when the version input is provided).
Next, there are the type=ref,event={event} lines to conditionally get tags based on the trigger of the workflow (event=). If this is from a pr, the Action generates the tag pr-{number}. If it is from a branch, it generates {branch_name}. Since I want to have the branch name suffixed with a unique id, I also specify suffix=-{{ sha }}. The {{ sha }} will be evaluated by the Action to the short SHA of the git commit (git rev-parse --short), and appended to the branch name (for example main-eeffaa3).
Finally, to have the latest tag set only for the main branch and for releases, I disable the default "latest flavor" (which would otherwise add the tag latest on its own, e.g. alongside semver tags) and provide instead the expression:
type=raw,value=latest,enable=${{ github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/') }}
This line says: use the raw value "latest", but only if the condition enable= evaluates to true.
As you can see, it is quite powerful!
Output
The metadata Action returns the metadata and tags in multiple formats, but what is important is that I can later reference them using:
# meta is the id we gave to the metadata action
${{ steps.meta.outputs.tags }}
${{ steps.meta.outputs.labels }}
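When tuning the tag rules, it can help to print what the Action computed before building anything. The step below is only a debugging sketch: its name and the TAGS and LABELS environment variables are hypothetical, and it assumes the metadata step uses the id meta as above:

```yaml
- name: Print Docker metadata (debug)
  # Pass the outputs through environment variables instead of
  # interpolating them directly into the shell script
  env:
    TAGS: ${{ steps.meta.outputs.tags }}
    LABELS: ${{ steps.meta.outputs.labels }}
  run: |
    echo "Tags:"
    echo "$TAGS"
    echo "Labels:"
    echo "$LABELS"
```

Both outputs are multi-line strings (one tag or label per line), which is the form the build step expects later on.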
Scanning the Dockerfile for vulnerabilities
In production settings, the full Docker images should be scanned with state-of-the-art tools such as sysdig, Docker scan, snyk, or the like. Those tools scan the built image itself and are thus able to detect vulnerabilities not only in your Dockerfile, but also in the base image layers it builds upon, as well as in the software your Dockerfile brings in.
There are plenty to choose from, but many of them are not completely free. You often need an account and a specific setup, and are limited in the number of scans or repositories.
So in this workflow, I chose to limit myself to a "linter", checkov. Feel free to augment this example with a real scan!
Checkov supports the evaluation of policies on your Dockerfile files. [...] it will validate if the file is compliant with Docker best practices such as not using root user, making sure health check exists and not exposing SSH port.
The full list of Dockerfile policies it checks can be found here.
- name: Lint Dockerfile using Checkov
  id: checkov
  uses: bridgecrewio/checkov-action@master
  with:
    directory: .
    framework: dockerfile # only ask for dockerfile scans
    quiet: true # show only failed checks
    container_user: 1000 # UID to run the container under
                         # to prevent permission issues
Checkov will parse the Dockerfile, and report the results:
Passed checks: 9, Failed checks: 0, Skipped checks: 0
...
Check: CKV_DOCKER_3: "Ensure that a user for the container has been created"
PASSED for resource: Dockerfile.USER
File: Dockerfile:49-49
Guide: https://docs.bridgecrew.io/docs/ensure-that-a-user-for-the-container-has-been-created
...
If you have Python installed, it is possible to run the same check locally:
pip install checkov
checkov --framework dockerfile -f Dockerfile
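Back in the workflow, a failed policy makes the step (and thus the build) fail. If that is too strict for your project, the checkov Action exposes inputs to relax it. The variant below is a sketch, not what rickroller uses: skipping CKV_DOCKER_2 (the HEALTHCHECK policy) and turning on soft_fail are arbitrary choices made for illustration:

```yaml
- name: Lint Dockerfile using Checkov (relaxed variant)
  uses: bridgecrewio/checkov-action@master
  with:
    directory: .
    framework: dockerfile
    quiet: true
    # assumption: this project deliberately has no HEALTHCHECK instruction
    skip_check: CKV_DOCKER_2
    # report failed checks without failing the job
    soft_fail: true
```

Skipping a single policy is usually preferable to soft_fail, since the remaining checks keep their teeth.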
Building and pushing a multi-arch image
Logging into the registry
To be able to push to a registry, I need to log in first. By default, the workflow inherits the GITHUB_TOKEN, so logging in to ghcr.io is as easy as calling:
- name: Login to Container Registry
  uses: docker/login-action@v2
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}
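Logging in is not enough if the token is read-only: pushing to ghcr.io requires the packages: write permission. Depending on the repository's default token permissions, it may need to be granted explicitly. Here is a minimal sketch, where the job name docker is an assumption:

```yaml
jobs:
  docker:
    runs-on: ubuntu-latest
    permissions:
      contents: read    # needed by actions/checkout
      packages: write   # needed to push images to ghcr.io
    steps:
      - uses: actions/checkout@v3
      - name: Login to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
```

Keep in mind that a called workflow can only restrict the permissions it receives from its caller, never extend them, so the calling job must allow at least the same scopes.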
Setting up QEMU and buildx
QEMU lets you "run operating systems for any machine, on any supported architecture", while buildx is "a Docker CLI plugin for extended build capabilities with BuildKit". Both are necessary to build Docker images targeting another platform/architecture.
Setting them up is again available through simple Actions:
- name: Set up QEMU
  uses: docker/setup-qemu-action@v2
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v2
To learn how to use buildx locally to build multi-platform images, have a look at the docs and docker.com's blog post: How to Rapidly Build Multi-Architecture Images with Buildx.
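By default, the QEMU Action installs emulators for all the platforms it supports. Since this workflow only targets AMD64 (native on GitHub-hosted runners) and ARM64, the setup could be narrowed down with the platforms input. This is an optional tweak shown as a sketch, not something the rickroller workflow does:

```yaml
- name: Set up QEMU
  uses: docker/setup-qemu-action@v2
  with:
    # only install the emulator needed for the non-native target
    platforms: arm64
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v2
```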
Building and pushing the image (finally)
We finally have all the bricks in place to build and push a multi-arch Docker image using the docker/build-push-action:
- name: Build and push Docker image
  uses: docker/build-push-action@v3
  with:
    # the Dockerfile is at the root of the workspace
    context: .
    # Build for AMD and ARM (requires buildx+qemu)
    platforms: linux/amd64,linux/arm64
    # only push when requested
    push: ${{ inputs.publish }}
    # pass the output of the metadata action
    tags: ${{ steps.meta.outputs.tags }}
    labels: ${{ steps.meta.outputs.labels }}
    # use layer caching
    # The mode=max is to also cache the builder image
    # (vs only the final image with mode=min)
    cache-from: type=gha
    cache-to: type=gha,mode=max
(for cache-from and cache-to, keep reading!)
Layer caching
The only thing I haven't talked about is Docker layer caching (DLC), which is a great feature when building Docker images as a regular part of the CI process. The idea is to cache the individual layers of Docker images built in CI jobs, and then reuse unchanged image layers on subsequent runs, rather than rebuilding the entire image from scratch every time.
This caching mechanism is a given when building Docker images locally (see Docker's documentation - leverage build cache). However, in CI, a new runner is started each time, so the cache is always empty.
The build-push-action from Docker supports multiple types of caches. In this workflow (see the previous section), I use the GitHub cache (gha). It is rather straightforward to turn on. Simply set the cache-from and cache-to parameters:
- uses: docker/build-push-action@v3
  with:
    # ...
    cache-from: type=gha
    cache-to: type=gha,mode=max
One important detail is the mode=max, which instructs the Action to cache all layers, and not only the ones from the final image. It is very important if the Dockerfile is using multi-stage builds: without it, the layers from the builder image are ignored.
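At the time of writing, the GitHub Actions cache (gha) is limited to 10 GB per repository, and isn't freely shared between branches: a run can typically only restore caches created on its own branch or on the default branch.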
Quick reminder: if a top layer changes, all subsequent layers will change as well, so devise your Dockerfiles wisely!
Calling the workflow
To call this reusable workflow from the same repository, I simply use:
name: ...
jobs:
  # ... other jobs ? ...
  docker:
    uses: ./.github/workflows/reusable_docker-build-and-push.yaml
    with:
      publish: true
      # ... other inputs ...
It is also possible to call it from another repository, in which case the repository path and a ref (branch, tag, or commit SHA; here, the main branch) must be provided:
uses: derlin/rickroller/.github/workflows/reusable_docker-build-and-push.yaml@main
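To make the version input more concrete, here is a sketch of a caller triggered by a GitHub release. The workflow name, the trigger and the way the version is derived are assumptions for illustration; in particular, it assumes the release tag is a valid semantic version such as 1.2.3:

```yaml
name: Release
on:
  release:
    types: [published]

jobs:
  docker:
    uses: ./.github/workflows/reusable_docker-build-and-push.yaml
    permissions:
      contents: read
      packages: write   # lets the called workflow push to ghcr.io
    with:
      publish: true
      # assumption: the release tag is a valid semantic version
      version: ${{ github.event.release.tag_name }}
```

A caller for PRs or pushes to main would look the same minus the version input, leaving the metadata Action to fall back on the pr-{number} or {branch}-{sha} tags.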
We did it !
Wow. What a journey.
The full workflow as well as usages are available in my repo: github.com/derlin/rickroller.
I hope you learned something, happy coding ✨
BONUS: optionally publish the Image to Docker Hub
As a bonus, I wanted to be able to also push to Docker Hub, but only in specific situations (e.g. from a release, but not from a PR build) and with a different set of tags (no main-{sha}).
This conditionality was a bummer, as it is not supported by default. After playing around a bit, I designed a hack to implement this "if" without duplicating the whole workflow.
If you are interested, have a look at the updated workflow: https://github.com/derlin/rickroller/blob/e67546a90218c8300676c1270699cccdb5f7e053/.github/workflows/reusable_docker-build-and-push.yaml and let me know in the comments if you would like a post about it!