Best practices for building Docker containers
Here I document the best practices that I like to keep handy while building Docker containers.
Use secrets instead of build-arg
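When handling sensitive data, use the --secret option provided by the `BuildKit` build system instead of --build-arg, because build arguments remain visible in the image history (docker history).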
export MYSECRET=secretpassword
docker build --secret id=MYSECRET .
Using the secret inside the Dockerfile:
FROM ...
COPY build-script.sh .
RUN --mount=type=secret,id=MYSECRET ./build-script.sh
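The build-script.sh script will find the secret as a file at the path /run/secrets/MYSECRET. The secret is mounted only for the duration of that RUN instruction and is not stored in the image layers.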
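As a minimal sketch of what build-script.sh could do with the mounted secret: the helper below reads the secret file and fails loudly when it is missing. The function name read_secret and the SECRETS_DIR override are assumptions made for illustration (the override only exists so the function can be tried outside a build); neither comes from Docker.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print the contents of a BuildKit secret file.
# SECRETS_DIR is an illustrative override; inside a build the secret
# is mounted under /run/secrets by default.
read_secret() {
    local id="$1"
    local file="${SECRETS_DIR:-/run/secrets}/${id}"
    if [[ ! -r "${file}" ]]; then
        echo "secret '${id}' is not mounted at ${file}" >&2
        return 1
    fi
    # Print the raw contents; a caller using $(...) gets the value
    # without trailing newlines.
    cat "${file}"
}
```

Inside the script, the value can then be captured with MYSECRET="$(read_secret MYSECRET)". Avoid echoing it, since build output ends up in logs.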
Combine run commands
Each RUN command results in a layer. In general, it is good practice to combine multiple RUN commands into a single RUN command. There are edge cases, discussed in "Multiple RUN vs. single chained RUN in Dockerfile, which is better?"
Pull out long RUN commands to shell script
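However, if a single RUN contains a long list of commands and something goes wrong partway through, debugging is hard. Moving the commands into a shell script keeps them in one layer while making them easier to read and debug.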
ARG python_version=3.10
FROM python:${python_version}-slim
# Move multi run steps to logical shell script outside
COPY --chown=root:root --chmod=744 install-packages.sh .
# All installation is now in a single docker layer
RUN ./install-packages.sh
Inside the shell script, make the steps easier to debug:
- Use Bash “strict mode” (set -euo pipefail) to help catch problems and bugs in the shell script.
- Meaningful echo statements help you view the progress and spot the issue. In a normal Dockerfile with a long RUN command, it is hard to spot which package installation failed.
- Meaningful comments help developers improve the script further.
#!/bin/bash
# strict mode
set -euo pipefail
export DEBIAN_FRONTEND=noninteractive
apt-get update
# security updates
apt-get -y upgrade
# --no-install-recommends flag reduces unnecessary transitive dependencies
apt-get -y install --no-install-recommends syslog-ng
# Install a new package, without unnecessary recommended packages:
apt-get -qq -y install --no-install-recommends \
default-libmysqlclient-dev libpq-dev build-essential \
tini libc6 libnss3 procps net-tools \
bash curl wget git expect \
libssl-dev libffi-dev
apt-get -qq -y install --no-install-recommends \
openjdk-17-jdk-headless
echo 'Installed JDK and system dependencies'
# Installing Apache Spark
export SPARK_HOME=/opt/spark
# tar -C requires the target directory to exist
mkdir -p "${SPARK_HOME}"
wget -qO - https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3-scala2.13.tgz | \
tar -xzf - -C "${SPARK_HOME}" --strip-components=1
cp /opt/log4j2.properties ${SPARK_HOME}/conf/log4j2.properties
rm -rf ${SPARK_HOME}/examples
rm -rf ${SPARK_HOME}/data
cp ${SPARK_HOME}/kubernetes/dockerfiles/spark/decom.sh /opt/
echo 'Installed Apache Spark'
# Delete cached files we don't need anymore (note that if you're
# using official Docker images for Debian or Ubuntu, this happens
# automatically, you don't need to do it yourself):
apt-get clean
# Delete index files we don't need anymore:
rm -rf /var/lib/apt/lists/*
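The "meaningful echo statements" advice can be taken one step further with a small wrapper. This is only a sketch: run_step is a hypothetical helper name, not part of the script above. It prints a banner before each step and reports the exit status when a step fails, so the failing package installation is easy to spot in the build log.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: announce a step, run it, and report the result.
# The first argument is a description; the rest is the command to run.
run_step() {
    local description="$1"
    shift
    echo "==> ${description}"
    if "$@"; then
        echo "==> done: ${description}"
    else
        # $? still holds the exit status of the failed command here.
        local status=$?
        echo "==> FAILED (exit ${status}): ${description}" >&2
        return "${status}"
    fi
}
```

For example, run_step "Install syslog-ng" apt-get -y install --no-install-recommends syslog-ng. Because strict mode is on, a failed step still aborts the script, now with a clear message just before the exit.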
BuildKit progress output
By default, BuildKit shows progress in a modern tty mode. This is nice, but when debugging issues it is hard to know what went wrong, because the output of finished steps is collapsed.
For debugging, we can set the following:
export BUILDKIT_PROGRESS=plain
docker build ....
# now we can view more detailed progress and debug issues easily
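If plain output is wanted only some of the time, the choice can be wrapped in a small function. This is a sketch under assumptions: DEBUG_BUILD and progress_mode are hypothetical names invented for this example, while "plain" and "auto" are existing values of BUILDKIT_PROGRESS ("auto" being the default behaviour).

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical switch: DEBUG_BUILD=1 selects plain output, anything
# else keeps BuildKit's default automatic mode.
progress_mode() {
    if [[ "${DEBUG_BUILD:-0}" == "1" ]]; then
        echo "plain"
    else
        echo "auto"
    fi
}

# Assign first, then export, so a failure in the function is not masked.
BUILDKIT_PROGRESS="$(progress_mode)"
export BUILDKIT_PROGRESS
```

Running the build with DEBUG_BUILD=1 set in the environment then produces the detailed log, and leaving it unset keeps the compact display.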
Use the BuildKit cache feature
# syntax = docker/dockerfile:1.3
FROM python:3.11-slim-buster
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache \
pip install -r requirements.txt
- Pipenv and Poetry store their caches in the /root/.cache directory
- pip uses ~/.cache/pip
- This approach caches a directory across builds
- This approach needs BuildKit enabled
Using scratch & multistage build
FROM your_image as initial
FROM scratch
COPY --from=initial / /
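- The steps above copy the entire filesystem of the initial stage into the final image.
- The final image does not keep the individual layers of the original image; everything ends up in a single layer, which can minimize the final image size.
- Image metadata such as ENV, CMD, and ENTRYPOINT is not copied by COPY and has to be declared again in the final stage.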