Best practices for building Docker containers

Jun 24, 2023

Here I document the best practices that I like to keep handy while building Docker containers.

Use secrets instead of build-arg

When handling sensitive data, use the --secret option provided by the `BuildKit` build system instead of --build-arg, because build arguments remain visible in the image history.

export MYSECRET=secretpassword
docker build --secret id=MYSECRET .

Using the secret inside the Dockerfile:

FROM ...
COPY build-script.sh .
RUN --mount=type=secret,id=MYSECRET ./build-script.sh

The build-script.sh script can then read the secret from the file /run/secrets/MYSECRET. The file is mounted only for that RUN step and is not stored in the image.
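
As a minimal sketch of how build-script.sh might consume the mounted secret, the helper below reads it from the path mentioned above. The function name read_secret and its optional path argument are assumptions made for illustration; they are not part of Docker or BuildKit.

```shell
#!/bin/bash
# Hypothetical helper for build-script.sh: print the value of the BuildKit secret.
# The default path follows the /run/secrets/<id> location used in this section;
# the optional argument exists only so the function can be tried outside a build.
read_secret() {
  local secret_file="${1:-/run/secrets/MYSECRET}"
  if [[ ! -r "$secret_file" ]]; then
    echo "secret file not readable: $secret_file" >&2
    return 1
  fi
  cat "$secret_file"
}
```

Inside the script this could be used as TOKEN="$(read_secret)". It is best to avoid echoing the value or writing it to a file, since anything written to the filesystem during RUN ends up in the layer.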

Combine run commands

Each RUN instruction results in a layer, so it is generally good practice to combine related commands into a single RUN instruction. There are edge cases, discussed in “Multiple RUN vs. single chained RUN in Dockerfile, which is better?”
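
To make the idea concrete, here is a small sketch that builds a throwaway image from an inline Dockerfile with a single chained RUN. The function name, the image tag, the debian:bookworm-slim base, and the curl package are all placeholders chosen for illustration.

```shell
#!/bin/bash
# Hypothetical wrapper: builds a demo image from an inline Dockerfile.
build_chained_example() {
  local tag="${1:-chained-run-demo}"
  # "docker build -" reads the Dockerfile from stdin; no build context is needed here.
  docker build -t "$tag" - <<'EOF'
FROM debian:bookworm-slim
# One RUN, one layer: update, install, and clean up together so the
# apt index files are never committed to an intermediate layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*
EOF
}
```

Had the cleanup been a separate RUN, the files removed by it would still occupy space in the earlier layer, which is the main reason for chaining.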

Pull out long RUN commands to shell script

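However, when a single RUN holds a long list of commands and one of them fails partway through, debugging becomes a problem. Moving the commands into a shell script keeps the installation in one layer while making failures easier to trace.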

ARG python_version=3.10
FROM python:${python_version}-slim

# Move the multi-step RUN logic into a separate shell script
COPY --chown=root:root --chmod=744 install-packages.sh .
# All installation is now in a single docker layer
RUN ./install-packages.sh

Inside the shell script, a few habits make it easier to debug:

Use Bash “strict mode” to help catch problems and bugs in the shell script.
Add meaningful echo statements to show progress and help spot the issue. In a normal Dockerfile with a long RUN command, it is hard to tell which package installation failed.
Add meaningful comments so that developers can improve the script further.

#!/bin/bash

# strict mode
set -euo pipefail

export DEBIAN_FRONTEND=noninteractive
apt-get update

# security updates
apt-get -y upgrade

# --no-install-recommends flag reduces unnecessary transitive dependencies
apt-get -y install --no-install-recommends syslog-ng

# Install a new package, without unnecessary recommended packages:
apt-get -qq -y install --no-install-recommends \
    default-libmysqlclient-dev libpq-dev build-essential \
    tini libc6 libnss3 procps net-tools \
    bash curl wget git expect \
    libssl-dev libffi-dev 

apt-get -qq -y install --no-install-recommends \
    openjdk-17-jdk-headless
echo 'Installed JDK and system dependencies'


# Installing Apache Spark
export SPARK_HOME=/opt/spark
# tar -C needs the target directory to exist
mkdir -p "${SPARK_HOME}"
wget -qO - https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3-scala2.13.tgz | \
    tar -xzf - -C "${SPARK_HOME}" --strip-components=1
# Assumes log4j2.properties was copied to /opt earlier in the Dockerfile
cp /opt/log4j2.properties "${SPARK_HOME}/conf/log4j2.properties"
rm -rf "${SPARK_HOME}/examples"
rm -rf "${SPARK_HOME}/data"
cp "${SPARK_HOME}/kubernetes/dockerfiles/spark/decom.sh" /opt/
echo 'Installed Apache Spark'

# Delete cached files we don't need anymore (note that if you're
# using official Docker images for Debian or Ubuntu, this happens
# automatically, you don't need to do it yourself):
apt-get clean
# Delete index files we don't need anymore:
rm -rf /var/lib/apt/lists/*
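
To see what the strict mode line buys us, the following sketch runs two small snippets in a child shell with set -euo pipefail enabled. The helper name run_strict and the variable UNDEFINED_VAR_FOR_DEMO are made up for this demonstration.

```shell
#!/bin/bash
# Hypothetical helper: run a snippet under strict mode in a child bash process
# and return that process's exit status.
run_strict() {
  bash -c "set -euo pipefail; $1"
}

# Without pipefail this pipeline would count as a success, because only the
# exit status of the last command (true) would be considered.
if run_strict 'false | true; echo "not reached"'; then
  echo "pipeline failure was ignored"
else
  echo "pipeline failure was caught"
fi

# With -u, an unset variable aborts the script instead of expanding to an empty string.
if run_strict 'echo "${UNDEFINED_VAR_FOR_DEMO}"' 2>/dev/null; then
  echo "unset variable was ignored"
else
  echo "unset variable was caught"
fi
```

In a Docker build this means the RUN step stops at the first failing command, and the last echo printed before the failure shows which stage of the script was reached.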

BuildKit progress output

By default, BuildKit shows build progress in a compact TTY mode that collapses the output of each step. This looks nice, but when debugging issues it is hard to know what went wrong.

For debugging, we can switch to plain output:

export BUILDKIT_PROGRESS=plain
docker build ....
# now we can view more detailed progress and debug issues easily
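
When a build fails in CI it is often handy to keep the plain output in a file as well. The wrapper below is a sketch of that idea; the function name and the default log file name build.log are assumptions, while BUILDKIT_PROGRESS is the same variable used above.

```shell
#!/bin/bash
# Hypothetical wrapper around docker build: plain progress output, saved to a log file.
build_with_plain_log() {
  local tag="$1"
  local context="${2:-.}"
  local log_file="${3:-build.log}"
  # BuildKit writes its progress to stderr, so merge it into stdout before tee.
  BUILDKIT_PROGRESS=plain docker build -t "$tag" "$context" 2>&1 | tee "$log_file"
  # Return the exit status of docker build rather than that of tee.
  return "${PIPESTATUS[0]}"
}
```

A call such as build_with_plain_log myimage . would then print every command's full output and leave a copy in build.log for later inspection; myimage is a placeholder tag.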

Use the BuildKit cache feature

# syntax = docker/dockerfile:1.3
FROM python:3.11-slim-buster
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache \
    pip install -r requirements.txt

Pipenv and Poetry store their caches under the /root/.cache directory when running as root.
pip uses ~/.cache/pip.
This approach caches a directory across builds.
This approach requires BuildKit to be enabled.
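
Since the cache mount only works with BuildKit, one way to make that requirement explicit is to set DOCKER_BUILDKIT=1 when invoking the build. The sketch below does so with an inline Dockerfile that narrows the mount to pip's own cache directory; the function name and the python:3.11-slim base are placeholders, and the build context is assumed to contain a requirements.txt.

```shell
#!/bin/bash
# Hypothetical wrapper: build with BuildKit explicitly enabled and a pip cache mount.
build_with_pip_cache() {
  local tag="$1"
  local context="${2:-.}"
  # DOCKER_BUILDKIT=1 enables BuildKit on Docker versions where it is not the default.
  # "-f -" reads the Dockerfile from stdin while still using the context directory.
  DOCKER_BUILDKIT=1 docker build -t "$tag" -f - "$context" <<'EOF'
# syntax=docker/dockerfile:1
FROM python:3.11-slim
COPY requirements.txt .
# Mount only pip's cache directory; it persists across builds but is not
# written into the image layer.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
EOF
}
```

Because the cache lives outside the image, there is no need for pip's --no-cache-dir flag to keep the layer small.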

Using scratch & multistage build

FROM your_image AS initial

FROM scratch

COPY --from=initial / /

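The steps above copy the whole filesystem of the initial stage into the final image.
The final image does not keep the individual layers of the original image, which helps minimize its size.
Only files are copied: settings such as ENV, CMD, and ENTRYPOINT from the initial stage are not carried over and must be declared again after FROM scratch.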
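
A quick way to check the effect is to compare the number of history entries before and after flattening. The helper below is a small sketch; the name layer_count is made up, and it relies only on docker history with the -q flag.

```shell
#!/bin/bash
# Hypothetical helper: print how many history entries an image has.
# docker history lists one entry per build step, so the count drops
# sharply once everything has been copied into a scratch-based image.
layer_count() {
  docker history -q "$1" | wc -l | tr -d '[:space:]'
}
```

For example, layer_count your_image could be compared with the count for the flattened image built from the Dockerfile above.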