DevSecOps for ML

Go distroless to reduce both the image size and the number of CVEs in machine learning containers.

Smaller container images suit the lean data and machine learning operations philosophy, but there is another reason why going distroless makes sense: fewer common vulnerabilities and exposures (CVEs). That leaves two questions:

  • How can you scan for vulnerabilities?
  • How do you create a distroless base image for Python-based frameworks?

CVE Scanning

There are several vulnerability scanners for containers, but for the purposes of this post I shall use Anchore Grype, which is FOSS and comes highly recommended.

In what follows, I shall refer to the variables IMAGE and DISTROLESS_IMAGE, the standard base image and the distroless version of it, respectively. I shall use TensorFlow 2.4.1 as an example, but the same techniques also work for other machine learning frameworks, including of course PyTorch. Why 2.4.1? As of this writing, 2.4.1 is the latest stable release of TensorFlow.

VERSION="2.4.1"
IMAGE="tensorflow/tensorflow:$VERSION"
DISTROLESS_IMAGE="databaseline/tensorflow-cpu:$VERSION"

Docker

Docker comes with its own scanner. To see it in action, we can run it against the official TensorFlow Docker image, which weighs in at 1.57 GB:

docker scan "$IMAGE"

When I ran it, Docker uncovered 73 vulnerabilities, although the only high-severity one was in OpenSSL:

✗ High severity vulnerability found in openssl/libssl1.1
  Description: NULL Pointer Dereference
  Info: https://snyk.io/vuln/SNYK-UBUNTU1804-OPENSSL-1089073
  Introduced through: meta-common-packages@meta, python-pip/python3-pip@9.0.1-2.3~ubuntu1.18.04.4
  From: meta-common-packages@meta > openssl/libssl1.1@1.1.1-1ubuntu2.1~18.04.7
  From: python-pip/python3-pip@9.0.1-2.3~ubuntu1.18.04.4 > ca-certificates@20201027ubuntu0.18.04.1 > openssl@1.1.1-1ubuntu2.1~18.04.7
  Fixed in: 1.1.1-1ubuntu2.1~18.04.9

Grype

Grype scans an image with the following command:

grype "$IMAGE"

It found 275 vulnerabilities, of which the following 15 were high-severity:

NAME             INSTALLED                  FIXED-IN                  VULNERABILITY 
cryptography     2.1.4                      2.3                       GHSA-fcf9-3qw3-gxmj
flatbuffers      1.12                                                 CVE-2020-35864
libssl1.1        1.1.1-1ubuntu2.1~18.04.7   1.1.1-1ubuntu2.1~18.04.9  CVE-2021-3449
linux-libc-dev   4.15.0-134.138             4.15.0-139.143            CVE-2021-27365
linux-libc-dev   4.15.0-134.138             4.15.0-140.144            CVE-2020-27170
linux-libc-dev   4.15.0-134.138             4.15.0-140.144            CVE-2020-27171
openssl          1.1.1-1ubuntu2.1~18.04.7   1.1.1-1ubuntu2.1~18.04.9  CVE-2021-3449
pip              20.2.4                                               CVE-2018-20225
pip              9.0.1                                                CVE-2018-20225
pip              9.0.1                                                CVE-2019-20916
protobuf         3.14.0                                               CVE-2015-5237
pycrypto         2.6.1                                                CVE-2018-6594
pyxdg            0.25                                                 CVE-2019-12761
pyxdg            0.25                       0.26                      GHSA-r6v3-hpxj-r8rv
urllib3          1.26.2                     1.26.4                    GHSA-5phf-pp7p-vc2r

CVE-2021-3449 is the same vulnerability discovered by docker scan. Note that there were no critical vulnerabilities found.
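The table above lists only the high-severity rows. A minimal sketch of such a filter follows, assuming Grype's table output carries a SEVERITY header and fixed-width, aligned columns. The helper name high_only is hypothetical and not part of Grype.

```shell
#!/usr/bin/env bash
# high_only: keep the header and the rows whose SEVERITY cell reads "High".
# Assumption: the table is aligned, so each cell starts at the same character
# offset as the SEVERITY header. Empty FIXED-IN cells therefore do no harm.
high_only() {
  awk '
    NR == 1 { col = index($0, "SEVERITY"); print; next }
    col > 0 {
      cell = substr($0, col)        # text from the SEVERITY column onwards
      sub(/[ \t].*$/, "", cell)     # keep only the first word of that text
      if (cell == "High") print
    }
  '
}

# Usage (needs grype on the PATH):
#   grype "$IMAGE" | high_only
```

Matching on the column offset rather than on whitespace-separated fields is a deliberate choice here: rows without a fixed version have one field fewer, which would shift any field-number-based filter.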

Distroless Base Image for ML

Since distroless images contain neither a shell nor a package manager, a multi-stage build is needed: the required artifacts are generated in one stage and then copied into the distroless base image in a subsequent stage.

In the Dockerfile shown below, there is a requirements.txt file. It allows any dependencies to be included, as you cannot pip install into the distroless image afterwards. Here, the requirements file only contains a single line: tensorflow-cpu==2.4.1.

FROM python:3.7-slim AS py

WORKDIR /app
COPY requirements.txt requirements.txt

RUN  python3 -m pip install --no-cache-dir --upgrade pip && \
     python3 -m pip install --no-cache-dir -r requirements.txt

FROM gcr.io/distroless/python3:nonroot
COPY --from=py /usr/local/lib/python3.7/site-packages /site-packages

ENV PYTHONPATH=/site-packages
ENV LANG=C.UTF-8

ENTRYPOINT ["/usr/bin/python3"]

For the first stage (py), the official Python 3.7 image is a sensible choice. Since we have no need of the various packages that come with Debian, the ‘slim’ edition is fine. Both Debian and Ubuntu are good defaults, in case you want to rely on your own base Python image. The only reason I pick a leaner base image is that it downloads faster and comes with Python and pip. Alpine can be used too, but Python would have to be installed with apk add --update python3 first, and packages with compiled extensions built against Alpine’s musl libc may not load in the glibc-based distroless image. The size of the base image for the first stage is irrelevant: only the site packages are copied into the distroless container image, so everything else from the first stage is left out of the final image.

There is nothing inherently specific in the Dockerfile with regard to TensorFlow. It follows a generic pattern for Python-based frameworks and libraries.
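To make the generic pattern concrete: only the contents of requirements.txt and the image tag change from one framework to the next. The sketch below is one way to wrap that, not the author's tooling. The function name build_distroless and the REPO variable are hypothetical, and the helper overwrites any requirements.txt already in the build context.

```shell
#!/usr/bin/env bash
# build_distroless PACKAGE VERSION [CONTEXT_DIR]
# Writes a one-line requirements.txt (overwriting an existing one) and builds
# the image from the Dockerfile in CONTEXT_DIR.
# REPO is a hypothetical registry namespace; override it via the environment.
REPO="${REPO:-databaseline}"

build_distroless() {
  local package="$1" version="$2" context="${3:-.}"
  printf '%s==%s\n' "$package" "$version" > "$context/requirements.txt"
  docker build -t "$REPO/$package:$version" "$context"
}

# Usage (needs Docker and the Dockerfile from this post in the current directory):
#   build_distroless tensorflow-cpu 2.4.1
```

With the arguments shown in the usage comment, the helper reproduces the single-line requirements file and the image name used in this post.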

Please note that a non-root base image is used to avoid running the container as a privileged user.

To build the distroless image, execute:

docker build -t "$DISTROLESS_IMAGE" .
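Once the image is built, a quick check can confirm that the container really does run as an unprivileged user. The following is a sketch under the assumption that the image keeps python3 as its ENTRYPOINT, as in the Dockerfile above. The function name runs_as_nonroot is hypothetical.

```shell
#!/usr/bin/env bash
# runs_as_nonroot IMAGE: succeeds if the container's Python process reports
# a non-zero UID. Because the ENTRYPOINT is python3, everything after the
# image name is handed to the interpreter as arguments.
runs_as_nonroot() {
  local uid
  uid="$(docker run --rm "$1" -c 'import os; print(os.getuid())')" || return 1
  [ -n "$uid" ] && [ "$uid" != "0" ]
}

# Usage (needs a running Docker daemon):
#   runs_as_nonroot "$DISTROLESS_IMAGE" && echo "non-root: ok"
#
# The same mechanism works as an import smoke test:
#   docker run --rm "$DISTROLESS_IMAGE" -c 'import tensorflow as tf; print(tf.__version__)'
```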

If you want a GPU-ready distroless base image, you have to also copy drivers and system libraries between stages.

The distroless image weighs 756 MB, a 52% decrease in size. What about vulnerabilities?

CVEs in the Distroless Image

With grype "$DISTROLESS_IMAGE", we see only 3 high-severity vulnerabilities, down from 15:

NAME         INSTALLED  FIXED-IN  VULNERABILITY
flatbuffers  1.12                 CVE-2020-35864
pip          21.0.1               CVE-2018-20225
protobuf     3.15.8               CVE-2015-5237

All in all, the image is more than 50% smaller and has 80% fewer high-severity vulnerabilities, with an overall reduction of 99% across all vulnerabilities. Note that no known critical vulnerabilities were uncovered in any of the scans.
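The size and high-severity percentages can be reproduced with a few lines of shell. This is a small sketch that treats 1.57 GB as 1570 MB (decimal units, an assumption). The helper name pct_decrease is hypothetical.

```shell
#!/usr/bin/env bash
# pct_decrease BEFORE AFTER: relative decrease in percent, rounded to an integer.
pct_decrease() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.0f\n", (before - after) / before * 100 }'
}

pct_decrease 1570 756   # image size in MB, taking 1.57 GB as 1570 MB
pct_decrease 15 3       # high-severity findings reported by Grype
```

The two calls print 52 and 80, matching the figures quoted for image size and high-severity vulnerabilities.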

It is important to note that not all vulnerabilities are created equal. For example, the pip vulnerability is disputed, as it describes expected behaviour. Similarly, the protobuf issue may have already been fixed in 3.4.0.

DevSecOps

With the pattern shown, it is easy to build distroless base images for machine learning applications. To ensure each base image’s Dockerfile is not only consistent but also compliant with best practices, a linter, such as hadolint, is recommended:

hadolint --ignore DL3013 Dockerfile

Thanks to CLI tools such as docker scan and grype, scanning for vulnerabilities is a breeze.

That covers base images, but what about machine learning code? The aforementioned Dockerfile allows you to install any dependencies needed for the model itself without modification, since that is covered by requirements.txt. All that is left is the model code itself, which can be copied into the distroless image. That code can be scanned separately with Bandit, if need be.

These steps ought to be automated in a standardized D/MLOps process that ensures what ships to production follows security best practices. With templatable Dockerfiles, building many distroless images for machine learning becomes manageable.
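As a rough illustration of what such automation might look like, the sketch below chains the tools mentioned in this post into a single gate that stops at the first failing stage. The function name check_and_build and the choice of high as the failure threshold are assumptions, not a prescribed setup.

```shell
#!/usr/bin/env bash
# check_and_build IMAGE SRC_DIR
# Hypothetical pipeline step: lint the Dockerfile, build the image, scan the
# image, then scan the model code. Returns 1 at the first stage that fails.
check_and_build() {
  local image="$1" src_dir="$2"
  hadolint --ignore DL3013 Dockerfile || { echo "lint failed" >&2; return 1; }
  docker build -t "$image" .          || { echo "build failed" >&2; return 1; }
  grype "$image" --fail-on high       || { echo "image scan failed" >&2; return 1; }
  bandit -r "$src_dir"                || { echo "code scan failed" >&2; return 1; }
}

# Usage (needs hadolint, Docker, grype, and bandit on the PATH):
#   check_and_build "$DISTROLESS_IMAGE" src
```

Note that with a threshold of high, the distroless image from this post would still fail the gate because of the three remaining findings, some of which are disputed. A threshold of critical, or an explicit review of the remaining findings, may be more practical.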