Deploying conda environments in (Docker) containers - how to do it right

Deploying conda environments inside a container looks like a straight-forward conda install. But with a bit more love for details, you can optimise the process so that the build is faster and the resulting container much smaller.

For optimising conda-environments-in-docker Jim Crist-Harif’s “Smaller Docker images with Conda” blog post has been the go-to resource for a long time. It is still as valid nowadays as it was back then. Due to changes in the default content of conda-forge packages and specific optimisation of some particular heavy packages, it will though yield fewer savings nowadays.

Over the course of time, we have also developed new techniques that help trim down the sizes of Docker containers even further. In this article, I though want to start at the beginning again and have a look at the most obvious way one would put a conda environment into a Docker container. We will look at all the flaws this approach brings with it and optimise the workflow bit-for-bit until we reach a near-perfect container.

Starting with just a few lines

There are several simple but not so effective approaches to put conda environments into a container. I have chosen a particularly bad one here to explain some more things. While I have seen this approach in the wild, most of the ones I have seen were already using the basic things I will improve in the first sections.

As an example, I have built a really simple machine learning model using scikit-learn and LightGBM that predicts a fare price for a taxi trip in New York City. The model is served via FastAPI and is already using the best practices like uvicorn there.

In our initial setting, we have checked out the git repository and are running all commands from the repository root.

FROM continuumio/anaconda3:2020.11

COPY . .
RUN conda env create
RUN conda run -n nyc-taxi-fare-prediction-deployment-example \
  python -m pip install --no-deps -e .
CMD [ \
  "conda", "run", "-n", "nyc-taxi-fare-prediction-deployment-example", \
  "gunicorn", "-k", "uvicorn.workers.UvicornWorker", "nyc_taxi_fare.serve:app" \
]

Building this image using make anaconda in the repository yields a container with the whopping size of 4.48GB.

% docker image ls nyc-taxi-anaconda
REPOSITORY          TAG       IMAGE ID       CREATED          SIZE
nyc-taxi-anaconda   latest    ebf2b267e4d8   30 seconds ago   4.48GB

The first thing we look at is “Could you make it even worse?”. A typical issue I see here is that people develop on their local machine without keeping track of their environment specification and simply export their list of installed packages using conda env export. I have posted a gist of my local environment but the important bits why this is problematic can be seen in:

name: nyc-taxi-fare-prediction-deployment-example
channels:
  - conda-forge
dependencies:
  
  - libcurl=7.71.1=h9bf37e3_8
  - libcxx=11.0.1=habf9029_0
  - libedit=3.1.20191231=hed1e85f_2
  

It suffices actually to only look at the libcxx=11.0.1=habf9029_0 line. This shows two issues that come with a local export on an operating system than your deployment operating system. The file was generated on macOS while the Docker container is a Linux system. The last bit of the version habf9029_0 is the build number/string/hash. This is different for different builds but the same version of a package. Most importantly it is different between macOS and Linux when you have compiled code inside the package. Thus conda won’t be able to find a package with this build string for Linux. We could avoid this issue by using --no-builds on the export but will directly face the next issue that native dependencies are different between operating systems. In this case, libcxx is the C++ library used for builds done with the clang compiler. This is only used on macOS in conda-forge and thus not necessary for Linux environments. While in this case, it can be installed on Linux but sometimes you also have packages that are only available on one OS and then conda wouldn’t be able to find an installable version if after you have removed the build string.

To avoid these issues without keeping track of what you have installed, you can ask conda with conda env export --from-history to export an environment file with only the package specifications you have explicitly requested on the command line. This tiny bit of RTFM is my most popular tweet to date. In this case, we would get the committed environment.yml.

Use a smaller base image

After having built a base image and already outlined another mistake you can run into, let’s improve the image a bit. For that we look at the individual sizes of the container layers:

% docker history nyc-taxi-anaconda
IMAGE          CREATED          CREATED BY                                      SIZE      COMMENT
ebf2b267e4d8   18 minutes ago   /bin/sh -c #(nop)  CMD ["conda" "run" "-n" "…   0B
a113c9d31b1a   18 minutes ago   /bin/sh -c conda run -n nyc-taxi-fare-predic…   1.05MB
04c85327c969   18 minutes ago   /bin/sh -c conda env create                     1.5GB
a749aca38f88   22 minutes ago   /bin/sh -c #(nop) COPY dir:461eb7a23d1121123…   274MB
5e5dd010ead8   2 months ago     /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B
<missing>      2 months ago     /bin/sh -c wget --quiet https://repo.anacond…   2.43GB
<missing>      2 months ago     /bin/sh -c apt-get update --fix-missing &&  …   211MB
<missing>      2 months ago     /bin/sh -c #(nop)  ENV PATH=/opt/conda/bin:/…   0B
<missing>      2 months ago     /bin/sh -c #(nop)  ENV LANG=C.UTF-8 LC_ALL=C…   0B
<missing>      2 months ago     /bin/sh -c #(nop)  CMD ["bash"]                 0B
<missing>      2 months ago     /bin/sh -c #(nop) ADD file:d2abb0e4e7ac17737…   69.2MB

We see several large layers here. The ones created 2 months ago are part of the continuumio/anaconda3 base image. This base image alone is 2.71GB in size. This is due to the fact that it already brings the whole anaconda environment. This contains a vast set of scientific Python packages in it.

% docker image ls continuumio/anaconda3
REPOSITORY              TAG       IMAGE ID       CREATED        SIZE
continuumio/anaconda3   2020.11   5e5dd010ead8   2 months ago   2.71GB

Instead, we only need an image that contains an environment that can execute conda and the other basic things we need in a container build. For that you have the following three options:

We will continue here using conda-forge/mambaforge:4.9.2-5 and we will also use mamba instead of conda to create the environment as this is simply faster. This already brings our final image down to 2.13GB. Looking at the docker history output, we see that the environment creation layer is by far the largest one:

% docker history nyc-taxi-mambaforge                                                                                            :(
IMAGE          CREATED         CREATED BY                                      SIZE      COMMENT
e823c8c825c7   2 minutes ago   /bin/sh -c #(nop)  CMD ["conda" "run" "-n" "…   0B
61e5273a176f   2 minutes ago   /bin/sh -c conda run -n nyc-taxi-fare-predic…   1.04MB
5e1acb2d4a11   2 minutes ago   /bin/sh -c mamba env create                     1.45GB
07e8fc9e0576   4 minutes ago   /bin/sh -c #(nop) COPY dir:32d87108966cf6795…   274MB
73fe058d8ee5   2 hours ago     CMD ["/bin/bash"]                               0B        buildkit.dockerfile.v0
<missing>      2 hours ago     ENTRYPOINT ["tini" "--"]                        0B        buildkit.dockerfile.v0
<missing>      2 hours ago     RUN |3 MINIFORGE_NAME=Mambaforge MINIFORGE_V…   338MB     buildkit.dockerfile.v0
<missing>      2 hours ago     ENV PATH=/opt/conda/bin:/usr/local/sbin:/usr…   0B        buildkit.dockerfile.v0
<missing>      2 hours ago     ENV LANG=C.UTF-8 LC_ALL=C.UTF-8                 0B        buildkit.dockerfile.v0
<missing>      2 hours ago     ENV CONDA_DIR=/opt/conda                        0B        buildkit.dockerfile.v0
<missing>      2 hours ago     ARG TINI_VERSION=v0.18.0                        0B        buildkit.dockerfile.v0
<missing>      2 hours ago     ARG MINIFORGE_VERSION=4.9.2-5                   0B        buildkit.dockerfile.v0
<missing>      2 hours ago     ARG MINIFORGE_NAME=Miniforge3                   0B        buildkit.dockerfile.v0
<missing>      3 weeks ago     /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B
<missing>      3 weeks ago     /bin/sh -c mkdir -p /run/systemd && echo 'do…   7B
<missing>      3 weeks ago     /bin/sh -c [ -z "$(apt-get indextargets)" ]     0B
<missing>      3 weeks ago     /bin/sh -c set -xe   && echo '#!/bin/sh' > /…   811B
<missing>      3 weeks ago     /bin/sh -c #(nop) ADD file:2a90223d9f00d31e3…   72.9MB

Removing unnecessary files

In the above layers, we can see that we copy 274MB of local context into the container. Instead, we only want to have the code to run the model inside the container. Thus we should exclude larger files that are not needed by the .dockerignore. With the example project, we ignore the data/ folder that contains 258MB training data and .mypy_cache that is also 3MB.

Additionally, we currently download the conda packages during the env create step and then extract them into the environment. The package tarballs are though saved in conda’s cache and thus also in the image. This is something we don’t want and thus we should also remove them using conda clean -afy.

With the above two changes we are now at an overall size of 1.31GB for the image.

Only install packages used for predict

The environment.yml we use to build the Docker container is the specification of our development environment and contains a vast set of packages that we don’t need during runtime. It would be nice if we could mark some of those as not-necessary-for-prod but the functionality is not yet available.

Instead, there two approaches you can take:

  1. Build a conda package out of your project. The run(time) dependencies of the conda package is only what you need at prediction time.
  2. Manually specify your runtime environment in a separate predict-environment.yml.

Personally, I prefer to use option 1 as this enables one to use the package also in various other conda environments. Though I can understand to have a container that directly embeds the code instead of going through the indirection of a package build.

Thus we create a predict-environment.yml with just the things we need for serving:

name: nyc-taxi-fare-prediction-deployment-example
channels:
  - conda-forge
  - nodefaults
dependencies:
  - pip
  - click
  - fastapi
  - gunicorn
  - lightgbm
  - pandas
  - scikit-learn
  - setuptools_scm
  - uvicorn

Instead of using that as an input for the container build, we are using conda-lock to render the requirements into locked pinnings for different architectures. This enables us to have a consistent environment to make developer as well as production environments reproducible. Another benefit of using conda-lock is that you can generate the lockfile for any platform from any other platform. This means we can develop on macOS and have production systems on Linux but still generate the lockfiles for all of them on either systems. We can do this using the following commands:

conda lock -p osx-64 -p osx-arm64 -p linux-64 -p linux-aarch64 -f environment.yml
conda lock -p osx-64 -p osx-arm64 -p linux-64 -p linux-aarch64 -f predict-environment.yml --filename-template 'predict-{platform}.lock'

I’m pinning here for Intel and Arm64 architectures on macOS and Linux at the same time as these are the typical environments I run my code in. If you don’t have any macOS or ARM64 systems, feel free to omit those.

In the Dockerfile, we now load the lock file first, install the dependencies and add the application code only afterwards. This enables us to cache the environment and not rebuild it on every source code change.

COPY predict-linux-64.lock .
RUN mamba create --name nyc-taxi-fare-prediction-deployment-example --file predict-linux-64.lock && \ 
    conda clean -afy

With the now reduced set of dependencies, we now get the overall container down to 851MB in size where the conda environment with 438MB accounts for roughly half the size of the container. The remaining share of the docker container is the base conda and the base Ubuntu installation.

Distroless container

As conda environments are not only Python environments but actually include all dependencies of any language, they need neither the Ubuntu distribution nor the conda installation to run the code. The only outside dependency is the libc.

Thus we only need to have a docker container that ships with a libc (and its dependencies) and nothing else. Such containers are provided by the https://github.com/GoogleContainerTools/distroless project. To build an image using these containers, we are going to use multi-stage builds. First, we build the environment in a mambaforge container and then copy it into the distroless one:

# Container for building the environment
FROM condaforge/mambaforge:4.9.2-5 as conda

COPY predict-linux-64.lock .
RUN mamba create --copy -p /env --file predict-linux-64.lock && conda clean -afy
COPY . /pkg
RUN conda run -p /env python -m pip install --no-deps /pkg

# Distroless for execution
FROM gcr.io/distroless/base-debian10

COPY --from=conda /env /env
COPY model.pkl .
CMD [ \
  "/env/bin/gunicorn", "-b", "0.0.0.0:8000", "-k", "uvicorn.workers.UvicornWorker", "nyc_taxi_fare.serve:app" \
]

With that, we can bring the total container size to 457 MiB. Of the total size, the conda environment is making up 90%. Thus we finally reach rock bottom in what we can optimise in the size of the container and the size of the environment given the current minimal set of dependencies.

% docker image ls nyc-taxi-distroless
REPOSITORY            TAG       IMAGE ID       CREATED        SIZE
nyc-taxi-distroless   latest    a08a43bbb9a7   21 hours ago   457MB
% docker image history nyc-taxi-distroless
IMAGE          CREATED        CREATED BY                                      SIZE      COMMENT
a08a43bbb9a7   21 hours ago   /bin/sh -c #(nop)  CMD ["/env/bin/gunicorn" …   0B
35001d8906ca   21 hours ago   /bin/sh -c #(nop) COPY file:985b6980c4f46538…   326kB
f413381a803c   21 hours ago   /bin/sh -c #(nop) COPY dir:474655ed326942b52…   437MB
af31651e48fe   51 years ago   bazel build ...                                 17.4MB
<missing>      51 years ago   bazel build ...                                 1.82MB

Using --mount=type=cache instead of conda clean with BuildKit

In the above approaches, we have always explicitly removed the conda cache after installing the environment. We also did download the packages fully on each non-cached build. In newer versions of docker you can use its BuildKit backend though which also supports mounting cache volumes during the build phase. For this, we need to adjust the mamba create line to:

RUN --mount=type=cache,target=/opt/conda/pkgs mamba create --copy -p /env --file predict-linux-64.lock

Additionally, we now call docker buildx build instead of the typical docker build to force it to use the BuildKit backend. This has the benefit that we don’t need to clean the cache explicitly anymore but also that build time drop on my machine from 56s down to 23s if the packages were already downloaded during a previous container build.

(Expert) Deleting unnecessary files from the environment

While we have now trimmed down our container to the minimal overhead and trimmed the package list of the conda environment down to the minimal set, we still are at 457MiB in size. This is much more than we actually need to run the web app. When you really know what you are doing, you can now look into the internals of the conda environment and delete files that aren’t needed for the specific app to trim down the image size.

For doing this, we run the last single-stage image that was mentioned in the article using docker run -ti nyc-taxi-mambaforge-predict-only-deps /bin/bash and use apt update && apt install -y ncdu to install ncdu as a command line UI to inspect the conda environment in detail. Using du -sbh nyc-taxi-fare-prediction-deployment-example we a starting size of 422MiB.

Here we can split the list of tasks into two categories: Deleting files that you can get rid of because you know exactly that you don’t need them in this case and secondly deleting files that shouldn’t be part of the conda package at all.

With the knowledge that we are running a LightGBM model inside a FastAPI app inside of gunicorn, we can get rid of the following things and saves storage space:

In addition, the following removals also brought us a bit of space. These removals are not specific to our usage of the packages but should in general not be part of the main conda package of these artefacts.

Overall, with all these changes, we can bring the total container size down to 296MiB.

Conclusion

Starting from a 4.48GiB image with a lot of beginner mistakes, we brought the Docker container for the FastAPI+gunicorn+LightGBM+scikit-learn setup down to 296MiB. It is probably possible to strip this even further down but less than 300MiB sounds like an acceptable size for such a service.

We have settled on the following, multi-stage, BuildKit-caching distroless Dockerfile:

# Container for building the environment
FROM condaforge/mambaforge:4.9.2-5 as conda

COPY predict-linux-64.lock .
RUN --mount=type=cache,target=/opt/conda/pkgs mamba create --copy -p /env --file predict-linux-64.lock && echo 4
COPY . /pkg
RUN conda run -p /env python -m pip install --no-deps /pkg
# Clean in a separate layer as calling conda still generates some __pycache__ files
RUN find -name '*.a' -delete && \
  rm -rf /env/conda-meta && \
  rm -rf /env/include && \
  rm /env/lib/libpython3.9.so.1.0 && \
  find -name '__pycache__' -type d -exec rm -rf '{}' '+' && \
  rm -rf /env/lib/python3.9/site-packages/pip /env/lib/python3.9/idlelib /env/lib/python3.9/ensurepip \
    /env/lib/libasan.so.5.0.0 \
    /env/lib/libtsan.so.0.0.0 \
    /env/lib/liblsan.so.0.0.0 \
    /env/lib/libubsan.so.1.0.0 \
    /env/bin/x86_64-conda-linux-gnu-ld \
    /env/bin/sqlite3 \
    /env/bin/openssl \
    /env/share/terminfo && \
  find /env/lib/python3.9/site-packages/scipy -name 'tests' -type d -exec rm -rf '{}' '+' && \
  find /env/lib/python3.9/site-packages/numpy -name 'tests' -type d -exec rm -rf '{}' '+' && \
  find /env/lib/python3.9/site-packages/pandas -name 'tests' -type d -exec rm -rf '{}' '+' && \
  find /env/lib/python3.9/site-packages -name '*.pyx' -delete && \
  rm -rf /env/lib/python3.9/site-packages/uvloop/loop.c

# Distroless for execution
FROM gcr.io/distroless/base-debian10

COPY --from=conda /env /env
COPY model.pkl .
CMD [ \
  "/env/bin/gunicorn", "-b", "0.0.0.0:8000", "-k", "uvicorn.workers.UvicornWorker", "nyc_taxi_fare.serve:app" \
]

This container can be build and run using the following two commands:

docker buildx build -t nyc-taxi-distroless-buildkit-expert --load -f docker/Dockerfile.distroless-buildkit-expert .
docker run -ti -p 8000:8000 nyc-taxi-distroless-buildkit-expert

As a follow-up, I’ll write a second article that is more in a cheatsheet version that you can refer as a general reminder on how to containerise conda environments.

Another follow-up could be a comparison of the container we have built here with the various versions that are offered by @tiangolo’s uvicorn-gunicorn-fastapi-docker. We will definitely see differences in size but it would also be interesting on how they compete in build time and runtime performance.

Given that I look at a lot of problems with Data Science / Machine Learning in mind, we could also have a look on how to reduce the runtime dependencies by using runtimes like ONNX or treelite.

Improving the Stats Quo

It would be easy but also a bit boring to just report on the status quo. Thus during the writing of the blog post, I made the following pull requests to improve the status quo a bit:

Title picture: Photo by Jonas Smith on Unsplash