An AI application that works locally but crashes in production due to dependency problems is a classic scenario. Containerisation solves that: you package your application including all dependencies into an image that runs the same way everywhere.
Docker has been the standard for containerising web applications for years. The same benefits apply to AI applications, but there are additional considerations: larger images due to ML libraries, GPU support for on-premise models, and how you inject environment variables and secrets. This article describes the approach.
Containerisation offers three main benefits:

- Reproducibility: the image contains the exact dependency versions your application runs with, everywhere.
- Portability: the same image runs on your laptop, a server, or a managed cloud platform.
- Easy updates: deploying a new version is simply pushing and running a new image.
For AI applications, reproducibility is especially valuable: ML libraries have complex dependencies and small version differences can lead to different model behaviour or crashes.
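In practice this means pinning exact versions in requirements.txt rather than using loose ranges. The versions below are purely illustrative:

openai==1.30.0
torch==2.3.0
transformers==4.41.0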
A simple Dockerfile looks like this:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "main.py"]
Use slim images to limit image size. Install dependencies before copying the rest of the code so Docker can cache that layer: if your code changes but your dependencies do not, Docker reuses the cached dependency layer and only rebuilds the final COPY step.
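Building and running the image is then a matter of (with myapp as an arbitrary tag):

docker build -t myapp .
docker run myapp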
If you run models locally on a GPU, you need a different base image. NVIDIA provides nvidia/cuda images that include CUDA support. Combine that with torch or transformers for the model infrastructure.
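As a sketch, a GPU Dockerfile could look like the following; the exact nvidia/cuda tag is an example and should match your CUDA and driver versions:

FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python3", "main.py"]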
GPU containers also require the Docker runtime on the host to have GPU access. For NVIDIA GPUs you use the NVIDIA Container Toolkit. In a Kubernetes environment you use the NVIDIA device plugin.
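With the toolkit installed on the host, you expose GPUs to a container via the --gpus flag:

docker run --gpus all myapp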
This is more complex than regular CPU containers and requires specific infrastructure. Make sure you figure this out early in the project, not at the moment of deployment.
Never bake secrets into the Docker image. Always inject them at runtime via:
- Environment variables: docker run -e OPENAI_API_KEY=... myapp
- An .env file: docker run --env-file .env myapp

Check your .dockerignore file: it must exclude your .env file so it does not accidentally end up in the image.
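A minimal .dockerignore for a project like this could look as follows:

.env
.git
__pycache__/
*.pyc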
For production you want images as small as possible. Multi-stage builds help with this: you build in a large builder image and copy only the necessary files to a lean production image.
FROM python:3.11 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
# Make user-installed CLI tools (such as uvicorn) available on the PATH
ENV PATH=/root/.local/bin:$PATH
COPY . .
CMD ["python", "main.py"]
Once created, you can deploy Docker images via:

- A managed container service such as Cloud Run
- A Kubernetes cluster
- A VM running Docker
For most smaller AI applications, a managed service like Cloud Run is the simplest choice: you push an image and the platform handles the rest.
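As an illustration, a Cloud Run deployment is a single command; PROJECT_ID and the region are placeholders for your own values:

gcloud run deploy myapp --image gcr.io/PROJECT_ID/myapp --region europe-west1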
Add a health check endpoint to your application so the platform can verify the container is healthy. With AI applications, startup time can be longer due to loading models or initialising connections.
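A minimal sketch of such an endpoint, assuming a FastAPI application:

from fastapi import FastAPI

app = FastAPI()

@app.get("/healthz")
def healthz():
    # Only return 200 once models and connections are initialised.
    return {"status": "ok"}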
Also implement graceful shutdown: ensure running AI calls are completed before the container stops, rather than being abruptly terminated.
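A sketch of that idea in Python: trap SIGTERM, which Docker sends on docker stop, and only exit between requests. process_request is a hypothetical placeholder for your own request handling:

import signal
import time

shutting_down = False

def handle_sigterm(signum, frame):
    # Flag the shutdown instead of exiting immediately.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

def process_request():
    # Hypothetical placeholder: handle one AI call to completion.
    time.sleep(1)

while not shutting_down:
    process_request()
# The loop only exits between requests, so running calls complete first.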
Containerisation is not a luxury but a professional standard for AI applications in production. At Mach8 we deploy AI systems via containers as standard, making them reproducible, portable, and easy to update.
Want to know how Mach8 brings AI applications to production? View our AI agents service or get in touch.
We help you go from strategy to implementation. Schedule a no-obligation call.