An AI application that works locally but crashes in production due to dependency problems is a classic scenario. Containerisation solves that: you package your application including all dependencies into an image that runs the same way everywhere.
Docker has been the standard for containerising web applications for years. The same benefits apply to AI applications, but there are additional considerations: larger images due to ML libraries, GPU support for on-premise models, and how you inject environment variables and secrets. This article describes the approach.
Containerisation offers three main benefits:

- Reproducibility: the image contains the exact dependency versions your application runs with, everywhere.
- Portability: the same image runs on your laptop, a server, or a managed cloud platform.
- Easy updates: deploying a new version is simply pushing and running a new image.
For AI applications, reproducibility is especially valuable: ML libraries have complex dependencies and small version differences can lead to different model behaviour or crashes.
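In practice this means pinning exact versions in requirements.txt rather than using loose ranges. The versions below are purely illustrative:

openai==1.30.0
torch==2.3.0
transformers==4.41.0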
A simple Dockerfile looks like this:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "main.py"]
Use slim images to limit image size. Install dependencies before copying the rest of the code so Docker can cache that layer: if your code changes but your dependencies do not, Docker reuses the cached dependency layer and only rebuilds the final COPY step.
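Building and running the image is then a matter of (with myapp as an arbitrary tag):

docker build -t myapp .
docker run myapp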
If you run models locally on a GPU, you need a different base image. NVIDIA provides nvidia/cuda images that include CUDA support. Combine that with torch or transformers for the model infrastructure.
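As a sketch, a GPU Dockerfile could look like the following; the exact nvidia/cuda tag is an example and should match your CUDA and driver versions:

FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python3", "main.py"]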
GPU containers also require the Docker runtime on the host to have GPU access. For NVIDIA GPUs you use the NVIDIA Container Toolkit. In a Kubernetes environment you use the NVIDIA device plugin.
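With the toolkit installed on the host, you expose GPUs to a container via the --gpus flag:

docker run --gpus all myapp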
This is more complex than regular CPU containers and requires specific infrastructure. Make sure you figure this out early in the project, not at the moment of deployment.
Never bake secrets into the Docker image. Always inject them at runtime via:
- Environment variables: docker run -e OPENAI_API_KEY=... myapp
- An .env file: docker run --env-file .env myapp

Check your .dockerignore file: it must exclude your .env file so it does not accidentally end up in the image.
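A minimal .dockerignore for a project like this could look as follows:

.env
.git
__pycache__/
*.pyc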
For production you want images as small as possible. Multi-stage builds help with this: you build in a large builder image and copy only the necessary files to a lean production image.
FROM python:3.11 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
# Make user-installed CLI tools (such as uvicorn) available on the PATH
ENV PATH=/root/.local/bin:$PATH
COPY . .
CMD ["python", "main.py"]
Once created, you can deploy Docker images via:

- A managed container service such as Cloud Run
- A Kubernetes cluster
- A VM running Docker
For most smaller AI applications, a managed service like Cloud Run is the simplest choice: you push an image and the platform handles the rest.
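As an illustration, a Cloud Run deployment is a single command; PROJECT_ID and the region are placeholders for your own values:

gcloud run deploy myapp --image gcr.io/PROJECT_ID/myapp --region europe-west1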
Add a health check endpoint to your application so the platform can verify the container is healthy. With AI applications, startup time can be longer due to loading models or initialising connections.
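A minimal sketch of such an endpoint, assuming a FastAPI application:

from fastapi import FastAPI

app = FastAPI()

@app.get("/healthz")
def healthz():
    # Only return 200 once models and connections are initialised.
    return {"status": "ok"}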
Also implement graceful shutdown: ensure running AI calls are completed before the container stops, rather than being abruptly terminated.
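A sketch of that idea in Python: trap SIGTERM, which Docker sends on docker stop, and only exit between requests. process_request is a hypothetical placeholder for your own request handling:

import signal
import time

shutting_down = False

def handle_sigterm(signum, frame):
    # Flag the shutdown instead of exiting immediately.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

def process_request():
    # Hypothetical placeholder: handle one AI call to completion.
    time.sleep(1)

while not shutting_down:
    process_request()
# The loop only exits between requests, so running calls complete first.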
Containerisation is not a luxury but a professional standard for AI applications in production. At Mach8 we deploy AI systems via containers as standard, making them reproducible, portable, and easy to update.
Want to know how Mach8 brings AI applications to production? View our AI agents service or get in touch.
We help you go from strategy to implementation. Schedule a no-obligation call.