2025

Run LLMs Anywhere as a Single File with Docker

The Problem

Deploying large language models (LLMs) on local machines can be a hassle: setting up Python environments, CUDA drivers, model downloads, and platform-specific quirks. As a developer who often switches between machines and operating systems (Ubuntu, Windows, and macOS), I needed something portable — something that would run the same way everywhere.

The Solution: A Single Dockerfile

The answer lies in combining Mozilla's llamafile project with Docker. "Each llamafile contains both server code and model weights, making the deployment of an LLM as easy as downloading and executing a single file. It also leverages the popular llama.cpp project for fast model inference." Wrapping that in Docker gives us a universally portable container that runs the API server with a single command.
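As a minimal sketch of what that Dockerfile might look like — the file name `model.llamafile` and the base image are placeholders, and the server flags follow llamafile's llama.cpp-style options — something like this copies a downloaded llamafile into a slim image and starts its built-in HTTP server:

```dockerfile
# Sketch only: assumes a llamafile named "model.llamafile" sits in the build context.
FROM debian:bookworm-slim

COPY model.llamafile /usr/local/bin/model.llamafile
RUN chmod +x /usr/local/bin/model.llamafile

EXPOSE 8080

# Launch through sh so the polyglot llamafile binary executes reliably on Linux.
# Bind to 0.0.0.0 so the server is reachable through Docker's port mapping.
ENTRYPOINT ["sh", "-c", \
  "/usr/local/bin/model.llamafile --server --host 0.0.0.0 --port 8080"]
```

With that in place, the "single command" workflow would be roughly `docker build -t llm .` followed by `docker run -p 8080:8080 llm`, after which the API is available on `localhost:8080` regardless of the host OS.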