Containerizing server workloads is growing ever more popular, and it’s becoming increasingly common to see web server deployments running in containers. Can the same benefits be applied to databases?
Docker Can Handle Stateful Workloads
It’s best to start this off by asking a different question: Can you even run a database in Docker? In general, Docker isn’t designed for stateful services. One of the major selling points of containers is that they can be stopped and started at will, usually connecting to an authoritative data source, like a database, to store their state. All the data in the container is ephemeral, and is destroyed when the container is deleted.
This makes running stateful workloads particularly challenging, but luckily Docker does have tools for dealing with state: volume mounts and bind mounts. These let you map a location on the host machine to a location inside the container, so data persists even after the container shuts down. This way, you can run containers long-term without worrying about losing data.
Volume mounts are the preferred way of handling most scenarios. They allow you to create a volume, which is managed by Docker…
docker volume create my-volume
…then mount that volume to a target location inside the container:
docker run --mount source=my-volume,target=/app nginx:latest
Bind mounts are simpler. They're the mechanism volumes use under the hood, but you pick the location on the host disk yourself rather than letting Docker manage it.
docker run -v ~/nginxlogs:/var/log/nginx nginx:latest
In practice, using these mounts can be more complicated. Many managed Docker services, like AWS's ECS or managed Kubernetes offerings, don't give you direct access to the underlying server, so you can't create bind mounts directly. Usually this is solved with a service like EFS, which can be mounted into ECS containers, or with an external datastore, like a database.
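To make this concrete, here's a minimal sketch of how the pieces fit together for a database: a Compose file running Postgres backed by a named volume. The service name, password, and volume name are illustrative, not prescriptive.

```yaml
# docker-compose.yml — hypothetical example: Postgres backed by a named volume
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example   # placeholder only; use a secret in production
    volumes:
      - db-data:/var/lib/postgresql/data   # named volume survives container removal

volumes:
  db-data: {}   # managed by Docker, equivalent to `docker volume create`
```

Because `db-data` is a named volume, you can delete and recreate the container without losing the database files.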
Should You Choose Docker for Your Database?
Docker is generally not great at handling state, and Docker-based workloads usually outsource that problem to a database. If a database is the solution, is it practical to put the database itself in Docker?
Largely, the answer is “not usually.” Docker has come a long way since its inception, and it isn’t a terrible or “wrong” idea to containerize databases anymore. It certainly can be done, and has some benefits to it. However, for most general workloads, the benefits don’t outweigh the complications.
To see why, let’s look at the benefits Docker brings to the table:
- Easy scaling: servers can be created and destroyed quickly to meet demand
- Easier CI/CD tooling: automated builds are trivial
- Codification of your infrastructure: all underlying libraries and setup are handled in the Dockerfile
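As an illustration of that last point, a typical web app's entire runtime environment can be captured in a few lines. This Dockerfile is a generic sketch for a hypothetical Node.js app, not tied to any particular project:

```dockerfile
# Hypothetical Dockerfile: the app's whole runtime setup is codified here
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev   # install exactly the locked dependencies
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
```

Anyone with this file can rebuild an identical environment, which is exactly what makes CI/CD and scaling so easy for stateless services.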
Most of these don’t exactly transfer well to database workloads, which are often long-term endeavors that favor data integrity above all else. You don’t generally want to be autoscaling most databases; they don’t usually themselves receive regular code updates, and as such don’t benefit as much from running in containers. And, if you’re just mounting a local storage drive anyway, why not run it outside Docker?
If you’re looking to free yourself from the complexities of managing databases, Docker isn’t the tool for the job. It’s simply unnecessary complication for a workload that can easily run on a standard VPS. You’ll likely be much better off using a fully managed database-as-a-service, like AWS’s RDS. This brings a lot of the automation that Docker is good for, without any of the headache of doing it yourself.
The main place where Docker is useful for database workloads is in development environments. Docker makes it easy to spin up new databases with different configurations, which makes testing quick. In production, however, the rules are generally stricter.
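For example, a dev setup might run two throwaway Postgres versions side by side to test a migration. This Compose sketch (service names, ports, and password are illustrative) keeps the data on a tmpfs, so each database disappears cleanly with its container:

```yaml
# docker-compose.yml — hypothetical dev setup: disposable databases for testing
services:
  pg15:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: dev   # fine for local dev, never for production
    ports:
      - "5415:5432"
    tmpfs:
      - /var/lib/postgresql/data   # no volume: data vanishes with the container

  pg16:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: dev
    ports:
      - "5416:5432"
    tmpfs:
      - /var/lib/postgresql/data
```

Running `docker compose up pg16` gives you a clean database in seconds, and tearing it down leaves nothing behind — exactly the workflow where containers shine.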