Why I Started Using Health Checks in Docker Compose
I run multiple containers on my Proxmox setup—databases, web apps, automation tools like n8n, and monitoring services. For a while, I relied on Docker's restart: unless-stopped policy and assumed that was enough. If a container crashed, Docker would restart it. Simple.
Then I started noticing a problem: containers would sometimes stay running but become unresponsive. The process was alive, but the service inside wasn't working. Docker had no idea because the container itself hadn't exited. I'd only find out when something downstream failed or when I manually checked.
That's when I added health checks to my compose files. They helped with startup ordering and gave me visibility into whether services were actually functional, not just running.
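Here's a trimmed-down example of the pattern (the postgres image and pg_isready check are stand-ins for whatever your services actually run):

services:
  db:
    image: postgres:16
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s

  app:
    image: my-web-app        # placeholder for whatever depends on the database
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy   # wait for a passing health check, not just a started container

The condition: service_healthy part is what gives you startup ordering. Without a healthcheck on db, Compose can only wait for the container to start, not for the database to actually accept connections.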
What I Expected (and What Actually Happened)
When I first configured health checks, I assumed Docker would automatically restart containers that became unhealthy. I had restart: unless-stopped set, so I thought: unhealthy state → restart. That's not how it works.
Docker's restart policies only trigger on container exit. A container marked as unhealthy stays running unless something explicitly stops it. The health check is purely informational by default—it updates the container's status but doesn't take action.
I confirmed this by watching a container go unhealthy and stay that way indefinitely. Docker flagged it, but nothing happened. I had to restart it manually.
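You can watch this happen yourself; Docker reports the state, it just doesn't act on it:

docker ps --filter health=unhealthy
docker inspect --format '{{.State.Health.Status}}' <container-name>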
The Options I Considered
Option 1: Let the Application Handle It
The most direct approach is to make the health check endpoint actually exit the process if something is wrong. For example, in a Node.js app:
// isHealthy() stands in for whatever app-specific check you already have
const express = require('express');
const app = express();

app.get('/health', (req, res) => {
  if (!isHealthy()) {
    process.exit(1); // exiting stops the container, so restart: unless-stopped kicks in
  }
  res.status(200).send('OK');
});
This works if you control the application code and want tight coupling between health status and container lifecycle. I didn't use this because I run third-party images where I can't modify the app logic, and I wanted a solution that worked across all my containers.
Option 2: Use an External Monitor Container
There's a container called autoheal that watches other containers and restarts unhealthy ones. It needs access to the Docker socket, which made me pause. I don't know the maintainer, and the project isn't officially backed. Giving socket access to something I haven't vetted felt risky, even if it has a lot of GitHub stars.
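For reference, the typical setup looks roughly like this (I'm going from the project's README as I remember it, so double-check the image and variable names before trusting them):

services:
  autoheal:
    image: willfarrell/autoheal
    restart: unless-stopped
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all   # watch all containers, not just labelled ones
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock   # the socket access that made me hesitate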
I decided against it—not because it's bad, but because I wanted something I understood and controlled.
Option 3: A systemd Timer with a Simple Script
I found a one-liner on Stack Overflow that does exactly what I needed:
docker ps -q -f health=unhealthy | xargs --no-run-if-empty docker restart
This checks for unhealthy containers and restarts them. It's straightforward, doesn't require extra containers, and I can see exactly what it does.
I wrapped it in a systemd service and timer instead of using cron because I wanted better logging and integration with my system's service management. It's probably overkill, but it felt cleaner to me.
What I Actually Configured
I created three files in ~/.config/systemd/user:
The Script (restart-unhealthy.sh)
#!/bin/bash
docker ps -q -f health=unhealthy | xargs --no-run-if-empty docker restart
This runs the Docker command and only restarts containers if any are unhealthy. The --no-run-if-empty flag stops xargs from invoking docker restart when there's no input; without it, xargs would run docker restart with no arguments and fail.
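One thing that's easy to forget: the script needs to be executable, or the service will fail when it tries to run it.

chmod +x ~/.config/systemd/user/restart-unhealthy.sh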
The Service (restart-unhealthy.service)
[Unit]
Description=Restart unhealthy docker containers
After=docker.service
Wants=docker.service
[Service]
Type=oneshot
ExecStart=%h/.config/systemd/user/restart-unhealthy.sh
This defines the service that runs the script. Type=oneshot means it runs once and exits, which is what I want for a periodic check. The %h is systemd's specifier for the home directory; I use it instead of an environment variable like ${USER} because systemd doesn't do shell-style expansion on the executable path.
The Timer (restart-unhealthy.timer)
[Unit]
Description=Run docker unhealthy restart every 5 minutes
Requires=restart-unhealthy.service
After=docker.service
Wants=docker.service
[Timer]
OnCalendar=*:0/5
Persistent=true
[Install]
WantedBy=timers.target
This triggers the service every 5 minutes. Persistent=true ensures missed runs (if the system was off) get executed when it comes back up.
I enabled and started it with:
systemctl --user daemon-reload
systemctl --user enable restart-unhealthy.timer
systemctl --user start restart-unhealthy.timer
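Two things worth checking on a headless host: that the timer actually got scheduled, and that your user's systemd instance keeps running when nobody is logged in (user units stop with your session unless lingering is enabled). list-timers shows the schedule, journalctl shows what each run did, and enable-linger keeps the user instance alive across reboots without a login.

systemctl --user list-timers restart-unhealthy.timer
journalctl --user -u restart-unhealthy.service
loginctl enable-linger $USER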
What This Doesn't Do
This setup restarts containers when they become unhealthy, but it doesn't prevent data loss if a restart happens mid-operation. If a container is writing to a database or processing a job, restarting it abruptly could corrupt data or leave things in an inconsistent state.
For services where that matters, I use volume mounts to persist data outside the container, so a restart only affects the running process, not the stored data. I also try to keep the health checks conservative enough that they only fail on real problems: if a service can't recover on its own, I want to know about it rather than have it restart endlessly.
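As a sketch of the volume part (same placeholder database as earlier), a named volume keeps the data outside the container's writable layer, so an abrupt restart doesn't touch it:

services:
  db:
    image: postgres:16
    restart: unless-stopped
    volumes:
      - db-data:/var/lib/postgresql/data   # data survives restarts and container recreation

volumes:
  db-data: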
What I Learned
Health checks in Docker Compose are useful for visibility and dependency management, but they don't trigger restarts by default. You have to add that behavior yourself.
The simplest approach that worked for me was a periodic script that checks for unhealthy containers and restarts them. It's not fancy, but it's reliable and doesn't require running extra containers or modifying application code.
If you're running critical services where a restart could cause data loss, make sure you're using persistent volumes and that your health checks are designed to catch real issues, not transient hiccups.
This setup has been running on my Proxmox server for several months now, and I haven't had to manually restart a stuck container since.