Why I Built This
I run several containerized services on my Proxmox homelab, and I've had deployments break in ways that weren't immediately obvious. A new image would pull, containers would restart, health checks would pass initially, but then something would fail under actual load—a database migration issue, a broken API endpoint, or a configuration incompatibility.
The problem was always the same: I'd notice something was wrong minutes or hours later, then scramble to manually roll back. I needed a way to automatically detect when a deployment actually failed and revert without me being there to babysit it.
I looked at Docker Compose's watch feature and health checks, thinking I could combine them into something that would monitor deployments and roll back automatically if things went sideways. This is what I built and what actually worked.
What Docker Compose Watch Actually Does
Docker Compose watch is designed for development—it watches your local files and syncs changes into running containers. It's not a deployment monitor. I initially misunderstood this and thought I could use it to watch container health after updates.
I was wrong. The watch feature doesn't monitor container state or health checks. It watches filesystem changes. So the title premise here doesn't work as stated—you can't use docker compose watch to monitor health check thresholds for rollbacks.
What I actually built instead was a shell script that monitors health checks and handles rollbacks. It's less elegant than a built-in feature would be, but it works.
My Actual Setup
I have a compose file for a small web application that looks like this:
version: '3.8'
services:
  web:
    image: myapp:${VERSION:-latest}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s
    ports:
      - "8080:8080"
The key parts: I use a VERSION environment variable instead of hardcoding latest, and I have a health check that actually tests if the application responds correctly.
The start_period is important—it gives the container 30 seconds to start up before health checks count as failures. Without this, slow-starting containers would fail health checks during normal startup.
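In practice I mostly leave it on the latest default, but deploying a pinned build just means setting VERSION before bringing the stack up (the tag here is only an example):
export VERSION=v1.2.3
docker compose pull
docker compose up -d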
The Rollback Script
I wrote a bash script that handles the deployment and monitors health. It's not pretty, but it does the job:
#!/bin/bash
COMPOSE_FILE="docker-compose.yml"
SERVICE_NAME="web"
HEALTH_CHECK_DURATION=60
HEALTH_CHECK_INTERVAL=5

# Record the image ID currently in use so we can roll back to it
CURRENT_VERSION=$(docker compose -f "$COMPOSE_FILE" images -q "$SERVICE_NAME")
echo "Current version: $CURRENT_VERSION"

# Pull and deploy the new version
docker compose -f "$COMPOSE_FILE" pull
docker compose -f "$COMPOSE_FILE" up -d

# Wait for the healthcheck start_period (30s) plus a small buffer
echo "Waiting for start period..."
sleep 35

# Monitor health for the specified duration
echo "Monitoring health for ${HEALTH_CHECK_DURATION} seconds..."
elapsed=0
while [ "$elapsed" -lt "$HEALTH_CHECK_DURATION" ]; do
    health=$(docker inspect --format='{{.State.Health.Status}}' "$(docker compose -f "$COMPOSE_FILE" ps -q "$SERVICE_NAME")")
    if [ "$health" != "healthy" ]; then
        echo "Health check failed: $health"
        echo "Rolling back to $CURRENT_VERSION"
        docker compose -f "$COMPOSE_FILE" down
        # Re-tag the old image ID as latest so compose brings back the previous build
        docker tag "$CURRENT_VERSION" myapp:latest
        docker compose -f "$COMPOSE_FILE" up -d
        exit 1
    fi
    sleep "$HEALTH_CHECK_INTERVAL"
    elapsed=$((elapsed + HEALTH_CHECK_INTERVAL))
done

echo "Deployment successful"
This script captures the current image ID, deploys the new version, then polls the health status every 5 seconds for 60 seconds. If the status ever reports anything other than healthy, it rolls back.
What Didn't Work
My first attempt tried to be clever with Docker events. I used docker events to watch for health status changes, thinking I could trigger rollbacks that way. The problem was that events are noisy and don't give you a clear "this deployment failed" signal. I was getting false positives from normal container restarts.
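For reference, this is roughly the shape of what I tried (reconstructed from memory, not the exact script): a pipeline that reacts to unhealthy events as they stream by.
# Rough sketch of the docker events approach I abandoned.
# It reacts to every health transition, including ones during routine restarts,
# which is exactly where the false positives came from.
docker events --filter 'type=container' | while read -r line; do
    case "$line" in
        *"health_status: unhealthy"*)
            echo "Unhealthy event: $line"
            # rollback would have been triggered here
            ;;
    esac
done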
I also tried using docker compose watch with a custom sync configuration, thinking I could watch the container state somehow. That was a complete misunderstanding of what the feature does. It's for development file syncing, not production monitoring.
The health check retries setting also tripped me up initially. I set it to 1, thinking that would surface failures quickly. But it meant any transient issue during startup would trigger a rollback. Setting it to 3 gave enough tolerance for normal startup hiccups without masking real problems.
The Image Tagging Problem
One issue I hit: when you pull a new latest tag, Docker doesn't keep the old image tagged—it shows up as <none>. My rollback script needs the old image ID to revert.
I solved this by capturing the image ID before pulling, then using docker tag to re-tag it as latest during rollback. This works, but it means I can only roll back one version. If I deploy twice in a row, the first version is gone unless I manually tag it.
A better approach would be to use explicit version tags (like v1.2.3) instead of latest, and track the last known good version in a file. I haven't implemented this yet because my current setup is simple enough that single-version rollback is sufficient.
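If I do get to it, I imagine it looking something like this. It's only a sketch, not something I'm running: the .last-good-version file name is made up, and the single post-wait health check stands in for the real monitoring loop above.
#!/bin/bash
# Sketch only: deploy explicit version tags and track the last known good one in a file.
LAST_GOOD_FILE=".last-good-version"
SERVICE_NAME="web"
NEW_VERSION="$1"                        # e.g. ./deploy.sh v1.2.3
PREVIOUS_VERSION=$(cat "$LAST_GOOD_FILE" 2>/dev/null || echo "")

export VERSION="$NEW_VERSION"
docker compose pull
docker compose up -d

# Stand-in for the monitoring loop from the main script:
# wait out the start period plus the monitoring window, then check once.
sleep 95
health=$(docker inspect --format='{{.State.Health.Status}}' "$(docker compose ps -q "$SERVICE_NAME")")

if [ "$health" = "healthy" ]; then
    echo "$NEW_VERSION" > "$LAST_GOOD_FILE"
    echo "Deployment of $NEW_VERSION successful"
elif [ -n "$PREVIOUS_VERSION" ]; then
    echo "Unhealthy ($health), rolling back to $PREVIOUS_VERSION"
    export VERSION="$PREVIOUS_VERSION"
    docker compose up -d
    exit 1
else
    echo "Unhealthy ($health) and no previous version recorded"
    exit 1
fi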
Health Check Design Matters
The health check endpoint itself needs to be meaningful. I initially just checked if the HTTP server responded with 200. That wasn't enough—the server could be up but the database connection could be broken.
Now my /health endpoint actually tests:
- Database connectivity
- Critical configuration is loaded correctly
- External dependencies are reachable
This catches real problems. The trade-off is that the health check takes longer to run (about 2 seconds), which is why I set the timeout to 5 seconds.
Limitations I'm Living With
This approach only works for single-container services or services where all containers must be healthy. If you have a multi-container setup where some containers can be unhealthy temporarily, you'd need more sophisticated logic.
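If I ever need that, the check would have to loop over every container in the project instead of inspecting a single service. Something like this untested sketch, which treats containers without a healthcheck as healthy:
# Untested sketch: require every container in the compose project to be healthy.
all_healthy() {
    for cid in $(docker compose ps -q); do
        status=$(docker inspect \
            --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}healthy{{end}}' "$cid")
        if [ "$status" != "healthy" ]; then
            return 1
        fi
    done
    return 0
}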
The monitoring window is fixed at 60 seconds. Some issues only appear under sustained load or after several minutes. I'm not catching those. For my use case, most deployment failures show up quickly, so this is acceptable.
There's no notification system. If a rollback happens and I'm not watching, I only find out later when I check logs. I should add a webhook call or email notification, but I haven't gotten around to it.
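When I do, it will probably just be a curl call in the rollback branch, along these lines (the URL is a placeholder; any ntfy/Gotify/Slack-style endpoint would work):
WEBHOOK_URL="https://example.com/notify"   # placeholder, not a real endpoint

notify() {
    # Fire-and-forget so a dead notification endpoint can't block the rollback
    curl -fsS -m 10 -X POST \
        -H "Content-Type: application/json" \
        -d "{\"text\": \"$1\"}" \
        "$WEBHOOK_URL" >/dev/null 2>&1 || true
}

# In the rollback branch of the deploy script:
# notify "Deployment of $SERVICE_NAME failed health checks, rolled back to $CURRENT_VERSION"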
What I Learned
Docker Compose watch is not a deployment monitoring tool. I wasted time trying to make it do something it wasn't designed for.
Health checks need a proper start period, or you'll get false failures during normal startup. 30 seconds works for my applications, but this varies.
Keeping old image versions around requires explicit tagging. The latest tag is convenient but makes rollbacks harder.
Automated rollbacks only work if your health checks actually test what matters. A simple "is the server up" check isn't enough.
This entire approach is a workaround for not having a proper deployment pipeline. For anything critical, I should be using proper versioning, blue-green deployments, or a real orchestration system. But for homelab services where I'm the only user, this script does the job.