Why I Started Looking Into This
I run multiple Docker Compose stacks on my Proxmox homelab—some for n8n workflows, some for monitoring tools, and a few experimental projects. The pattern was always the same: run docker-compose up, watch the web container crash 2-3 times with “connection refused” errors, wait about 20 seconds, then everything works fine.
It annoyed me, but I lived with it. Until I added a new container that needed to connect to PostgreSQL during its initialization phase. That container would fail, exit, and never restart automatically. I had to manually restart the stack every single time.
That’s when I stopped accepting “just restart it” as a solution and actually dug into why depends_on wasn’t doing what I thought it should.
What I Actually Learned About depends_on
The Docker documentation is clear about this, but I had never read it carefully:
depends_on only controls startup order, not readiness.
When you list db under depends_on, Docker starts the database container first. But “started” just means the container process is running. It doesn’t mean PostgreSQL has finished initializing, loading configuration, or is ready to accept connections.
For PostgreSQL specifically, here’s what happens:
- Container starts (depends_on releases here)
- PostgreSQL initializes the data directory
- Loads configuration files
- Runs any init scripts
- Finally opens the port and accepts connections
That whole process takes 10-15 seconds on my setup. If your application tries to connect at second 2, it fails.
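The failing pattern looks roughly like this (a minimal sketch — image names and commands are placeholders for my actual services):

```yaml
# Minimal sketch of the failing pattern: depends_on short syntax
# only waits for the db *container* to start, not for PostgreSQL.
services:
  web:
    image: node:20-alpine
    command: npm start
    depends_on:
      - db          # releases as soon as the container process exists
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: password
```

With this config, web starts within a second or two of db — well before PostgreSQL has opened its port.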
The Three Conditions Nobody Talks About
I didn’t know this until I read the Compose spec carefully: depends_on supports three different conditions.
services:
  web:
    depends_on:
      db:
        condition: service_started                  # Default
        # condition: service_healthy                # What we actually need
        # condition: service_completed_successfully # For init containers
service_started is the default: dependent services start as soon as the container process is running, which is why we see connection failures.
service_healthy waits for the container’s healthcheck to pass before starting dependent services. This is what I needed.
service_completed_successfully waits for the container to exit with code 0. I use this for migration containers that need to run once before the main app starts.
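For the migration case, the pattern looks roughly like this (a sketch — service names and commands are illustrative, not my exact setup):

```yaml
# Sketch: a one-shot migration container gates the main app.
services:
  migrate:
    image: node:20-alpine
    command: npm run migrate      # runs once, must exit with code 0
    depends_on:
      db:
        condition: service_healthy
  web:
    image: node:20-alpine
    command: npm start
    depends_on:
      migrate:
        condition: service_completed_successfully
```

The migrate service itself waits for a healthy database, so the whole chain is ordered: db healthy, then migrations, then the app.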
My PostgreSQL Healthcheck Configuration
PostgreSQL’s official image includes pg_isready, which checks if the database is actually accepting connections. Here’s what I use:
version: '3.8'

services:
  web:
    image: node:20-alpine
    depends_on:
      db:
        condition: service_healthy
        restart: true
    environment:
      DATABASE_URL: postgresql://postgres:password@db:5432/mydb
    command: npm start

  db:
    image: postgres:16
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
      POSTGRES_DB: mydb
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d mydb"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 60s
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:
Why These Specific Values
interval: 10s — Checks every 10 seconds. I tried 5s initially but it felt excessive for my use case.
timeout: 5s — If pg_isready hangs for more than 5 seconds, something is wrong. This has never actually triggered in my setup.
retries: 3 — Three consecutive failures before marking unhealthy. This handles brief network hiccups or database load spikes.
start_period: 60s — This is the most important one. Failures during the first 60 seconds don’t count toward the retry limit. PostgreSQL needs time to initialize, and without this grace period, the container would be marked unhealthy before it even finished starting.
I initially set start_period to 30s and saw occasional false negatives on slower hardware. 60s has been reliable across my Proxmox VMs and even on my Synology NAS.
The -U and -d Flags Matter
I originally used just pg_isready without arguments. It worked, but filled my logs with warnings about connecting as the wrong user. Adding -U postgres -d mydb specifies the actual user and database, which silences those warnings and makes the check more accurate.
What Didn’t Work
Using Shell Scripts
I tried the popular wait-for-it.sh approach first—adding a script to the container that polls the database before starting the application. It worked, but it felt messy. I had to add the script to my images, modify entrypoints, and maintain extra code that Docker’s healthcheck system already handles better.
Short start_period Values
My first healthcheck used start_period: 15s. On my main Proxmox host with NVMe storage, it worked fine. But when I deployed the same stack to my older backup server with spinning disks, PostgreSQL took 25 seconds to initialize. The container was marked unhealthy before it even had a chance.
I bumped it to 60s everywhere. Better to wait a bit longer than deal with intermittent failures.
Checking Port Availability Instead of Service Health
I experimented with using nc -z localhost 5432 to check if the port was open. The port opens before PostgreSQL is fully ready, so this check passed too early and my application still crashed.
pg_isready actually attempts a connection handshake, which is a much better indicator of readiness.
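Side by side, the two checks look like this (the nc variant is the one I abandoned):

```yaml
# Port check I tried first — only verifies the port is open,
# which happens before PostgreSQL is actually ready:
#   test: ["CMD-SHELL", "nc -z localhost 5432"]
# What works — pg_isready performs a real connection handshake:
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres -d mydb"]
```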
MySQL Is Similar But Different
I run MySQL for one legacy project. The healthcheck uses mysqladmin ping instead:
db:
  image: mysql:8.0
  environment:
    MYSQL_ROOT_PASSWORD: password
    MYSQL_DATABASE: mydb
  healthcheck:
    test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-u", "root", "-ppassword"]
    interval: 10s
    timeout: 5s
    retries: 3
    start_period: 60s
Note the -ppassword with no space. That’s how mysqladmin expects it.
The Password Environment Variable Trap
I wanted to avoid hardcoding the password, so I tried:
test: ["CMD-SHELL", "mysqladmin ping -u root -p$MYSQL_ROOT_PASSWORD"]
This failed because Docker Compose interpolates $MYSQL_ROOT_PASSWORD on my host machine before the container even starts. The healthcheck received whatever value was set in my host environment (empty, in my case), not the value from the container’s environment.
The fix is using $$:
test: ["CMD-SHELL", "mysqladmin ping -u root -p$$MYSQL_ROOT_PASSWORD"]
The double dollar sign tells Compose to leave it alone and let the container’s shell parse it.
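The behavior is easy to reproduce without Docker. In plain sh, double quotes expand a variable immediately (like a single $ in Compose, expanded on the host), while single quotes pass the literal $ through (like $$ in Compose). A rough analogy, not Compose itself:

```shell
#!/bin/sh
# Rough analogy in plain sh (no Docker needed) — not Compose itself.
export DB_PASS=host-secret

# Single $ in Compose: substituted on the host, like double quotes here.
early="mysqladmin ping -p$DB_PASS"
echo "$early"    # mysqladmin ping -phost-secret

# $$ in Compose: a literal $ survives, like single quotes here,
# so only the container's shell would expand it later.
late='mysqladmin ping -p$DB_PASS'
echo "$late"     # mysqladmin ping -p$DB_PASS
```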
Debugging Unhealthy Containers
When a container stays unhealthy, I check the health status directly:
docker inspect --format='{{json .State.Health}}' container_name | jq
This shows the actual healthcheck output, exit codes, and timestamps. Most of my issues were either:
- Wrong username or database name in the healthcheck command
- start_period too short
- Actual PostgreSQL startup failures (usually permission issues with mounted volumes)
What I Do Now
Every new Docker Compose stack I create includes healthchecks from the start. It takes 5 minutes to configure and eliminates an entire class of startup race conditions.
My standard template:
- Database containers: always have healthchecks with a 60s start_period
- Application containers: use condition: service_healthy in depends_on
- Migration containers: use condition: service_completed_successfully
I also set restart: true in the depends_on block so if the database container restarts, dependent containers restart too. This has saved me from subtle issues where the database restarted but the application kept a stale connection.
Key Takeaways
depends_on alone is not enough. It only controls startup order, not readiness. Use condition: service_healthy to wait for actual service availability.
start_period matters more than you think. Set it high enough to cover slow hardware and initialization scripts. 60 seconds is a safe default for databases.
Use the right tool for the check. PostgreSQL has pg_isready, MySQL has mysqladmin ping. Don’t reinvent these with shell scripts or port checks.
Test on your slowest hardware. A healthcheck that works on your development machine might fail on slower production hardware or when the system is under load.
The goal isn’t perfect startup orchestration—it’s predictable, reliable container initialization without manual intervention. Healthchecks get you there.