Debugging systemd service dependencies when running self-hosted map servers with PostGIS and Nominatim in Docker Compose

Why I Worked on This

I run a self-hosted mapping stack on my Proxmox server using Docker Compose. The setup includes PostGIS for spatial data and Nominatim for geocoding. When I first deployed it, the services would randomly fail to start after a reboot. Sometimes PostGIS would be ready but Nominatim would time out. Other times both would start, but in the wrong order, and Nominatim crashed because it couldn’t connect to the database.

The problem wasn’t the containers themselves—they worked fine when started manually. The issue was systemd not understanding the dependency chain between Docker Compose and the services inside the containers. I needed to figure out how to make systemd wait for the right conditions before declaring success or failure.

My Real Setup

I’m running this on Proxmox 8.x with an Ubuntu 22.04 LXC container dedicated to mapping services. Inside that container:

Docker Compose manages three containers: PostGIS (postgres:15-postgis-3.4), Nominatim (mediagis/nominatim:4.4), and a simple nginx reverse proxy
PostGIS needs to initialize its database schema on first boot
Nominatim depends on PostGIS being fully ready, not just “started”
I use a systemd service unit to start Docker Compose on boot

The Docker Compose file itself handles internal dependencies with depends_on, but that only controls container start order—not readiness. Systemd doesn’t know when PostGIS is actually accepting connections or when Nominatim has finished its initialization queries.

What Didn’t Work

Just Using After= and Requires=

My first attempt was a basic systemd unit file:

[Unit]
Description=Map Services
After=docker.service
Requires=docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/mapstack
ExecStart=/usr/bin/docker-compose up -d
ExecStop=/usr/bin/docker-compose down

[Install]
WantedBy=multi-user.target

This started Docker Compose, but systemd immediately marked the service as “active” once the command returned. The containers were still initializing. When I checked systemctl status mapstack, it showed green, but docker-compose logs revealed Nominatim was crashing with “connection refused” errors to PostGIS.

Type=forking With PIDFile

I tried switching to Type=forking and pointing to Docker’s PID file. This failed because Docker Compose doesn’t create a single PID file for all the containers it manages. The service would time out or fail unpredictably. journalctl -u mapstack showed:

mapstack.service: Can't open PID file /var/run/mapstack.pid (yet?) after start: Operation not permitted

I abandoned this approach quickly.

Polling Scripts That Guessed

I wrote a bash script that checked whether PostGIS was ready by attempting a psql connection in a loop. This worked sometimes, but it introduced a new problem: if the script took too long, systemd killed it once TimeoutStartSec expired. When I raised the timeout to something large like 300 seconds, a genuinely broken stack stalled boot for that entire window before systemd gave up.

The script also didn’t account for Nominatim’s own readiness. I needed a better way to signal when the entire stack was operational.

What Worked

Using Type=notify With a Wrapper Script

The breakthrough was switching to Type=notify and using systemd-notify to signal readiness explicitly. I created a wrapper script that:

Starts Docker Compose in detached mode
Waits for PostGIS to accept connections using pg_isready
Waits for Nominatim to respond to HTTP health checks
Sends a ready notification to systemd only after both checks pass

Here’s the actual script I use (/opt/mapstack/start-with-health.sh):

#!/bin/bash
set -e

cd /opt/mapstack

# Start containers
/usr/bin/docker-compose up -d

# Wait for PostGIS (max 60 seconds)
echo "Waiting for PostGIS..."
for i in {1..60}; do
  if docker exec mapstack-postgis-1 pg_isready -U nominatim -d nominatim > /dev/null 2>&1; then
    echo "PostGIS ready"
    break
  fi
  if [ $i -eq 60 ]; then
    echo "PostGIS failed to become ready"
    exit 1
  fi
  sleep 1
done

# Wait for Nominatim (max 120 seconds)
echo "Waiting for Nominatim..."
for i in {1..120}; do
  if curl -sf http://localhost:8080/status > /dev/null 2>&1; then
    echo "Nominatim ready"
    break
  fi
  if [ $i -eq 120 ]; then
    echo "Nominatim failed to become ready"
    exit 1
  fi
  sleep 1
done

# Signal systemd that we're ready
systemd-notify --ready

echo "Map stack fully operational"
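Both loops in the script follow the same pattern: probe, give up after a fixed limit, sleep, and retry. Below is a minimal bash sketch that turns this pattern into one reusable function. It assumes bash 4 or later. The name wait_until is hypothetical and does not appear in the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): run a probe command
# repeatedly until it succeeds or max_seconds of wall-clock time have passed.
# Usage: wait_until <max_seconds> <command> [args...]
# Returns 0 as soon as the probe succeeds, 1 once the deadline has passed.
wait_until() {
  local max_seconds=$1
  shift
  # SECONDS is bash's built-in counter of seconds since the shell started.
  local deadline=$(( SECONDS + max_seconds ))
  while true; do
    # Discard probe output so "not ready yet" errors don't flood the journal.
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep 1
  done
}
```

With this helper, the PostGIS wait becomes `wait_until 60 docker exec mapstack-postgis-1 pg_isready -U nominatim -d nominatim || { echo "PostGIS failed to become ready"; exit 1; }`. The helper measures elapsed time instead of counting iterations. That keeps the limit accurate even when a single probe, such as a slow curl, takes longer than a second.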

The updated systemd unit file:

[Unit]
Description=Map Services with Health Checks
After=docker.service network-online.target
Requires=docker.service
Wants=network-online.target

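[Service]
Type=notify
NotifyAccess=all
# The wrapper exits after signalling readiness. Without RemainAfterExit=yes,
# systemd treats that exit as the service stopping and runs ExecStop= at once.
RemainAfterExit=yes
WorkingDirectory=/opt/mapstack
ExecStart=/opt/mapstack/start-with-health.sh
ExecStop=/usr/bin/docker-compose down
TimeoutStartSec=300
Restart=on-failure
RestartSec=10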

[Install]
WantedBy=multi-user.target

Key changes:

Type=notify tells systemd to wait for an explicit ready signal
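NotifyAccess=all accepts notifications from any process in the service’s control group. This matters because systemd-notify runs as a child of the script, not as the main process, and the Type=notify default (NotifyAccess=main) would ignore it.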
TimeoutStartSec=300 gives enough time for large Nominatim datasets to initialize
Restart=on-failure with RestartSec=10 handles transient failures without hammering the system
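The wrapper calls systemd-notify unconditionally, so running it by hand for testing can fail at the very last step. Outside a Type=notify unit, $NOTIFY_SOCKET is not set, and systemd-notify can exit non-zero in that case, which set -e turns into a script failure. Below is a minimal sketch of guarding the call. It also uses systemd-notify’s --status= option to publish progress text, which systemctl status mapstack shows on its "Status:" line. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: only talk to systemd when running under a unit that
# provides a notification socket (a Type=notify service sets $NOTIFY_SOCKET).

notify_status() {
  # Free-form progress text, shown as "Status:" in `systemctl status`.
  if [[ -n "${NOTIFY_SOCKET:-}" ]]; then
    systemd-notify --status="$1"
  fi
  # Always echo as well, so the message also lands in the journal.
  echo "$1"
}

notify_ready() {
  if [[ -n "${NOTIFY_SOCKET:-}" ]]; then
    systemd-notify --ready
  else
    echo "NOTIFY_SOCKET not set (not started by systemd?); skipping READY" >&2
  fi
}
```

In the wrapper, notify_status "Waiting for PostGIS" would go before the first loop, and notify_ready would replace the bare systemd-notify --ready line.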

Debugging With journalctl

When things went wrong, I relied heavily on journalctl -u mapstack -f to watch the startup process in real time. This showed me exactly where the script was stalling. For example, I initially forgot to redirect stderr in the pg_isready check, and the logs were filled with connection error spam until PostGIS was ready.

I also used journalctl -u mapstack --since "10 minutes ago" after reboots to see the full startup sequence without scrolling through unrelated logs.

Container-Level Health Checks

Inside my docker-compose.yml, I added proper health checks to make the container state more observable:

services:
  postgis:
    image: postgres:15-postgis-3.4
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U nominatim -d nominatim"]
      interval: 10s
      timeout: 5s
      retries: 5

  nominatim:
    image: mediagis/nominatim:4.4
    depends_on:
      postgis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/status"]
      interval: 30s
      timeout: 10s
      retries: 3

This made docker-compose ps show actual health status instead of just “Up”. My wrapper script could then use docker inspect to check health state instead of manual probing, but I found the direct checks (pg_isready, curl) to be more reliable and faster.
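For comparison, here is a minimal sketch of the docker inspect variant. It reads the Health.Status field that Docker maintains once a service has a healthcheck: block. The container name comes from the script above; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when Docker reports the container's
# healthcheck status as "healthy" (the other values are "starting" and
# "unhealthy").
container_healthy() {
  local status
  # For a container without a healthcheck this template may error out or
  # print nothing useful; either way the comparison below fails.
  status=$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null) || return 1
  [[ "$status" == "healthy" ]]
}
```

Usage: `container_healthy mapstack-postgis-1 && echo "PostGIS healthy"`. This approach has one likely cost, consistent with the direct checks feeling faster. With interval: 30s on the Nominatim healthcheck, Docker may keep reporting "starting" for up to 30 seconds after Nominatim is already serving requests.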

Limitations and Trade-offs

This approach works for my setup, but it has clear limits:

The wrapper script is specific to my container names and ports. If I change the Compose project name or port mappings, the script breaks.
The fixed timeout values (60s for PostGIS, 120s for Nominatim) are tuned for my hardware and dataset size. A larger Nominatim import would need longer timeouts.
If PostGIS starts but is unhealthy (e.g., corrupted data), my script will still report success because pg_isready only checks connectivity.
The script doesn’t handle partial failures well. If PostGIS is ready but Nominatim fails, Docker Compose is still running, and I have to manually clean up.
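The last two limitations can be narrowed, though not eliminated, with two small additions. Below is a minimal sketch. postgis_query_ok runs a real query, SELECT postgis_version(), instead of only checking that the server accepts connections. The query fails if the nominatim database is missing the PostGIS extension, though this is still far from an integrity check. step_or_teardown runs docker-compose down when a readiness step fails, so a failed start does not leave half the stack running. Both function names and the COMPOSE variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the container, user, and database names follow the
# wrapper script. Must run from the Compose project directory (/opt/mapstack).

# Compose command. Set COMPOSE="docker compose" for the v2 plugin; it is left
# unquoted below on purpose so the two-word form splits into words.
COMPOSE="${COMPOSE:-docker-compose}"

# Stronger than pg_isready: the server must accept a login to the nominatim
# database and answer a query that needs the PostGIS extension.
postgis_query_ok() {
  docker exec mapstack-postgis-1 \
    psql -U nominatim -d nominatim -tAc 'SELECT postgis_version();' > /dev/null 2>&1
}

# Run one readiness step. On failure, stop the whole stack and return 1, so
# systemd sees a failed start and no half-started containers remain.
step_or_teardown() {
  local description=$1
  shift
  if "$@"; then
    return 0
  fi
  echo "$description failed; running '$COMPOSE down'" >&2
  $COMPOSE down
  return 1
}
```

In the wrapper, a step would look like `step_or_teardown "PostGIS query" postgis_query_ok || exit 1`.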

I also considered using systemd socket activation, but that doesn’t fit this use case—PostGIS and Nominatim need to be persistently running, not started on-demand.

Key Takeaways

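Type=notify is the right tool when you need to wait for complex readiness conditions that systemd can’t detect on its own. Don’t rely on Type=oneshot around a bare docker-compose up -d, or on Type=forking, for services where “started” and “ready” are different states. up -d returns as soon as the containers are created, and Compose has no single PID file for systemd to track.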

Health check scripts should fail fast with clear exit codes. My initial version had silent failures that left systemd waiting until timeout. Explicit error messages in the script output made debugging much faster.
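One cheap way to make failures self-describing is to give each stage its own exit code. systemctl status mapstack reports the main process’s exit status, so a code of 3 versus 4 tells you which stage failed before you even open the journal. The sketch below is illustrative; the specific code values and the fail helper are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical per-stage exit codes for the wrapper script.
# Values above 125 are avoided; shells use 126, 127 and 128+N for
# "not executable", "command not found" and signal deaths.
readonly EXIT_COMPOSE_UP=2
readonly EXIT_POSTGIS=3
readonly EXIT_NOMINATIM=4

# Print a message to stderr (captured by journald) and exit with a stage code.
fail() {
  local code=$1
  shift
  echo "ERROR: $*" >&2
  exit "$code"
}
```

In the wrapper, the first step would become `/usr/bin/docker-compose up -d || fail "$EXIT_COMPOSE_UP" "docker-compose up failed"`. Likewise, each loop’s exit 1 would become the matching fail call.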

Docker Compose’s depends_on with condition: service_healthy is useful but not sufficient. Systemd still needs to know when the entire stack is operational, not just when containers have started.

journalctl -u <service> -f during boot is invaluable. I keep a terminal open with this running whenever I’m testing changes to the service unit or startup script.

Timeouts should be realistic but not infinite. I initially set TimeoutStartSec=0 (no limit) which caused boot hangs when something was misconfigured. A 5-minute timeout is long enough for my stack but short enough to catch real problems.
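A related check: the script’s own limits should add up to less than TimeoutStartSec, so the script reports which stage failed before systemd kills it. With the values in this post, the loops sleep for up to 60 s (PostGIS) and 120 s (Nominatim), plus whatever time the probes themselves take. That is 180 s, leaving roughly 120 s of the 300 s budget for docker-compose up -d. The small sketch below fails when the stage limits leave too little headroom; the function name and the 30-second margin are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check: do the per-stage wait limits fit inside
# systemd's TimeoutStartSec with some headroom for `docker-compose up -d`?
# Usage: check_start_budget <timeout_start_sec> <margin_sec> <stage_limit>...
check_start_budget() {
  local timeout=$1 margin=$2
  shift 2
  local total=0 limit
  for limit in "$@"; do
    total=$(( total + limit ))
  done
  if (( total + margin > timeout )); then
    echo "stage limits ${total}s + ${margin}s margin exceed TimeoutStartSec=${timeout}s" >&2
    return 1
  fi
  echo "ok: ${total}s of stage limits, $(( timeout - total ))s left of ${timeout}s"
}
```

For this stack, `check_start_budget 300 30 60 120` passes, while raising the Nominatim limit to 250 s without also raising TimeoutStartSec would fail.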

If your service depends on network resources (like external tile servers or APIs), add After=network-online.target and Wants=network-online.target. I learned this the hard way when my LXC container’s network wasn’t fully up before Docker tried to pull images during first boot.

About Me

Vipin PG

Tech Expert & Vibe Coder
