Why I Built This
I run Docker Compose stacks across multiple Proxmox LXC containers and VMs. When something breaks—a container crashes, a volume fills up, or a service stops responding—I need to know immediately. Checking each host manually wastes time and doesn't scale.
I already had Grafana and Loki running for general monitoring, but I needed something specific: a script that could check Docker Compose stack health on each host and push structured logs to Loki. Not metrics. Not generic container stats. Just clear health checks with context I could query later.
My Real Setup
I have four hosts running Docker Compose stacks:
- One control host (Proxmox LXC) running Grafana and Loki
- Three remote hosts (mix of LXCs and VMs) running various application stacks
Each remote host needed to run a health check script on a schedule and send results to the central Loki instance. I didn't want to install Promtail on every host just for this—too much overhead for what I needed.
The script had to:
- Check if Docker Compose stacks are running
- Identify unhealthy or stopped containers
- Send structured logs directly to Loki's HTTP API
- Run via cron without manual intervention
The Script I Actually Use
Here's the bash script I wrote. It's not elegant, but it works reliably:
#!/bin/bash
# check-docker-health.sh: check Docker Compose stacks and push results to Loki

LOKI_URL="http://<control-host-ip>:3100/loki/api/v1/push"
HOSTNAME=$(hostname)
TIMESTAMP=$(date +%s%N)   # nanoseconds since the Unix epoch, captured once per run

check_stack_health() {
    local stack_dir=$1
    local stack_name=$(basename "$stack_dir")

    cd "$stack_dir" || return 1

    if [ ! -f "docker-compose.yml" ]; then
        send_log "error" "No docker-compose.yml found in $stack_dir"
        return 1
    fi

    # Container IDs are safer to iterate over than parsed table output.
    local containers=$(docker-compose ps -q)
    if [ -z "$containers" ]; then
        send_log "warning" "No containers found for stack: $stack_name"
        return 0
    fi

    local unhealthy=0
    local stopped=0
    for container in $containers; do
        local status=$(docker inspect --format='{{.State.Status}}' "$container")
        # Containers without a HEALTHCHECK report "<no value>" here.
        local health=$(docker inspect --format='{{.State.Health.Status}}' "$container" 2>/dev/null)
        local name=$(docker inspect --format='{{.Name}}' "$container" | sed 's/\///')

        if [ "$status" != "running" ]; then
            send_log "error" "Container $name in stack $stack_name is $status"
            ((stopped++))
        elif [ "$health" = "unhealthy" ]; then
            send_log "warning" "Container $name in stack $stack_name is unhealthy"
            ((unhealthy++))
        fi
    done

    if [ $unhealthy -eq 0 ] && [ $stopped -eq 0 ]; then
        send_log "info" "Stack $stack_name is healthy"
    fi
}

send_log() {
    local level=$1
    local message=$2
    # Loki push API payload: one stream labelled by job/host/level,
    # with the timestamp sent as a string of nanoseconds.
    local json_payload=$(cat <<EOF
{
  "streams": [
    {
      "stream": {
        "job": "docker-health-check",
        "host": "$HOSTNAME",
        "level": "$level"
      },
      "values": [
        ["$TIMESTAMP", "$message"]
      ]
    }
  ]
}
EOF
)
    curl -s -X POST "$LOKI_URL" \
        -H "Content-Type: application/json" \
        -d "$json_payload" > /dev/null
}

# Main execution
STACK_DIRS=(
    "/opt/stacks/app1"
    "/opt/stacks/app2"
    "/opt/stacks/app3"
)

for stack_dir in "${STACK_DIRS[@]}"; do
    check_stack_health "$stack_dir"
done
I saved this as /usr/local/bin/check-docker-health.sh on each remote host and made it executable.
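If you're replicating this, getting it onto a host is just a copy and a chmod (the source filename here is whatever you saved locally; the destination matches the path above):

# on each remote host: put the script in place and make it executable
cp check-docker-health.sh /usr/local/bin/check-docker-health.sh
chmod +x /usr/local/bin/check-docker-health.sh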
How I Schedule It
I added a cron job to run the script every 5 minutes:
*/5 * * * * /usr/local/bin/check-docker-health.sh
This runs silently. All output goes to Loki, not to email or local logs.
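If you'd rather not open an editor on every host, appending to the existing crontab works the same way (run it as a user that can talk to the Docker daemon):

# append the health check entry non-interactively
( crontab -l 2>/dev/null ; echo '*/5 * * * * /usr/local/bin/check-docker-health.sh' ) | crontab -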
What Worked
The script correctly identifies stopped and unhealthy containers. I can query Loki in Grafana using LogQL:
{job="docker-health-check", level="error"}
This shows me all failures across all hosts in one view. I can filter by host, stack name, or time range.
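For example, narrowing to one host and one stack is a label match plus a line filter. The host label value below is a placeholder; the stack name is matched in the message text, since it isn't a label:

{job="docker-health-check", host="docker-host-1"} |= "app1"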
The structured logging format makes it easy to build alerts. I set up a Grafana alert that triggers when any host reports an error-level log. It sends a notification to my Discord webhook.
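The exact rule depends on your Grafana setup, but the metric query behind it is along these lines: count error-level lines per host over the last five minutes (one cron interval) and fire when the count is above zero.

sum by (host) (count_over_time({job="docker-health-check", level="error"}[5m])) > 0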
Using Loki's HTTP API directly was the right choice. No extra agents, no file tailing, no log rotation issues. The script just pushes JSON and moves on.
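A side benefit: you can verify the whole path from any host with one curl and no script at all. This is just a throwaway test line, using the same URL as the script:

# push a single test entry to confirm the host can reach Loki
curl -s -X POST "http://<control-host-ip>:3100/loki/api/v1/push" \
  -H "Content-Type: application/json" \
  -d "{\"streams\":[{\"stream\":{\"job\":\"docker-health-check\",\"host\":\"$(hostname)\",\"level\":\"info\"},\"values\":[[\"$(date +%s%N)\",\"manual test\"]]}]}"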
What Didn't Work
My first version tried to parse docker-compose ps output as text. This broke when container names had spaces or special characters. I switched to using docker-compose ps -q to get container IDs, then used docker inspect for reliable status checks.
I got timestamps wrong at first. Loki's push API expects each timestamp as a string of nanoseconds since the Unix epoch, which is why the script uses date +%s%N and quotes the value in the JSON payload. Send microseconds or milliseconds instead and Loki still treats the number as nanoseconds, so the entries land far in the past and won't appear in order (if they appear at all). This took me an hour to figure out.
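The difference is easy to see with GNU date (the numbers here are just illustrative):

date +%s       # 1700000000            seconds - wrong scale for Loki
date +%s%3N    # 1700000000123         milliseconds - still the wrong scale
date +%s%N     # 1700000000123456789   nanoseconds - what the push API expects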
Health checks only work for containers that define a HEALTHCHECK in their Dockerfile or Compose file. For everything else, docker inspect prints <no value> for the health status, which the script simply ignores. I had to add explicit health checks to several of my stacks to make this useful.
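Adding one is only a few lines in the Compose file. This is the general shape; the service name, image, and test command are placeholders, and the check only works if the probe command (curl here) actually exists inside the container:

services:
  app1:
    image: example/app1:latest          # placeholder
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]   # example HTTP probe
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 20s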
Curl failures are silent in the script. If Loki is down or unreachable, the script doesn't know. I considered adding error handling, but decided against it—if Loki is down, I have bigger problems and will notice through other monitoring.
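For what it's worth, the change would be small. A sketch of what I'd add inside send_log if I ever change my mind, falling back to a local file since Loki is by definition unreachable at that point (the log path is just an example):

# -f makes curl return non-zero on HTTP errors, so an unreachable or
# unhappy Loki can be recorded locally instead of vanishing silently
if ! curl -s -f -X POST "$LOKI_URL" \
    -H "Content-Type: application/json" \
    -d "$json_payload" > /dev/null; then
    echo "$(date -Is) failed to push $level log to Loki" >> /var/log/docker-health-push-failures.log
fi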
Key Takeaways
Direct HTTP logging to Loki works well for custom health checks. You don't need Promtail for everything.
Structured logs with consistent labels make querying and alerting straightforward. I use job, host, and level as my base labels.
Docker health checks must be explicitly defined. Relying on container state alone isn't enough for real health monitoring.
Bash scripts for monitoring should fail silently and log remotely. Local debugging output just creates noise.
This setup has been running for three months across my hosts. It's caught several issues I would have missed otherwise—mostly containers that restarted but came up in an unhealthy state.