Why I Built This
I run about 15 Docker containers across my Proxmox cluster and Synology NAS. Most of them just work, but some have this annoying habit of silently dying. A backup script stops running. A monitoring container crashes. An API service hangs without logging anything useful.
I'd find out days later when I actually needed the service. That's embarrassing when it's your own infrastructure.
I looked at Healthchecks.io first—it's well-designed and handles cron monitoring properly. But I didn't want to send pings to an external service for internal infrastructure checks. I also didn't want to self-host their full Django application just for basic monitoring.
What I needed was simple: check if containers are responding, get notified immediately if they're not, and restart them automatically. All running locally.
My Actual Setup
Here's what I'm working with:
- Proxmox VE 8.x running multiple LXC containers and VMs
- Docker hosts spread across different machines
- ntfy running in a container for push notifications (I already use this for other alerts)
- Standard cron on the Docker hosts
- Shell scripts with curl for health checks
I chose ntfy because I already had it running, and it sends notifications to my phone without requiring API keys or external services. It's just HTTP POST requests to a local endpoint.
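Publishing a notification is a single curl call; for example (this uses the same local topic the monitoring scripts later in this post post to):
curl -d "Test notification from the Docker host" http://ntfy.local:8080/monitoring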
How I Check Container Health
Most health check tutorials assume your service has a built-in health endpoint. Most of mine don't. So I check three things:
1. Is the container running?
#!/bin/bash
CONTAINER="n8n"
if ! docker ps --format '{{.Names}}' | grep -q "^${CONTAINER}$"; then
    echo "Container ${CONTAINER} is not running"
    exit 1
fi
This catches containers that have stopped or crashed completely.
2. Is it responding on its port?
#!/bin/bash
SERVICE_URL="http://localhost:5678"
if ! curl -f -s -m 10 "${SERVICE_URL}" > /dev/null; then
    echo "Service at ${SERVICE_URL} is not responding"
    exit 1
fi
The -f flag makes curl fail on HTTP errors. The -m 10 sets a 10-second timeout. I learned to add the timeout after a hung container made the script wait indefinitely.
3. Can it actually do something?
For services with APIs, I do a simple operation:
#!/bin/bash
API_RESPONSE=$(curl -s -m 10 "http://localhost:8080/api/health")
if [[ "${API_RESPONSE}" != *"ok"* ]]; then
    echo "API health check failed"
    exit 1
fi
This caught cases where the container was running and the port was open, but the application inside had deadlocked.
Sending Notifications Through ntfy
When a check fails, I send a notification immediately:
#!/bin/bash
NTFY_URL="http://ntfy.local:8080/monitoring"
CONTAINER="n8n"

send_notification() {
    local title="$1"
    local message="$2"
    local priority="${3:-default}"
    curl -H "Title: ${title}" \
        -H "Priority: ${priority}" \
        -d "${message}" \
        "${NTFY_URL}"
}

# Run health checks here...
if [ $? -ne 0 ]; then
    send_notification \
        "Container Failed: ${CONTAINER}" \
        "Health check failed at $(date)" \
        "urgent"
    exit 1
fi
The priority field makes my phone actually alert me instead of just logging the notification silently. I use "urgent" for failures and "default" for recovery notifications.
Automatic Container Restarts
Here's where I had to make a decision: restart immediately or wait?
I tried immediate restarts first. Bad idea. Some containers fail during deployment or updates, and the script would restart them mid-update, corrupting data.
My current approach: check twice, five minutes apart, then restart.
#!/bin/bash
CONTAINER="n8n"
NTFY_URL="http://ntfy.local:8080/monitoring"
FAILURE_FLAG="/tmp/healthcheck_${CONTAINER}_failed"

# send_notification() is the same helper shown in the ntfy section above

run_health_check() {
    # All the health check logic from above: container running, port responding.
    # Returns 0 on success, 1 on failure.
    docker ps --format '{{.Names}}' | grep -q "^${CONTAINER}$" || return 1
    curl -f -s -m 10 "http://localhost:5678" > /dev/null || return 1
    return 0
}

if ! run_health_check; then
    if [ -f "${FAILURE_FLAG}" ]; then
        # Second consecutive failure - restart
        send_notification \
            "Restarting ${CONTAINER}" \
            "Two consecutive failures detected" \
            "urgent"
        docker restart "${CONTAINER}"
        sleep 30
        if run_health_check; then
            send_notification \
                "${CONTAINER} Recovered" \
                "Container restarted successfully" \
                "default"
            rm -f "${FAILURE_FLAG}"
        else
            send_notification \
                "${CONTAINER} Restart Failed" \
                "Container still unhealthy after restart" \
                "urgent"
        fi
    else
        # First failure - just flag it
        touch "${FAILURE_FLAG}"
        send_notification \
            "${CONTAINER} Health Check Failed" \
            "Will restart if next check fails" \
            "high"
    fi
else
    # Health check passed - clear any failure flags
    rm -f "${FAILURE_FLAG}"
fi
The flag file approach is crude but works reliably. I considered using a proper state file with timestamps, but this is simpler and has never failed me.
Setting Up the Cron Jobs
I run health checks every 5 minutes for critical services, every 15 minutes for everything else:
# Critical services - every 5 minutes
*/5 * * * * /opt/scripts/healthcheck_n8n.sh >> /var/log/healthchecks/n8n.log 2>&1
# Standard services - every 15 minutes
*/15 * * * * /opt/scripts/healthcheck_syncthing.sh >> /var/log/healthchecks/syncthing.log 2>&1
*/15 * * * * /opt/scripts/healthcheck_homeassistant.sh >> /var/log/healthchecks/homeassistant.log 2>&1
I log everything because when something breaks at 3 AM, I want to see the full timeline. The logs showed me that one container was failing health checks for exactly 2 minutes every night at midnight—turned out to be an internal maintenance job that locked the database.
What Didn't Work
Docker's built-in health checks
Docker has HEALTHCHECK instructions in Dockerfiles. I tried using those with docker inspect to check container health. The problem: most of my containers don't have health checks defined, and adding them means rebuilding images or maintaining custom Dockerfiles.
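For reference, a status check via docker inspect looks something like this; it only returns anything useful when the image actually defines a HEALTHCHECK:
#!/bin/bash
CONTAINER="n8n"
# Prints "healthy", "unhealthy", or "starting" - errors out if no HEALTHCHECK is defined
STATUS=$(docker inspect --format '{{.State.Health.Status}}' "${CONTAINER}" 2>/dev/null)
if [ "${STATUS}" != "healthy" ]; then
    echo "Container ${CONTAINER} health status: ${STATUS:-no health check defined}"
    exit 1
fi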
External health checks work with any container, regardless of how it was built.
Monitoring container stats
I tried checking CPU and memory usage as health indicators. A container using 0% CPU must be dead, right? Wrong. Some of my services are event-driven and sit idle most of the time. Others spike to 100% CPU during normal operation.
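For illustration, a stats-based check looks something like this (a one-shot sample, not something I kept):
#!/bin/bash
CONTAINER="n8n"
# Single snapshot of CPU and memory usage, e.g. "CPU: 0.03%  MEM: 1.25%"
docker stats --no-stream --format 'CPU: {{.CPUPerc}}  MEM: {{.MemPerc}}' "${CONTAINER}"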
Resource monitoring is useful for capacity planning, not health checks.
Single-check restarts
As mentioned earlier, restarting on the first failure caused problems during updates and created restart loops when a service had a persistent configuration issue.
Checking too frequently
I initially ran checks every minute. This generated too many notifications during expected downtime (like when I'm deliberately restarting something) and put unnecessary load on the services being checked.
Five minutes is frequent enough to catch real problems quickly but slow enough to avoid false positives.
Real Problems This Caught
Within the first week, the health checks caught:
- My n8n container dying every few days due to a memory leak (fixed by adding a memory limit and automatic restart policy; a rough sketch follows this list)
- A DNS container that stopped responding but didn't crash (never figured out why, but the script now restarts it automatically when it happens)
- A backup script that silently failed when the destination disk filled up
- Network issues between containers that made services unreachable even though they were running
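For the n8n memory leak, applying a memory limit and a restart policy to an already-running container looks something like this (the 512m value is just an example, not a recommendation):
# Cap memory so the leak triggers an OOM kill instead of starving the host
docker update --memory 512m --memory-swap 512m n8n
# Let Docker bring the container back up if it gets killed or crashes
docker update --restart unless-stopped n8n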
The notification about the DNS container came through while I was traveling. I was able to SSH in and check the logs before users (my family) noticed anything was wrong.
Current Limitations
This setup doesn't handle:
- Containers that need specific startup sequences or dependencies
- Stateful services where restart order matters
- Problems that require more than a simple restart (like full data corruption)
- Cascading failures where one service depends on another
For those cases, I still get the notification, but I have to intervene manually.
I also don't check services running on my Synology NAS the same way because I don't have direct shell access configured there. Those still need manual monitoring.
Key Takeaways
Simple health checks with cron, curl, and ntfy work well for small self-hosted setups. You don't need a complex monitoring platform to catch most container failures.
Checking twice before restarting prevents false positives and avoids making problems worse during updates.
Logging everything is worth the disk space. When something breaks, having the timeline matters more than saving a few megabytes.
External health checks are more flexible than Docker's built-in HEALTHCHECK because they work with any container and can test actual functionality, not just process liveness.
The notification part is as important as the detection. Getting alerted immediately means I can fix things before they become bigger problems or before users notice.