Why I Built a Health Check System for F-Droid Mirrors
I run multiple F-Droid repository mirrors in Docker containers on my Proxmox setup. The problem I kept hitting was silent failures—containers would appear "running" in Docker's eyes, but the actual mirror service inside had stalled, wasn't syncing properly, or had lost network access to upstream repos. Users would hit dead mirrors, and I'd only notice hours later when checking logs manually.
I needed a system that could detect when a mirror was actually broken and restart it automatically, not just check if the container process was alive.
My Setup and Requirements
I host three F-Droid mirrors spread across different containers. Each one:
- Runs a web server (nginx) serving the mirrored repository files
- Has a sync script that pulls updates from upstream F-Droid repos
- Needs to respond to HTTP requests reliably
- Must have recent data—stale mirrors are useless
My initial approach was just using Docker's built-in restart policies, but that only catches container crashes. If the sync process hung or the web server stopped responding while the main process stayed alive, Docker did nothing.
Building the Health Check Logic
I structured the health checks in my docker-compose.yml with three layers of verification:
Layer 1: Basic HTTP Response
First check: can the web server respond at all? I use curl to hit a known endpoint that should always exist in an F-Droid repo—the index file:
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/repo/index-v1.json"]
  interval: 60s
  timeout: 10s
  retries: 3
  start_period: 120s
The -f flag makes curl fail on HTTP errors. If the web server is dead or returning 500s, this catches it.
Layer 2: File Freshness Check
An F-Droid mirror that hasn't synced in days is broken even if it serves files. I added a check that verifies the index file was modified recently:
test: ["CMD-SHELL", "test $(( $(date +%s) - $(stat -c %Y /data/repo/index-v1.json) )) -lt 86400"]
This fails if the index file is older than 24 hours. I use CMD-SHELL here because I need shell arithmetic that plain CMD doesn't support.
Layer 3: Combined Check
My actual implementation combines both checks with an AND condition:
test: ["CMD-SHELL", "curl -f http://localhost:8080/repo/index-v1.json && test $(( $(date +%s) - $(stat -c %Y /data/repo/index-v1.json) )) -lt 86400"]
This only passes if both the HTTP response works AND the file is fresh.
Handling the Missing curl Problem
My first attempt failed immediately. The health check kept marking containers unhealthy even though I could manually verify they were fine. After digging through docker inspect output, I found the actual error:
"Output": "sh: 1: curl: not found"
My base image didn't include curl. This is a common trap—you write a health check assuming basic tools exist, but minimal container images strip everything non-essential.
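The relevant detail lives in the container's health log, which docker inspect can pull out directly (the container name here is just an example):

# current health status for a container
docker inspect --format '{{.State.Health.Status}}' fdroid-mirror-1
# recent health check attempts, with exit codes and the captured output
docker inspect --format '{{json .State.Health.Log}}' fdroid-mirror-1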
I had two options:
- Install curl in the Dockerfile
- Use a tool that was already present
I chose the first option because curl's error handling is clearer than alternatives like wget, and added it to my Dockerfile:
RUN apk add --no-cache curl
After rebuilding the image, health checks started working properly.
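A faster way to catch this kind of mismatch is to run the exact test command inside the running container before wiring it into compose (container name again illustrative):

# run the exact command the healthcheck runs, inside the container
docker exec fdroid-mirror-1 sh -c 'curl -f http://localhost:8080/repo/index-v1.json && echo OK'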
Setting Realistic Timing Parameters
The timing values matter more than I initially thought. My first settings were too aggressive:
interval: 10s
timeout: 5s
retries: 2
start_period: 30s
This caused false positives. The sync process sometimes takes 15-20 seconds during heavy upstream activity, which would trigger timeouts. Container restarts during active syncs corrupted data.
My current settings reflect actual operational patterns:
- interval: 60s - Checking every minute is frequent enough. F-Droid repos don't change that fast.
- timeout: 10s - Allows for slow disk I/O or network hiccups without false failures.
- retries: 3 - Three consecutive failures means something is actually broken, not just a transient issue.
- start_period: 120s - Initial sync after container start can take 90+ seconds. This grace period prevents premature unhealthy marking.
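Put together with the combined test from Layer 3, the full block looks like this:

healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost:8080/repo/index-v1.json && test $(( $(date +%s) - $(stat -c %Y /data/repo/index-v1.json) )) -lt 86400"]
  interval: 60s
  timeout: 10s
  retries: 3
  start_period: 120s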
Automatic Restart Configuration
Health checks alone don't restart containers; they only mark the container's status. I pair them with a restart policy:
services:
fdroid-mirror-1:
image: my-fdroid-mirror:latest
restart: unless-stopped
healthcheck:
# ... health check config
The unless-stopped policy restarts the container whenever its main process exits, but leaves it alone if I stopped it manually. One caveat: in standalone Docker (outside Swarm) an unhealthy status does not by itself trigger a restart, so something still has to act on that status.
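Outside Swarm, one minimal way to close that loop is a small host-side script run from cron that restarts whatever Docker has flagged as unhealthy; this is a sketch of the idea, not a drop-in solution:

#!/bin/bash
# restart every container whose health check currently reports unhealthy
for c in $(docker ps --filter "health=unhealthy" --format '{{.Names}}'); do
  echo "$(date '+%F %T') restarting unhealthy container: $c"
  docker restart "$c"
done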
I also use depends_on with health conditions so that services relying on the mirrors wait for them to be healthy before starting:
services:
mirror-monitor:
depends_on:
fdroid-mirror-1:
condition: service_healthy
This ensures my monitoring service only starts after mirrors are confirmed healthy.
What Didn't Work
Several approaches failed before I landed on the current setup:
External Monitoring Scripts
I tried running a separate monitoring container that would check mirrors and use Docker API to restart them. This added complexity and introduced new failure points. The monitoring container itself could fail, and managing API permissions was messy.
Checking Only the Main Process
My initial health check just verified the nginx master process existed:
test: ["CMD", "pgrep", "nginx"]
This was useless. The process could be alive but completely unresponsive due to worker process crashes or resource exhaustion.
Overly Complex Multi-Step Checks
I built a health check that verified multiple endpoints, checked disk space, validated repo signatures, and more. It was thorough but slow—taking 30+ seconds to complete. This defeated the purpose since timeouts would trigger before legitimate checks finished.
Monitoring Health Check Results
I track health check outcomes with a simple script that runs via cron:
#!/bin/bash
# list any mirror containers whose status shows unhealthy or still starting
docker ps --format "table {{.Names}}\t{{.Status}}" | grep -E "unhealthy|starting"
If any mirrors are unhealthy for more than 5 minutes, I get an alert through my existing n8n automation workflow. This catches cases where the automatic restart itself fails.
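The alert itself is just a webhook call; a sketch along these lines works, with the URL standing in for whatever the n8n workflow exposes and the five-minute debounce living inside the workflow:

#!/bin/bash
# post the names of currently unhealthy mirrors to an n8n webhook (URL is a placeholder)
WEBHOOK="https://n8n.example.internal/webhook/mirror-alert"
unhealthy=$(docker ps --filter "health=unhealthy" --format '{{.Names}}')
if [ -n "$unhealthy" ]; then
  curl -fsS -X POST -H 'Content-Type: application/json' \
    -d "{\"unhealthy\": \"$(echo "$unhealthy" | tr '\n' ' ')\"}" \
    "$WEBHOOK"
fi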
I also log health check failures to a shared volume that my monitoring stack ingests:
volumes:
  - ./health-logs:/var/log/health
The health check script appends failures to this log, which helps identify patterns—like if a specific mirror consistently fails at certain times due to upstream maintenance windows.
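A sketch of such a wrapper, assuming the curl from earlier and the paths already shown (the script name and log file name are illustrative; it would replace the raw CMD-SHELL string as the healthcheck test):

#!/bin/sh
# healthcheck wrapper: same two checks as before, but failures get appended to the shared log
LOG=/var/log/health/healthcheck.log
INDEX=/data/repo/index-v1.json
fail() {
  echo "$(date '+%F %T') $(hostname): $1" >> "$LOG"
  exit 1
}
# layer 1: the web server must answer for the index file
curl -fsS http://localhost:8080/repo/index-v1.json > /dev/null || fail "HTTP check failed"
# layer 2: the index must have been modified within the last 24 hours
age=$(( $(date +%s) - $(stat -c %Y "$INDEX") ))
[ "$age" -lt 86400 ] || fail "index is ${age}s old"
exit 0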
Real-World Behavior
After three months of running this setup:
- Automatic restarts happen 2-3 times per week across all mirrors
- Most failures are network-related—temporary upstream unavailability
- One mirror had a recurring issue where the sync process would deadlock every 4-5 days. Health checks caught and recovered from this automatically until I fixed the underlying bug.
- False positives dropped to near zero after tuning the timing parameters
The system isn't perfect. Occasionally a restart happens during a legitimate long sync, but the data integrity checks I have in place prevent corruption.
Key Lessons
What I learned building this:
- Health checks must verify actual functionality, not just process existence
- Always test health check commands manually inside the container first
- Timing parameters need to match real operational patterns, not theoretical ideals
- Simple checks that run reliably beat complex checks that introduce their own failure modes
- Log health check results—patterns in failures reveal underlying issues
The setup has made my F-Droid mirrors significantly more reliable. Users rarely hit dead mirrors now, and I spend less time manually restarting containers. The health checks don't prevent all problems, but they catch and recover from the most common failure modes automatically.