Why I Built a Health Check System for F-Droid Mirrors
I run multiple F-Droid repository mirrors in Docker containers on my Proxmox setup. The problem I kept hitting was silent failures—containers would appear "running" in Docker's eyes, but the actual mirror service inside had stalled, wasn't syncing properly, or had lost network access to upstream repos. Users would hit dead mirrors, and I'd only notice hours later when checking logs manually.
I needed a system that could detect when a mirror was actually broken and restart it automatically, not just check if the container process was alive.
My Setup and Requirements
I host three F-Droid mirrors spread across different containers. Each one:
- Runs a web server (nginx) serving the mirrored repository files
- Has a sync script that pulls updates from upstream F-Droid repos
- Needs to respond to HTTP requests reliably
- Must have recent data—stale mirrors are useless
My initial approach was just using Docker's built-in restart policies, but that only catches container crashes. If the sync process hung or the web server stopped responding while the main process stayed alive, Docker did nothing.
Building the Health Check Logic
I structured the health checks in my docker-compose.yml with three layers of verification:
Layer 1: Basic HTTP Response
First check: can the web server respond at all? I use curl to hit a known endpoint that should always exist in an F-Droid repo—the index file:
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/repo/index-v1.json"]
  interval: 60s
  timeout: 10s
  retries: 3
  start_period: 120s
The -f flag makes curl fail on HTTP errors. If the web server is dead or returning 500s, this catches it.
Layer 2: File Freshness Check
An F-Droid mirror that hasn't synced in days is broken even if it serves files. I added a check that verifies the index file was modified recently:
test: ["CMD-SHELL", "test $(( $(date +%s) - $(stat -c %Y /data/repo/index-v1.json) )) -lt 86400"]
This fails if the index file is older than 24 hours. I use CMD-SHELL here because I need shell arithmetic that plain CMD doesn't support.
Layer 3: Combined Check
My actual implementation combines both checks with an AND condition:
test: ["CMD-SHELL", "curl -f http://localhost:8080/repo/index-v1.json && test $(( $(date +%s) - $(stat -c %Y /data/repo/index-v1.json) )) -lt 86400"]
This only passes if both the HTTP response works AND the file is fresh.
Handling the Missing curl Problem
My first attempt failed immediately. The health check kept marking containers unhealthy even though I could manually verify they were fine. After digging through docker inspect output, I found the actual error:
"Output": "sh: 1: curl: not found"
My base image didn't include curl. This is a common trap—you write a health check assuming basic tools exist, but minimal container images strip everything non-essential.
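The relevant detail lives in the container's health log, which docker inspect can pull out directly (the container name here is just an example):

# current health status for a container
docker inspect --format '{{.State.Health.Status}}' fdroid-mirror-1
# recent health check attempts, with exit codes and the captured output
docker inspect --format '{{json .State.Health.Log}}' fdroid-mirror-1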
I had two options:
- Install curl in the Dockerfile
- Use a tool that was already present
I chose the first option because curl's error handling is clearer than alternatives like wget, and added it to my Dockerfile:
RUN apk add --no-cache curl
After rebuilding the image, health checks started working properly.
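A faster way to catch this kind of mismatch is to run the exact test command inside the running container before wiring it into compose (container name again illustrative):

# run the exact command the healthcheck runs, inside the container
docker exec fdroid-mirror-1 sh -c 'curl -f http://localhost:8080/repo/index-v1.json && echo OK'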
Setting Realistic Timing Parameters
The timing values matter more than I initially thought. My first settings were too aggressive:
interval: 10s
timeout: 5s
retries: 2
start_period: 30s
This caused false positives. The sync process sometimes takes 15-20 seconds during heavy upstream activity, which would trigger timeouts. Container restarts during active syncs corrupted data.
My current settings reflect actual operational patterns:
- interval: 60s - Checking every minute is frequent enough. F-Droid repos don't change that fast.
- timeout: 10s - Allows for slow disk I/O or network hiccups without false failures.
- retries: 3 - Three consecutive failures means something is actually broken, not just a transient issue.
- start_period: 120s - Initial sync after container start can take 90+ seconds. This grace period prevents premature unhealthy marking.
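Put together with the combined test from Layer 3, the full block looks like this:

healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost:8080/repo/index-v1.json && test $(( $(date +%s) - $(stat -c %Y /data/repo/index-v1.json) )) -lt 86400"]
  interval: 60s
  timeout: 10s
  retries: 3
  start_period: 120s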
Automatic Restart Configuration
Health checks alone don't restart containers; they only mark the container's status. I pair them with a restart policy:
services:
fdroid-mirror-1:
image: my-fdroid-mirror:latest
restart: unless-stopped
healthcheck:
# ... health check config
The unless-stopped policy restarts the container whenever its main process exits, but leaves it alone if I stopped it manually. One caveat: in standalone Docker (outside Swarm) an unhealthy status does not by itself trigger a restart, so something still has to act on that status.
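Outside Swarm, one minimal way to close that loop is a small host-side script run from cron that restarts whatever Docker has flagged as unhealthy; this is a sketch of the idea, not a drop-in solution:

#!/bin/bash
# restart every container whose health check currently reports unhealthy
for c in $(docker ps --filter "health=unhealthy" --format '{{.Names}}'); do
  echo "$(date '+%F %T') restarting unhealthy container: $c"
  docker restart "$c"
done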
I also use depends_on with health conditions so that services relying on the mirrors wait for them to be healthy before starting:
services:
mirror-monitor:
depends_on:
fdroid-mirror-1:
condition: service_healthy
This ensures my monitoring service only starts after mirrors are confirmed healthy.
What Didn't Work
Several approaches failed before I landed on the current setup:
External Monitoring Scripts
I tried running a separate monitoring container that would check mirrors and use Docker API to restart them. This added complexity and introduced new failure points. The monitoring container itself could fail, and managing API permissions was messy.
Checking Only the Main Process
My initial health check just verified the nginx master process existed:
test: ["CMD", "pgrep", "nginx"]
This was useless. The process could be alive but completely unresponsive due to worker process crashes or resource exhaustion.
Overly Complex Multi-Step Checks
I built a health check that verified multiple endpoints, checked disk space, validated repo signatures, and more. It was thorough but slow—taking 30+ seconds to complete. This defeated the purpose since timeouts would trigger before legitimate checks finished.
Monitoring Health Check Results
I track health check outcomes with a simple script that runs via cron:
#!/bin/bash
# list any mirror containers whose status shows unhealthy or still starting
docker ps --format "table {{.Names}}\t{{.Status}}" | grep -E "unhealthy|starting"
If any mirrors are unhealthy for more than 5 minutes, I get an alert through my existing n8n automation workflow. This catches cases where the automatic restart itself fails.
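The alert itself is just a webhook call; a sketch along these lines works, with the URL standing in for whatever the n8n workflow exposes and the five-minute debounce living inside the workflow:

#!/bin/bash
# post the names of currently unhealthy mirrors to an n8n webhook (URL is a placeholder)
WEBHOOK="https://n8n.example.internal/webhook/mirror-alert"
unhealthy=$(docker ps --filter "health=unhealthy" --format '{{.Names}}')
if [ -n "$unhealthy" ]; then
  curl -fsS -X POST -H 'Content-Type: application/json' \
    -d "{\"unhealthy\": \"$(echo "$unhealthy" | tr '\n' ' ')\"}" \
    "$WEBHOOK"
fi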
I also log health check failures to a shared volume that my monitoring stack ingests:
volumes:
  - ./health-logs:/var/log/health
The health check script appends failures to this log, which helps identify patterns—like if a specific mirror consistently fails at certain times due to upstream maintenance windows.
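A sketch of such a wrapper, assuming the curl from earlier and the paths already shown (the script name and log file name are illustrative; it would replace the raw CMD-SHELL string as the healthcheck test):

#!/bin/sh
# healthcheck wrapper: same two checks as before, but failures get appended to the shared log
LOG=/var/log/health/healthcheck.log
INDEX=/data/repo/index-v1.json
fail() {
  echo "$(date '+%F %T') $(hostname): $1" >> "$LOG"
  exit 1
}
# layer 1: the web server must answer for the index file
curl -fsS http://localhost:8080/repo/index-v1.json > /dev/null || fail "HTTP check failed"
# layer 2: the index must have been modified within the last 24 hours
age=$(( $(date +%s) - $(stat -c %Y "$INDEX") ))
[ "$age" -lt 86400 ] || fail "index is ${age}s old"
exit 0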
Real-World Behavior
After three months of running this setup:
- Automatic restarts happen 2-3 times per week across all mirrors
- Most failures are network-related—temporary upstream unavailability
- One mirror had a recurring issue where the sync process would deadlock every 4-5 days. Health checks caught and recovered from this automatically until I fixed the underlying bug.
- False positives dropped to near zero after tuning the timing parameters
The system isn't perfect. Occasionally a restart happens during a legitimate long sync, but the data integrity checks I have in place prevent corruption.
Key Lessons
What I learned building this:
- Health checks must verify actual functionality, not just process existence
- Always test health check commands manually inside the container first
- Timing parameters need to match real operational patterns, not theoretical ideals
- Simple checks that run reliably beat complex checks that introduce their own failure modes
- Log health check results—patterns in failures reveal underlying issues
The setup has made my F-Droid mirrors significantly more reliable. Users rarely hit dead mirrors now, and I spend less time manually restarting containers. The health checks don't prevent all problems, but they catch and recover from the most common failure modes automatically.