Building a Cron-Based Website Availability Monitor That Saves Plain HTML Snapshots During Outages

Why I Built This

I run several self-hosted services on my Proxmox cluster and Synology NAS. Some are critical—like my DNS resolver and reverse proxy—while others are just personal tools I rely on daily. The problem: I don't always know when something breaks until I try to use it.

Commercial uptime monitors exist, but I didn't want to hand over my internal URLs to a third party. I also didn't need fancy dashboards or multi-channel alerts. I just wanted a simple system that would check if a service was reachable and, if it wasn't, save exactly what the failure looked like.

So I built a cron-based monitor that saves plain HTML snapshots during outages. It runs every few minutes, checks HTTP status codes, and only writes files when something fails. No database. No external dependencies. Just a script, a log file, and raw HTML captures.

My Real Setup

I run this on a Debian VM inside Proxmox. The VM has minimal resources—512MB RAM, one vCPU—because all it does is run curl checks via cron.

The script monitors about a dozen endpoints:

  • Internal services (Proxmox web UI, Synology DSM, Pi-hole admin)
  • Self-hosted apps (n8n, Cronicle, a few Docker containers)
  • External dependencies (my ISP's DNS, Cloudflare's resolver)

I chose Python because I already use it for other automation tasks, and subprocess makes it trivial to call curl with specific options. I didn't use requests or httpx because I wanted full control over timeouts, redirects, and TLS behavior.

The Core Script

The script does four things:

  1. Sends an HTTP request using curl with a 10-second timeout
  2. Checks the HTTP status code
  3. Logs success or failure to a timestamped log file
  4. If the status is not in the 200-399 range, saves the full HTML response as a timestamped file in the snapshot directory

Here's the actual code I use:

#!/usr/bin/env python3
import datetime
import hashlib
import os
import shutil
import subprocess

URL = "https://example.com"
LOG_FILE = os.path.expanduser("~/monitor/status.log")
SNAPSHOT_DIR = os.path.expanduser("~/monitor/snapshots")

# Each monitored URL gets its own temp file so concurrent cron jobs
# can't overwrite each other's responses.
TMP_RESPONSE = f"/tmp/monitor_{hashlib.md5(URL.encode()).hexdigest()[:8]}.html"

os.makedirs(SNAPSHOT_DIR, exist_ok=True)

# Clear any response left over from a previous run so a failed request
# can't snapshot stale content.
if os.path.exists(TMP_RESPONSE):
    os.remove(TMP_RESPONSE)

now = datetime.datetime.now()
timestamp = now.strftime("%Y-%m-%d %H:%M:%S")
date_prefix = now.strftime("%Y%m%d_%H%M%S")

try:
    result = subprocess.run(
        ["curl", "-s", "-o", TMP_RESPONSE, "-w", "%{http_code}",
         "--max-time", "10", "--connect-timeout", "5", URL],
        capture_output=True,
        text=True,
        timeout=15
    )
    
    status_code = result.stdout.strip()
    
    if status_code and status_code.isdigit():
        code = int(status_code)
        
        if 200 <= code < 400:
            log_entry = f"{timestamp} | {URL} | SUCCESS | {status_code}\n"
        else:
            log_entry = f"{timestamp} | {URL} | FAILURE | {status_code}\n"
            
            # Save the raw HTML response so I can see what the failure looked like
            snapshot_file = os.path.join(SNAPSHOT_DIR, f"{date_prefix}_{status_code}.html")
            if os.path.exists(TMP_RESPONSE):
                shutil.copy(TMP_RESPONSE, snapshot_file)
    else:
        log_entry = f"{timestamp} | {URL} | ERROR | No status code\n"
        
except subprocess.TimeoutExpired:
    log_entry = f"{timestamp} | {URL} | TIMEOUT | curl exceeded 15s\n"
except Exception as e:
    log_entry = f"{timestamp} | {URL} | ERROR | {str(e)}\n"

with open(LOG_FILE, "a") as log:
    log.write(log_entry)

I run separate instances of this script for each service I monitor. Each one writes to its own log file and snapshot directory.
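Each copy only differs in the constants at the top. A hypothetical check_proxmox.py, with a placeholder hostname and paths rather than my real ones:

# check_proxmox.py -- same script as above, only the constants change
# (the hostname and paths here are placeholders, not my real values)
import os

URL = "https://proxmox.lan:8006"
LOG_FILE = os.path.expanduser("~/monitor/proxmox.log")
SNAPSHOT_DIR = os.path.expanduser("~/monitor/snapshots/proxmox")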

Cron Configuration

I schedule the checks using crontab. For critical services, I run checks every 2 minutes. For less important ones, every 5 or 10 minutes.

*/2 * * * * /usr/bin/python3 ~/monitor/check_proxmox.py
*/5 * * * * /usr/bin/python3 ~/monitor/check_synology.py
*/10 * * * * /usr/bin/python3 ~/monitor/check_pihole.py

I initially tried running all checks from a single script with a loop, but that caused timing issues when one check hung. Separate cron jobs are more reliable.

What Worked

The HTML snapshots turned out to be more useful than I expected. When my reverse proxy misconfigured itself after a Docker update, the snapshot showed me the exact nginx error page—including the broken upstream reference. I didn't have to reproduce the failure or dig through logs.

Saving only during failures keeps disk usage minimal. Over six months of monitoring, the snapshot directory is under 50MB. Most failures are transient network blips that resolve within minutes.

Using curl instead of a Python HTTP library gave me better control over connection behavior. I can specify exact timeout values, follow or ignore redirects, and handle TLS validation however I need. For internal services with self-signed certs, I add -k to skip verification.
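As a sketch of what those variations look like when building the curl argument list (the hostnames and flag choices are illustrative, not my exact settings):

# Illustrative curl argument variations; adjust per service
TMP_RESPONSE = "/tmp/monitor_example.html"

base = ["curl", "-s", "--max-time", "10", "--connect-timeout", "5",
        "-o", TMP_RESPONSE, "-w", "%{http_code}"]

internal_args = base + ["-k", "https://nas.lan:5001"]   # self-signed cert: skip TLS verification
external_args = base + ["-L", "https://example.com"]    # follow redirects before judging the status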

The plain text log format makes it trivial to grep for patterns:

grep "FAILURE" ~/monitor/status.log
grep "2024-01-15" ~/monitor/status.log | grep "TIMEOUT"

I don't need a database or log aggregation tool. The log file is the database.
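And when grep isn't enough, the pipe-delimited format is trivial to parse. A quick sketch that counts failure entries per day, assuming the exact log format the script writes:

# Count failure-type log entries per day (sketch; assumes the log format above)
from collections import Counter
import os

failures = Counter()
with open(os.path.expanduser("~/monitor/status.log")) as log:
    for line in log:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) >= 3 and parts[2] in ("FAILURE", "TIMEOUT", "ERROR"):
            day = parts[0].split()[0]   # "YYYY-MM-DD" portion of the timestamp
            failures[day] += 1

for day, count in sorted(failures.items()):
    print(day, count)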

What Didn't Work

My first version checked only HTTP status codes and logged "up" or "down." That was useless when a service returned 200 but showed an error page. Now I save the full HTML response, which captures application-level failures that return technically valid HTTP responses.
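With the script shown above, a snapshot is only written when the status code itself indicates a failure, so flagging a 200 that is really an error page takes a content check on the body. A minimal sketch of that extension (the marker strings are purely illustrative and differ per service):

# Sketch: flag "soft failures" where the status code is fine but the body is
# an error page. The marker strings are illustrative, not a definitive list.
ERROR_MARKERS = ["502 Bad Gateway", "Service Unavailable", "Application error"]

def looks_like_error_page(response_path):
    try:
        with open(response_path, errors="ignore") as f:
            body = f.read()
    except OSError:
        return False
    return any(marker in body for marker in ERROR_MARKERS)

# In the success branch of the monitor script, a check like
# looks_like_error_page(TMP_RESPONSE) could downgrade the result to FAILURE
# and save the snapshot anyway.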

I tried using tail -f to watch the log in real time, but there's nothing to watch: entries only appear every few minutes, whenever the next cron run finishes. I check the log manually when I suspect an issue.

I initially set curl's timeout to 30 seconds. That was too long. If a service is unresponsive, I don't want to wait half a minute to log it. I dropped it to 10 seconds for the request and 5 for the initial connection. If it takes longer than that, something is wrong anyway.

I didn't account for DNS failures at first. When my Pi-hole went down, curl just hung until timeout. Now I monitor the DNS resolver itself as a separate endpoint, so I know if the problem is DNS or the actual service.
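If you want to tell the two cases apart inside the script itself, curl signals a name-resolution failure with exit code 6, which subprocess already exposes. A sketch of that variation, not something my current scripts do:

# Sketch: run the same curl check but classify DNS failures separately.
# curl exits with code 6 when it can't resolve the hostname.
import subprocess

def check_with_dns_detection(url, timeout=10):
    result = subprocess.run(
        ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}",
         "--max-time", str(timeout), url],
        capture_output=True, text=True,
    )
    if result.returncode == 6:
        return "DNS_FAILURE"
    return result.stdout.strip() or "NO_RESPONSE"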

Log rotation was an afterthought. The log files grew faster than I expected—about 1MB per month per service. I added a simple logrotate config to compress and archive old logs weekly. Not automated initially, which was a mistake.
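For reference, the whole config is just a few lines. Something along these lines (the path is a placeholder, and logrotate wants an absolute path rather than ~):

# /etc/logrotate.d/homelab-monitor (illustrative; adjust path and retention)
/home/youruser/monitor/*.log {
    weekly
    rotate 12
    compress
    missingok
    notifempty
}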

Key Takeaways

Saving HTML snapshots during failures is more useful than just logging status codes. The raw response often contains the exact error message or misconfiguration.

Separate cron jobs per service are more reliable than a single script checking multiple endpoints. One timeout doesn't block the others.

Plain text logs and file-based storage are enough for small-scale monitoring. I don't need a database unless I'm tracking hundreds of services.

Timeouts matter. Set them short enough to detect real problems but long enough to avoid false positives from slow networks.

This approach works for internal services where you control the infrastructure. For monitoring external websites or APIs you don't control, you'd need to handle rate limiting, authentication, and more complex failure scenarios.

I still use this system daily. It's not sophisticated, but it tells me what I need to know: is the service reachable, and if not, what did the failure look like?