Why I Built WireGuard Configuration Monitoring
I run WireGuard across multiple sites—home, VPS endpoints, and a few remote locations I manage. The problem I kept hitting wasn't setting up peers initially. It was noticing when something changed after the fact.
A peer's endpoint IP shifted because a dynamic DNS entry updated. A public key got rotated during a rebuild and I forgot to update the other side. Someone (me) fat-fingered a configuration change and didn't realize it until connectivity broke hours later.
I needed a way to know when peer configurations drifted from their expected state, ideally before I noticed things weren't working.
What Tailsnitch Does (and Why It Mattered)
I came across Tailsnitch while looking at Tailscale security tooling. It's a Go-based auditor that checks Tailscale configurations for misconfigurations—things like overly broad ACLs, reusable auth keys, or devices without key expiry enabled.
What caught my attention wasn't the Tailscale-specific checks. It was the pattern: periodically pull current state from an API, compare it against expected baseline rules, and flag deviations with severity levels.
I don't use Tailscale in my main setup. I use WireGuard directly because I want full control over the configuration and don't need the coordination layer Tailscale provides. But the audit concept transferred cleanly.
My Implementation for WireGuard
WireGuard doesn't have an API to query. Configuration lives in /etc/wireguard/wg0.conf files or similar, and runtime state comes from wg show output.
I wrote a Python script that:
- Parses WireGuard config files to extract peer definitions (public keys, allowed IPs, endpoints)
- Runs wg show all dump to get current runtime state
- Compares the two and flags differences
- Stores a baseline snapshot on first run
- On subsequent runs, checks current state against both the config file and the baseline
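The config-file side of that comparison can be sketched like this. WireGuard configs repeat the [Peer] section, which stock INI parsers don't handle, so a small hand-rolled parser is the usual approach (the function and key names here are illustrative, not the script's actual ones):

```python
# Minimal sketch: extract peer definitions from a WireGuard config file.
# Names are illustrative; the real script may differ.
def parse_wg_config(path):
    peers = []
    current = None
    with open(path) as f:
        for raw in f:
            line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
            if not line:
                continue
            if line.lower() == "[peer]":
                current = {}           # start a new peer record
                peers.append(current)
            elif line.startswith("["):
                current = None         # [Interface] or another section
            elif current is not None and "=" in line:
                key, value = line.split("=", 1)  # maxsplit=1 keeps base64 '=' padding
                current[key.strip().lower()] = value.strip()
    return peers
```

Each returned dict then holds the expected publickey, allowedips, and endpoint for one peer, ready to diff against runtime state.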
I run this via cron every 6 hours on each WireGuard host and send results to a shared log directory on my NAS via rsync.
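In cron.d form, that schedule looks roughly like this; the NAS hostname and destination path are placeholders, not my actual setup:

```shell
# /etc/cron.d/wg-audit: run the audit every 6 hours, then rsync results to the NAS.
# Hostname and destination path below are placeholders.
0 */6 * * * root /usr/bin/python3 /opt/scripts/wg-audit.py
15 */6 * * * root /usr/bin/rsync -a /var/log/wg-audit/ backup@nas:/volume1/logs/wg-audit/
```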
What I Actually Check
The script flags these specific conditions:
- Endpoint changes: If a peer's endpoint IP or port differs from the config file
- Public key mismatches: If the runtime public key doesn't match what's in the config
- Allowed IP drift: If the allowed IPs list changed (indicates someone edited the config without reloading)
- Peer presence: If a peer exists in the config but not in runtime state (down or misconfigured)
- Unknown peers: If wg show lists a peer not defined in the config file
Each finding gets a severity tag: high for public key or unknown peer issues, medium for endpoint changes, low for allowed IP drift.
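The severity mapping is simple enough to sketch directly. The check identifiers below are my own labels; the post doesn't state a severity for the peer-presence check, so "medium" there is an assumption:

```python
# Sketch of the severity tagging. Check identifiers are illustrative labels.
SEVERITY = {
    "pubkey_mismatch": "high",
    "unknown_peer": "high",
    "endpoint_change": "medium",
    "peer_missing": "medium",   # assumption: severity not specified in the post
    "allowed_ip_drift": "low",
}

def classify(findings):
    """Attach a severity tag to each finding dict (unknown checks default to low)."""
    for finding in findings:
        finding["severity"] = SEVERITY.get(finding["check"], "low")
    return findings
```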
The Baseline Problem
Storing a baseline turned out to be necessary but tricky. On first run, the script saves a JSON snapshot of the current config. But if I intentionally change a peer's endpoint, I need to update the baseline or I'll get false positives forever.
I handle this with a --accept-current flag that overwrites the baseline with the current state. After making a legitimate change, I run:
sudo python3 /opt/scripts/wg-audit.py --accept-current
This isn't elegant, but it works. The alternative—automatically accepting any change—defeats the purpose of monitoring.
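The baseline logic amounts to a few lines. A minimal sketch, with the storage path and function name as my own choices:

```python
# Sketch of baseline handling: create a JSON snapshot on first run,
# overwrite it when --accept-current is passed, otherwise load and return it.
# The path and function name are illustrative.
import json
import os

BASELINE = "/var/lib/wg-audit/baseline.json"

def load_or_init_baseline(current_state, path=BASELINE, accept_current=False):
    """Return the baseline state, creating or overwriting the snapshot as needed."""
    if accept_current or not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            json.dump(current_state, f, indent=2, sort_keys=True)
        return current_state
    with open(path) as f:
        return json.load(f)
```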
What Worked
The script caught two real issues in the first month:
- A VPS endpoint changed IPs after a provider migration. The peer was still connected using the old cached endpoint, but new handshakes would have failed. I updated the config before that became a problem.
- I rebuilt a remote Raspberry Pi and regenerated its WireGuard keys without updating the server side. The audit flagged an unknown public key in the handshake attempts, which led me to fix the mismatch.
Both would have caused outages if I'd noticed them only when connectivity broke.
The severity tagging also helped. I get a lot of low-severity findings (mostly allowed IP lists that don't perfectly match due to formatting differences), but I can filter those out and focus on high/medium issues.
What Didn't Work
My first version tried to parse wg show output as plain text using regex. This broke immediately when endpoint formatting varied (IPv6 addresses, missing ports, etc.). I switched to wg show all dump, which outputs tab-separated values. Much more reliable.
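For reference, a minimal parser for that dump format might look like this. Per the wg(8) man page, `wg show all dump` prefixes every line with the interface name: the first line per interface carries interface details (5 fields) and each peer line carries 9 fields. The helper names are mine:

```python
# Sketch of parsing `wg show all dump` (tab-separated) into a peer map.
# Peer lines have 9 fields: interface, public key, preshared key, endpoint,
# allowed IPs, latest handshake (epoch), rx bytes, tx bytes, keepalive.
import subprocess

def parse_dump(text):
    peers = {}
    for line in text.strip().splitlines():
        fields = line.split("\t")
        if len(fields) == 9:  # 5-field lines are interface headers; skip them
            iface, pubkey, _psk, endpoint, allowed_ips, handshake, _rx, _tx, _ka = fields
            peers[pubkey] = {
                "interface": iface,
                "endpoint": endpoint,
                "allowed_ips": allowed_ips,
                "latest_handshake": int(handshake),
            }
    return peers

def runtime_peers():
    out = subprocess.run(["wg", "show", "all", "dump"],
                         capture_output=True, text=True, check=True).stdout
    return parse_dump(out)
```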
I also initially tried to detect "stale" peers by checking the last handshake timestamp. The idea was to flag peers that hadn't communicated in 7+ days. This generated too many false positives—some peers are intentionally idle for weeks (backup routes, rarely-used admin access). I removed that check.
Logging was a mess at first. I was dumping full JSON output to syslog, which made it unreadable. Now I log only a summary (number of findings by severity) to syslog and write detailed JSON to /var/log/wg-audit/.
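The two-tier logging can be sketched like this; the detail-file naming scheme is my own choice, not necessarily the script's:

```python
# Sketch: one-line summary to syslog, full findings as JSON under /var/log/wg-audit/.
# File naming is illustrative.
import json
import syslog
import time
from collections import Counter

def log_results(findings, log_dir="/var/log/wg-audit"):
    counts = Counter(f["severity"] for f in findings)
    summary = ("wg-audit: %d findings (high=%d medium=%d low=%d)"
               % (len(findings), counts["high"], counts["medium"], counts["low"]))
    syslog.openlog("wg-audit")
    # Escalate the syslog priority when anything high-severity turned up.
    syslog.syslog(syslog.LOG_WARNING if counts["high"] else syslog.LOG_INFO, summary)
    detail_path = "%s/audit-%s.json" % (log_dir, time.strftime("%Y%m%d-%H%M%S"))
    with open(detail_path, "w") as f:
        json.dump(findings, f, indent=2)
    return summary
```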
Integration with Monitoring
I run Uptime Kuma for service monitoring. It doesn't natively parse log files, so I added a simple HTTP endpoint to the audit script.
When run with --http-check, the script:
- Returns HTTP 200 if no high-severity findings exist
- Returns HTTP 500 if high-severity findings are present
- Includes a JSON body with the finding count
Uptime Kuma hits this endpoint every hour. If it gets a 500, I get a notification.
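A minimal sketch of that check endpoint using the standard library; the port and the way findings get loaded are assumptions, not the script's actual interface:

```python
# Sketch of the --http-check mode: 200 with no high-severity findings, 500 otherwise,
# plus a JSON body with counts. Port and findings source are illustrative.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def make_handler(get_findings):
    class AuditHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            findings = get_findings()
            high = [f for f in findings if f.get("severity") == "high"]
            body = json.dumps({"total": len(findings), "high": len(high)}).encode()
            self.send_response(500 if high else 200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the periodic checks out of stderr

    return AuditHandler

def serve(get_findings, port=9130):
    HTTPServer(("0.0.0.0", port), make_handler(get_findings)).serve_forever()
```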
This setup is fragile—if the script crashes, Uptime Kuma doesn't know. But it's better than no alerting, and I haven't had a script failure yet.
Key Takeaways
- WireGuard's simplicity is a strength, but it means you have to build your own monitoring
- Comparing runtime state to config files catches drift that logs alone won't reveal
- Baseline snapshots need manual updates, or you'll drown in false positives
- Parsing wg show as structured data (dump format) is far more reliable than regex on human-readable output
- Severity levels are essential—without them, you can't distinguish real issues from noise
The script isn't polished. It doesn't have a CLI UI, automatic remediation, or fancy dashboards. But it tells me when my WireGuard configs stop matching reality, which is exactly what I needed.