Why I Built WireGuard Configuration Monitoring
I run WireGuard across multiple sites—home, VPS endpoints, and a few remote locations I manage. The problem I kept hitting wasn't setting up peers initially. It was noticing when something changed after the fact.
A peer's endpoint IP shifted because a dynamic DNS entry updated. A public key got rotated during a rebuild and I forgot to update the other side. Someone (me) fat-fingered a configuration change and didn't realize it until connectivity broke hours later.
I needed a way to know when peer configurations drifted from their expected state, ideally before I noticed things weren't working.
What Tailsnitch Does (and Why It Mattered)
I came across Tailsnitch while looking at Tailscale security tooling. It's a Go-based auditor that checks Tailscale configurations for misconfigurations—things like overly broad ACLs, reusable auth keys, or devices without key expiry enabled.
What caught my attention wasn't the Tailscale-specific checks. It was the pattern: periodically pull current state from an API, compare it against expected baseline rules, and flag deviations with severity levels.
I don't use Tailscale in my main setup. I use WireGuard directly because I want full control over the configuration and don't need the coordination layer Tailscale provides. But the audit concept transferred cleanly.
My Implementation for WireGuard
WireGuard doesn't have an API to query. Configuration lives in /etc/wireguard/wg0.conf files or similar, and runtime state comes from wg show output.
I wrote a Python script that:
- Parses WireGuard config files to extract peer definitions (public keys, allowed IPs, endpoints)
- Runs wg show all dump to get current runtime state
- Compares the two and flags differences
- Stores a baseline snapshot on first run
- On subsequent runs, checks current state against both the config file and the baseline
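The config-file side of that comparison can be sketched like this. WireGuard configs repeat the [Peer] section, which stock INI parsers don't handle, so a small hand-rolled parser is the usual approach (the function and key names here are illustrative, not the script's actual ones):

```python
# Minimal sketch: extract peer definitions from a WireGuard config file.
# Names are illustrative; the real script may differ.
def parse_wg_config(path):
    peers = []
    current = None
    with open(path) as f:
        for raw in f:
            line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
            if not line:
                continue
            if line.lower() == "[peer]":
                current = {}           # start a new peer record
                peers.append(current)
            elif line.startswith("["):
                current = None         # [Interface] or another section
            elif current is not None and "=" in line:
                key, value = line.split("=", 1)  # maxsplit=1 keeps base64 '=' padding
                current[key.strip().lower()] = value.strip()
    return peers
```

Each returned dict then holds the expected publickey, allowedips, and endpoint for one peer, ready to diff against runtime state.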
I run this via cron every 6 hours on each WireGuard host and send results to a shared log directory on my NAS via rsync.
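In cron.d form, that schedule looks roughly like this; the NAS hostname and destination path are placeholders, not my actual setup:

```shell
# /etc/cron.d/wg-audit: run the audit every 6 hours, then rsync results to the NAS.
# Hostname and destination path below are placeholders.
0 */6 * * * root /usr/bin/python3 /opt/scripts/wg-audit.py
15 */6 * * * root /usr/bin/rsync -a /var/log/wg-audit/ backup@nas:/volume1/logs/wg-audit/
```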
What I Actually Check
The script flags these specific conditions:
- Endpoint changes: If a peer's endpoint IP or port differs from the config file
- Public key mismatches: If the runtime public key doesn't match what's in the config
- Allowed IP drift: If the allowed IPs list changed (indicates someone edited the config without reloading)
- Peer presence: If a peer exists in the config but not in runtime state (down or misconfigured)
- Unknown peers: If wg show lists a peer not defined in the config file
Each finding gets a severity tag: high for public key or unknown peer issues, medium for endpoint changes, low for allowed IP drift.
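The severity mapping is simple enough to sketch directly. The check identifiers below are my own labels; the post doesn't state a severity for the peer-presence check, so "medium" there is an assumption:

```python
# Sketch of the severity tagging. Check identifiers are illustrative labels.
SEVERITY = {
    "pubkey_mismatch": "high",
    "unknown_peer": "high",
    "endpoint_change": "medium",
    "peer_missing": "medium",   # assumption: severity not specified in the post
    "allowed_ip_drift": "low",
}

def classify(findings):
    """Attach a severity tag to each finding dict (unknown checks default to low)."""
    for finding in findings:
        finding["severity"] = SEVERITY.get(finding["check"], "low")
    return findings
```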
The Baseline Problem
Storing a baseline turned out to be necessary but tricky. On first run, the script saves a JSON snapshot of the current config. But if I intentionally change a peer's endpoint, I need to update the baseline or I'll get false positives forever.
I handle this with a --accept-current flag that overwrites the baseline with the current state. After making a legitimate change, I run:
sudo python3 /opt/scripts/wg-audit.py --accept-current
This isn't elegant, but it works. The alternative—automatically accepting any change—defeats the purpose of monitoring.
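The baseline logic amounts to a few lines. A minimal sketch, with the storage path and function name as my own choices:

```python
# Sketch of baseline handling: create a JSON snapshot on first run,
# overwrite it when --accept-current is passed, otherwise load and return it.
# The path and function name are illustrative.
import json
import os

BASELINE = "/var/lib/wg-audit/baseline.json"

def load_or_init_baseline(current_state, path=BASELINE, accept_current=False):
    """Return the baseline state, creating or overwriting the snapshot as needed."""
    if accept_current or not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            json.dump(current_state, f, indent=2, sort_keys=True)
        return current_state
    with open(path) as f:
        return json.load(f)
```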
What Worked
The script caught two real issues in the first month:
- A VPS endpoint changed IPs after a provider migration. The peer was still connected using the old cached endpoint, but new handshakes would have failed. I updated the config before that became a problem.
- I rebuilt a remote Raspberry Pi and regenerated its WireGuard keys without updating the server side. The audit flagged an unknown public key in the handshake attempts, which led me to fix the mismatch.
Both would have caused outages if I'd noticed them only when connectivity broke.
The severity tagging also helped. I get a lot of low-severity findings (mostly allowed IP lists that don't perfectly match due to formatting differences), but I can filter those out and focus on high/medium issues.
What Didn't Work
My first version tried to parse wg show output as plain text using regex. This broke immediately when endpoint formatting varied (IPv6 addresses, missing ports, etc.). I switched to wg show all dump, which outputs tab-separated values. Much more reliable.
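For reference, a minimal parser for that dump format might look like this. Per the wg(8) man page, `wg show all dump` prefixes every line with the interface name: the first line per interface carries interface details (5 fields) and each peer line carries 9 fields. The helper names are mine:

```python
# Sketch of parsing `wg show all dump` (tab-separated) into a peer map.
# Peer lines have 9 fields: interface, public key, preshared key, endpoint,
# allowed IPs, latest handshake (epoch), rx bytes, tx bytes, keepalive.
import subprocess

def parse_dump(text):
    peers = {}
    for line in text.strip().splitlines():
        fields = line.split("\t")
        if len(fields) == 9:  # 5-field lines are interface headers; skip them
            iface, pubkey, _psk, endpoint, allowed_ips, handshake, _rx, _tx, _ka = fields
            peers[pubkey] = {
                "interface": iface,
                "endpoint": endpoint,
                "allowed_ips": allowed_ips,
                "latest_handshake": int(handshake),
            }
    return peers

def runtime_peers():
    out = subprocess.run(["wg", "show", "all", "dump"],
                         capture_output=True, text=True, check=True).stdout
    return parse_dump(out)
```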
I also initially tried to detect "stale" peers by checking the last handshake timestamp. The idea was to flag peers that hadn't communicated in 7+ days. This generated too many false positives—some peers are intentionally idle for weeks (backup routes, rarely-used admin access). I removed that check.
Logging was a mess at first. I was dumping full JSON output to syslog, which made it unreadable. Now I log only a summary (number of findings by severity) to syslog and write detailed JSON to /var/log/wg-audit/.
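The two-tier logging can be sketched like this; the detail-file naming scheme is my own choice, not necessarily the script's:

```python
# Sketch: one-line summary to syslog, full findings as JSON under /var/log/wg-audit/.
# File naming is illustrative.
import json
import syslog
import time
from collections import Counter

def log_results(findings, log_dir="/var/log/wg-audit"):
    counts = Counter(f["severity"] for f in findings)
    summary = ("wg-audit: %d findings (high=%d medium=%d low=%d)"
               % (len(findings), counts["high"], counts["medium"], counts["low"]))
    syslog.openlog("wg-audit")
    # Escalate the syslog priority when anything high-severity turned up.
    syslog.syslog(syslog.LOG_WARNING if counts["high"] else syslog.LOG_INFO, summary)
    detail_path = "%s/audit-%s.json" % (log_dir, time.strftime("%Y%m%d-%H%M%S"))
    with open(detail_path, "w") as f:
        json.dump(findings, f, indent=2)
    return summary
```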
Integration with Monitoring
I run Uptime Kuma for service monitoring. It doesn't natively parse log files, so I added a simple HTTP endpoint to the audit script.
When run with --http-check, the script:
- Returns HTTP 200 if no high-severity findings exist
- Returns HTTP 500 if high-severity findings are present
- Includes a JSON body with the finding count
Uptime Kuma hits this endpoint every hour. If it gets a 500, I get a notification.
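A minimal sketch of that check endpoint using the standard library; the port and the way findings get loaded are assumptions, not the script's actual interface:

```python
# Sketch of the --http-check mode: 200 with no high-severity findings, 500 otherwise,
# plus a JSON body with counts. Port and findings source are illustrative.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def make_handler(get_findings):
    class AuditHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            findings = get_findings()
            high = [f for f in findings if f.get("severity") == "high"]
            body = json.dumps({"total": len(findings), "high": len(high)}).encode()
            self.send_response(500 if high else 200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the periodic checks out of stderr

    return AuditHandler

def serve(get_findings, port=9130):
    HTTPServer(("0.0.0.0", port), make_handler(get_findings)).serve_forever()
```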
This setup is fragile—if the script crashes, Uptime Kuma doesn't know. But it's better than no alerting, and I haven't had a script failure yet.
Key Takeaways
- WireGuard's simplicity is a strength, but it means you have to build your own monitoring
- Comparing runtime state to config files catches drift that logs alone won't reveal
- Baseline snapshots need manual updates, or you'll drown in false positives
- Parsing wg show as structured data (dump format) is far more reliable than regex on human-readable output
- Severity levels are essential—without them, you can't distinguish real issues from noise
The script isn't polished. It doesn't have a CLI UI, automatic remediation, or fancy dashboards. But it tells me when my WireGuard configs stop matching reality, which is exactly what I needed.