Hardening Tailscale After State File Encryption Changes: Implementing Custom Encryption Wrappers and ACL Policies

Why I Hardened My Tailscale Setup After the State File Changes

I run Tailscale across multiple environments—Proxmox VMs, Docker containers on my Synology NAS, and a few bare-metal machines. When Tailscale rolled back TPM state file encryption to opt-in on January 7, 2026, I had to rethink my security posture. Not because I was using TPM encryption (I wasn’t—my mixed environment made it impractical), but because the change forced me to acknowledge what I was actually protecting and how.

The state file contains everything needed to impersonate a node: private keys, authentication tokens, network configs. On Linux, that’s /var/lib/tailscale/tailscaled.state. On a running system the file is normally readable only by root, but anyone who can read the disk offline (a pulled drive, a copied VM image, a backup) can clone it and pretend to be my machine. That’s the threat TPM encryption addressed, and the threat I now had to handle differently.
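Because everything here rests on that file staying private, a quick permission check is a reasonable first step. The following is a minimal sketch, not part of Tailscale: the function name state_file_private is hypothetical, and it assumes GNU stat (the -c flag), which most Linux distributions ship.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if group and others have no access to the file.
state_file_private() {
    local path="$1" mode
    mode=$(stat -c '%a' "$path") || return 2   # GNU stat: octal permission bits
    case "$mode" in
        0|*00) echo "ok: $path is mode $mode" ;;
        *)     echo "warning: $path is mode $mode"
               return 1 ;;
    esac
}
```

Run as root, a call such as state_file_private /var/lib/tailscale/tailscaled.state reports whether anyone other than the owner can read the file. It says nothing about offline access, which is the gap the rest of this post deals with.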

My Real Setup and What I Actually Needed

Here’s what I’m running:

Three Proxmox VMs (Ubuntu) handling DNS, monitoring, and automation
Four Docker containers on Synology DSM 7.2
Two physical machines (one Windows, one Linux)
ACLs managed through Tailscale’s admin console

TPM encryption was never going to work for me. The VMs get snapshotted regularly. The containers restart frequently. My Windows machine has a TPM, but I’ve had BIOS updates reset it twice in the past year, which would have bricked Tailscale each time.

What I needed was:

Protection if someone stole a disk or cloned a VM
Assurance that a compromised node couldn’t access everything
Something that wouldn’t break during routine operations

Custom Encryption Wrapper: What Actually Worked

I wrote a simple wrapper script that encrypts the state file at rest using age (a modern alternative to GPG). The wrapper decrypts on startup, runs Tailscale, then re-encrypts on shutdown. It’s not hardware-backed, but it doesn’t need to be—my threat model is disk theft, not memory extraction by an attacker with root.

Here’s the core of what I implemented on my Linux systems:

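#!/bin/bash
# Decrypt the Tailscale state file, run tailscaled, re-encrypt when it exits.
umask 077   # anything this script creates is owner-only

STATE_FILE="/var/lib/tailscale/tailscaled.state"
ENCRYPTED_FILE="${STATE_FILE}.age"
KEY_FILE="/root/.tailscale-key"
TAILSCALE_PID=""

# Decrypt state file if encrypted version exists; keep the blob if decryption fails
if [ -f "$ENCRYPTED_FILE" ]; then
    age -d -i "$KEY_FILE" -o "$STATE_FILE" "$ENCRYPTED_FILE" || exit 1
    rm -f "$ENCRYPTED_FILE"
fi

# On exit, stop tailscaled and re-encrypt; delete the plaintext only on success
cleanup() {
    trap - EXIT INT TERM                     # make sure this runs once
    if [ -n "$TAILSCALE_PID" ]; then
        kill "$TAILSCALE_PID" 2>/dev/null
        wait "$TAILSCALE_PID" 2>/dev/null    # let tailscaled finish writing state
    fi
    if [ -f "$STATE_FILE" ]; then
        age -R "${KEY_FILE}.pub" -o "$ENCRYPTED_FILE" "$STATE_FILE" && rm -f "$STATE_FILE"
    fi
}
trap cleanup EXIT
trap 'exit 0' INT TERM

# Run tailscaled
/usr/sbin/tailscaled "$@" &
TAILSCALE_PID=$!

wait "$TAILSCALE_PID"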

I generate the key pair once per machine using age-keygen and store the private key in /root/.tailscale-key with 600 permissions. The public key lives in /root/.tailscale-key.pub.
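For reference, key generation along those lines could be sketched as follows. The function name make_tailscale_key is hypothetical; age-keygen -o writes the private key and age-keygen -y derives the public key from it. This is one way to do it, not necessarily the exact commands used on these machines.

```shell
#!/bin/bash
# Hypothetical helper: create an age key pair for the wrapper.
# The default path matches the one used in this article.
make_tailscale_key() {
    local key_file="${1:-/root/.tailscale-key}"
    if [ -e "$key_file" ]; then
        echo "refusing to overwrite $key_file" >&2
        return 1
    fi
    age-keygen -o "$key_file" 2>/dev/null || return 1   # writes the private key (identity)
    chmod 600 "$key_file"
    age-keygen -y "$key_file" > "${key_file}.pub"       # derives the matching public key
}
```

Calling make_tailscale_key with no argument, as root, would produce /root/.tailscale-key and /root/.tailscale-key.pub. The refusal to overwrite matters: replacing the private key while an encrypted state file exists would make that file unrecoverable.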

This approach has limitations:

While Tailscale is running, the decrypted state file is in memory and on disk, so a live compromise or a snapshot of a running VM still captures it
The encryption key is on the same filesystem, so full-disk encryption is still necessary
It adds complexity to startup and shutdown

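But it works for my use case. If someone copies the state directory, or a backup that leaves out /root, they get an encrypted blob. A pulled disk or a full VM clone also contains the key file, which is why full-disk encryption still has to do its part. And if they have root access on the live system to read the key, they can already do worse things.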
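One way to confirm the wrapper did its job is to check, after tailscaled has stopped, that only the encrypted copy remains. The sketch below is illustrative: state_at_rest_ok is a hypothetical name, and the header check assumes the default binary age format (no -a armor flag), whose first line is age-encryption.org/v1.

```shell
#!/bin/bash
# Hypothetical check: after tailscaled stops, only the encrypted copy should remain.
state_at_rest_ok() {
    local state_file="${1:-/var/lib/tailscale/tailscaled.state}"
    local encrypted="${state_file}.age"
    if [ -f "$state_file" ]; then
        echo "plaintext state still present: $state_file"
        return 1
    fi
    # Binary (non-armored) age files begin with this header line
    if [ "$(head -n 1 "$encrypted" 2>/dev/null)" != "age-encryption.org/v1" ]; then
        echo "no valid age file at $encrypted"
        return 1
    fi
    echo "ok: only $encrypted is on disk"
}
```

Running it before taking a snapshot or backup of a stopped node gives some assurance that the copy will not contain usable keys. It cannot help with snapshots taken while the node is running.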

ACL Policies: The Part That Actually Matters

The state file encryption—whether TPM-based or my custom wrapper—only protects against one narrow attack. What matters more is limiting what a compromised node can do. That’s where Tailscale’s ACL policies become critical.

I segment my network into three groups:

Core infrastructure: Proxmox hosts, DNS servers, NAS
Services: Docker containers running n8n, Cronicle, monitoring tools
Clients: Laptops, phones, anything I use to access services

My ACL policy enforces these rules:

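{
  // Groups may contain only users; devices are selected by tag instead
  "groups": {
    "group:clients": ["[email protected]"]
  },
  // Tags need an owner before they can be assigned (the owner shown is an assumption)
  "tagOwners": {
    "tag:proxmox": ["autogroup:admin"],
    "tag:dns": ["autogroup:admin"],
    "tag:nas": ["autogroup:admin"],
    "tag:automation": ["autogroup:admin"],
    "tag:monitoring": ["autogroup:admin"]
  },
  "acls": [
    {
      "action": "accept",
      "src": ["group:clients"],
      "dst": ["tag:automation:*", "tag:monitoring:*"]
    },
    {
      "action": "accept",
      "src": ["tag:automation", "tag:monitoring"],
      "dst": ["tag:proxmox:22,443", "tag:dns:22,443", "tag:nas:22,443"]
    },
    {
      "action": "accept",
      "src": ["tag:proxmox", "tag:dns", "tag:nas"],
      "dst": ["*:*"]
    }
  ],
  "ssh": [
    {
      "action": "accept",
      "src": ["group:clients"],
      "dst": ["tag:proxmox", "tag:dns", "tag:nas"],
      "users": ["root"]
    }
  ]
}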

This means:

My laptop can reach service containers, but no ACL rule opens any infrastructure port to it
Service containers can only reach infrastructure on ports 22 and 443
Infrastructure nodes have full access (they need it for backups and monitoring)
Tailscale SSH as root is limited to my user account, and it only takes effect where an ACL rule also allows port 22

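If someone clones a service container’s state file and spins up an imposter node, that node can reach infrastructure only on ports 22 and 443, and no rule lets it reach the other services or my clients. It also can’t use Tailscale SSH to get into Proxmox or my NAS, because the ssh rule accepts only group:clients as a source. That’s the real defense.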

What Didn’t Work

I tried a few things that failed:

Attempt 1: Systemd service with pre/post scripts
I initially tried using systemd’s ExecStartPre and ExecStopPost to handle encryption/decryption. This broke because systemd killed the pre-script before Tailscale fully started, leaving the state file unencrypted. I had to move to a wrapper that keeps the encryption logic in the same process.

Attempt 2: Encrypted filesystem for /var/lib/tailscale
I mounted an encrypted LUKS partition at /var/lib/tailscale and tried unlocking it at boot with a key file. This worked on bare metal but broke in VMs—Proxmox snapshots don’t capture LUKS headers correctly, and restoring from backup became a nightmare. I abandoned this approach after the second failed restore.

Attempt 3: Tailscale’s built-in ACL testing
Tailscale’s policy file supports a tests section that asserts which sources should or shouldn’t reach which destinations, but it only evaluates the policy text. It doesn’t send actual traffic or show what a compromised node could reach once host firewalls and listening services are in the picture. I ended up using tailscale ping and nc from each node type to manually verify isolation. Tedious, but necessary.
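The manual nc checks can be wrapped in a small helper so each node type runs the same list. This is a sketch under assumptions: the function names probe and expect_reach are hypothetical, and it relies on an nc build that supports -z (connect without sending data) and -w (timeout).

```shell
#!/bin/bash
# Probe a TCP port; returns 0 if a connection succeeds within 3 seconds.
probe() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# expect_reach <allow|deny> <host> <port>: compare the probe result with the policy intent.
expect_reach() {
    local want="$1" host="$2" port="$3" got="deny"
    if probe "$host" "$port"; then got="allow"; fi
    if [ "$got" = "$want" ]; then
        echo "ok   $host:$port is $got"
    else
        echo "FAIL $host:$port expected $want, got $got"
        return 1
    fi
}
```

From a service container, a run might look like expect_reach allow proxmox-host 443 followed by expect_reach deny proxmox-host 8006, where proxmox-host is a placeholder name. A closed port on an otherwise reachable host also reads as deny, so deny results only mean something for ports that actually have a listener.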

Key Takeaways

State file encryption is a narrow defense. It protects against disk theft without root compromise—a scenario that’s rare in practice. What matters more:

ACL policies define blast radius. If a node is compromised, strict ACLs limit what the attacker can reach.
Full-disk encryption is still required. My custom state file encryption only works if the underlying disk is also encrypted (LUKS on Linux, BitLocker on Windows).
TPM encryption is impractical for mixed environments. VMs, containers, and frequent hardware changes make TPM-based protection unreliable.
Test your ACLs with real traffic. Syntax validation isn’t enough—manually verify that nodes can’t reach what they shouldn’t.
Simplicity matters for long-term maintenance. My wrapper script is 30 lines. It’s easy to audit, debug, and modify. Complex solutions break during upgrades.

I don’t run TPM encryption. I don’t need to. My custom wrapper handles the narrow threat it addressed, and my ACL policies handle the broader one. That’s the real lesson from Tailscale’s rollback: hardware-backed security sounds good until it breaks your systems. Pragmatic security that you can actually maintain beats theoretical perfection.

About Me

Vipin PG
