Debugging Tailscale ACL policies that break when mixing subnet routes with exit nodes across different firewall zones

Why I Worked on This

I run a split network setup at home: a Proxmox cluster on one subnet, IoT devices on another, and a DMZ for internet-facing services. I use Tailscale to connect everything without punching holes in my firewall or exposing services directly to the internet.

At first, this worked fine. I could reach my Proxmox VMs from my phone, access my NAS from anywhere, and route traffic through an exit node when I needed to. But when I started advertising subnet routes from multiple devices and using exit nodes at the same time, things broke in confusing ways.

Connections would randomly time out. Some devices could reach each other, others couldn’t. My ACL policy looked correct, but traffic just wouldn’t flow. It took me a week of testing and reading Tailscale’s documentation to understand what was happening.

My Real Setup

Here’s what I was running:

A Proxmox node advertising my main LAN subnet (192.168.1.0/24) as a subnet route
A Raspberry Pi advertising my IoT subnet (192.168.2.0/24) as a subnet route
A VPS running Tailscale as an exit node for outbound internet traffic
My laptop, phone, and a few other devices connecting to Tailscale normally

My firewall zones were configured like this:

LAN zone: trusted, allows all traffic
IoT zone: restricted, only allows outbound DNS and NTP
DMZ zone: isolated, only allows specific inbound ports

I wanted to use Tailscale to bypass these restrictions when I needed to access IoT devices remotely, but still keep them isolated from the LAN when not using Tailscale.

What Broke First

The first problem appeared when I tried to access an IoT device from my laptop while connected to the exit node. The connection would hang, then time out after 30 seconds.

I checked my ACL policy. It looked fine:

{
  "acls": [
    {
      "action": "accept",
      "src": ["autogroup:member"],
      "dst": ["*:*"]
    }
  ]
}

This should have allowed all authenticated users to reach everything. But it didn’t work.

I ran tailscale status on my laptop and saw both the subnet routes and the exit node listed. I ran tailscale ping to test connectivity to the IoT device. The ping worked, but HTTP connections didn’t.
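A successful tailscale ping only shows that the Tailscale layer can reach the target, so it helps to probe the actual TCP port separately. Here is a minimal Python sketch of such a probe. The helper name tcp_reachable, the port, and the timeout are assumptions for illustration, not something from my original debugging session:

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


# Example call (hypothetical target: an IoT device's web interface):
#   tcp_reachable("192.168.2.5", 80)
```

If this returns False while tailscale ping succeeds, the problem is likely in the routing or firewall path for real traffic rather than in basic Tailscale connectivity.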

The Real Problem: Overlapping Routes and Exit Node Conflicts

After a lot of testing, I realized the issue wasn’t the ACL policy itself. It was how Tailscale handles traffic when you have both subnet routes and an exit node active at the same time.

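When I enabled the exit node, Tailscale installed a default route (0.0.0.0/0) on my laptop. Normally a more specific subnet route beats a default route, but as far as I can tell my laptop wasn’t actually accepting the routes advertised by my Proxmox node and Raspberry Pi (on Linux, --accept-routes is off by default). That left the exit node’s default route as the only route matching those subnets.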

So when I tried to reach 192.168.2.5 (an IoT device), my laptop sent the traffic to the exit node instead of the subnet router. The exit node had no idea how to reach 192.168.2.0/24, so the connection failed.

The ACL policy was working correctly. The routing was wrong.
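One way to reason about which route wins is longest-prefix matching, which is how IP routing tables normally choose between overlapping routes. The toy sketch below is not Tailscale’s implementation. The next-hop labels (exit-node, proxmox-router, pi-router) are names I made up for the devices in my setup:

```python
import ipaddress


def pick_route(dest: str, routes: dict[str, str]) -> str:
    """Return the next hop for dest using longest-prefix match."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(cidr), hop)
        for cidr, hop in routes.items()
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        raise LookupError(f"no route to {dest}")
    # The most specific (longest) prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Only the exit node's default route is installed (subnet routes not accepted).
exit_only = {"0.0.0.0/0": "exit-node"}

# Subnet routes accepted alongside the exit node.
with_subnets = {
    "0.0.0.0/0": "exit-node",
    "192.168.1.0/24": "proxmox-router",
    "192.168.2.0/24": "pi-router",
}

print(pick_route("192.168.2.5", exit_only))     # exit-node
print(pick_route("192.168.2.5", with_subnets))  # pi-router
print(pick_route("1.1.1.1", with_subnets))      # exit-node
```

The first call mirrors the broken state: with no subnet route installed, the default route is the only match, so IoT traffic ends up at the exit node.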

What I Tried (That Didn’t Work)

I tried several things before figuring this out:

Making the ACL policy more explicit: I added specific rules for each subnet, thinking maybe the wildcard wasn’t working. No change.
Restarting Tailscale on all devices: Didn’t help. The routing conflict persisted.
Disabling and re-enabling subnet routes: This temporarily fixed it, but only until I re-enabled the exit node.
Changing the ACL order: I thought maybe the order of rules mattered. It doesn’t, at least not for this issue.

None of these addressed the real problem: traffic for my subnets was following the exit node’s default route instead of going to the subnet routers.

What Actually Worked

The solution was to use Tailscale’s --accept-routes flag correctly and understand how it interacts with exit nodes.

On my laptop, I ran:

tailscale up --accept-routes --exit-node=my-vps-exit-node

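This tells Tailscale to accept the advertised subnet routes and to use the exit node. Because the subnet routes are more specific than the exit node’s default route, they win for any traffic that matches them, and everything else goes out through the exit node.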

I also had to make sure my subnet routers were advertising their routes with --advertise-routes and that I had approved those routes in the Tailscale admin console.

On the Proxmox node:

tailscale up --advertise-routes=192.168.1.0/24 --snat-subnet-routes=false

On the Raspberry Pi:

tailscale up --advertise-routes=192.168.2.0/24 --snat-subnet-routes=false

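The --snat-subnet-routes=false flag (Linux only) was important. Without it, Tailscale would NAT the traffic, and my firewall would see all connections as coming from the subnet router’s IP instead of the original Tailscale device. This broke return traffic in some cases and made per-device firewall rules impossible. One caveat: with SNAT disabled, hosts on the subnet see the client’s 100.x address, so the subnet needs a return route for 100.64.0.0/10 that points at the subnet router, or replies won’t find their way back.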

The ACL Policy That Actually Mattered

Once the routing was fixed, I went back to my ACL policy. The wildcard rule was too permissive for my setup. I wanted to control which devices could reach which subnets, especially since I was mixing trusted and untrusted zones.

Here’s what I ended up with:

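{
  "tagOwners": {
    "tag:subnet-router": ["autogroup:admin"],
    "tag:exit-node": ["autogroup:admin"]
  },
  "acls": [
    {
      "action": "accept",
      "src": ["autogroup:member"],
      "dst": ["tag:subnet-router:*"]
    },
    {
      "action": "accept",
      "src": ["autogroup:member"],
      "dst": ["192.168.1.0/24:*"]
    },
    {
      "action": "accept",
      "src": ["autogroup:member"],
      "dst": ["192.168.2.0/24:*"]
    },
    {
      // Without the wildcard rule, internet traffic through an exit node
      // needs its own rule for autogroup:internet.
      "action": "accept",
      "src": ["autogroup:member"],
      "dst": ["autogroup:internet:*"]
    }
  ]
}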

I tagged my subnet routers with tag:subnet-router and the exit node with tag:exit-node. This let me write ACL rules that applied to the routers themselves, not just the subnets they advertised.

The explicit subnet rules (192.168.1.0/24 and 192.168.2.0/24) made it clear which networks were accessible. I removed the wildcard rule entirely.
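To sanity-check which addresses the explicit subnet rules cover, I find a small script useful. The sketch below uses a trimmed copy of the policy above and looks only at CIDR destinations in accept rules. It ignores sources, ports, and tags, so it is a rough approximation and not how Tailscale itself evaluates ACLs. The helper names and the 10.0.0.1 test address are made up for illustration:

```python
import ipaddress
import json

POLICY = json.loads("""
{
  "acls": [
    {"action": "accept", "src": ["autogroup:member"], "dst": ["tag:subnet-router:*"]},
    {"action": "accept", "src": ["autogroup:member"], "dst": ["192.168.1.0/24:*"]},
    {"action": "accept", "src": ["autogroup:member"], "dst": ["192.168.2.0/24:*"]}
  ]
}
""")


def cidr_destinations(policy: dict) -> list:
    """Collect CIDR destinations from accept rules, ignoring tags and ports."""
    networks = []
    for rule in policy.get("acls", []):
        if rule.get("action") != "accept":
            continue
        for dst in rule.get("dst", []):
            # Split "host:ports" at the last colon; only the host part is used.
            host, _, _ports = dst.rpartition(":")
            try:
                networks.append(ipaddress.ip_network(host))
            except ValueError:
                # Not a CIDR (for example a tag or autogroup); skip it here.
                continue
    return networks


def covered(dest_ip: str, policy: dict) -> bool:
    """Return True if dest_ip falls inside any CIDR destination of the policy."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net for net in cidr_destinations(policy))


print(covered("192.168.2.5", POLICY))  # True
print(covered("10.0.0.1", POLICY))     # False
```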

What Still Didn’t Work

Even after fixing the routing, I ran into another issue: devices on the IoT subnet couldn’t initiate connections back to my laptop when I was using the exit node.

This was because of my firewall rules. The IoT zone was configured to block all inbound traffic except replies to outbound connections. With SNAT disabled on the subnet router, my laptop’s traffic arrived in the IoT zone with its Tailscale source address, so the firewall saw it as a new connection from an unknown source and dropped it.

I had to add a specific firewall rule to allow traffic from Tailscale’s address range (100.64.0.0/10) to the IoT zone. This wasn’t a Tailscale problem. It was my firewall configuration.
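The effect of that rule can be modelled in a few lines. This is a toy model of the IoT zone’s decision for new inbound connections, not my actual firewall configuration. The function name and the 100.101.102.103 client address are assumptions for illustration:

```python
import ipaddress

TAILSCALE_RANGE = ipaddress.ip_network("100.64.0.0/10")  # range named in the post
IOT_SUBNET = ipaddress.ip_network("192.168.2.0/24")


def iot_zone_allows(src: str, dst: str, allow_tailscale: bool) -> bool:
    """Toy model: new inbound connections to the IoT zone are dropped unless
    the Tailscale allow rule is enabled and the source is in the Tailscale range."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    if dst_ip not in IOT_SUBNET:
        raise ValueError("destination is outside the IoT zone")
    return allow_tailscale and src_ip in TAILSCALE_RANGE


# Before the rule: traffic from a (hypothetical) Tailscale client is dropped.
print(iot_zone_allows("100.101.102.103", "192.168.2.5", allow_tailscale=False))  # False
# After the rule: the same traffic is accepted.
print(iot_zone_allows("100.101.102.103", "192.168.2.5", allow_tailscale=True))   # True
# LAN hosts stay blocked either way, so the zones remain isolated.
print(iot_zone_allows("192.168.1.50", "192.168.2.5", allow_tailscale=True))      # False
```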

Testing and Verification

To make sure everything was working, I tested these scenarios:

Laptop with exit node enabled, accessing LAN device: Worked. Traffic went through the subnet router, not the exit node.
Laptop with exit node enabled, accessing IoT device: Worked. Same as above.
Laptop with exit node enabled, accessing the internet: Worked. Traffic went through the exit node.
Phone without exit node, accessing LAN device: Worked. Traffic went directly through the subnet router.
IoT device initiating connection to laptop: Failed. This required the firewall rule change.

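I used tailscale status and ip route to verify the routing table on my laptop (on Linux, Tailscale keeps its routes in table 52, so ip route show table 52 is the command that lists them). The subnet routes were present alongside the default route and, being more specific, took priority, which is what I wanted.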
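These checks can be scripted. The sketch below asks the Linux kernel which interface it would use for a destination by calling ip route get and parsing the result. The function names are my own, and the sample output line is illustrative only (the src address and uid are made up):

```python
import subprocess
from typing import Optional


def egress_device(route_get_output: str) -> Optional[str]:
    """Extract the interface name that follows 'dev' in `ip route get` output."""
    tokens = route_get_output.split()
    for i, token in enumerate(tokens[:-1]):
        if token == "dev":
            return tokens[i + 1]
    return None


def route_device_for(dest: str) -> Optional[str]:
    """Ask the kernel (Linux only) which interface it would use to reach dest."""
    result = subprocess.run(
        ["ip", "route", "get", dest],
        capture_output=True, text=True, check=True,
    )
    return egress_device(result.stdout)


# Illustrative output only, not captured from my machine.
sample = "192.168.2.5 dev tailscale0 table 52 src 100.101.102.103 uid 1000"
print(egress_device(sample))  # tailscale0
```

This only confirms that the kernel hands the packet to the Tailscale interface. Which peer receives it (subnet router or exit node) is decided inside Tailscale, so this check complements tailscale status rather than replacing it.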

Key Takeaways

ACL policies in Tailscale are not the same as routing rules. The ACL controls whether traffic is allowed once it reaches the destination. The routing table controls where the traffic goes in the first place.

When you mix subnet routes with exit nodes, you have to think about route priority. Tailscale does this automatically if you use --accept-routes, but it’s not always obvious what’s happening.

Firewall zones add another layer of complexity. Tailscale traffic bypasses your normal network paths, so your firewall might not recognize it as legitimate traffic. You need to explicitly allow Tailscale’s address range (100.64.0.0/10) in your firewall rules.

Tags are useful for organizing devices in your ACL policy, especially when you have multiple subnet routers or exit nodes. They make it easier to write rules that apply to groups of devices instead of individual IPs.

The --snat-subnet-routes=false flag matters more than I expected. Without it, return traffic gets confused because the firewall sees the wrong source IP. This is especially important when you have strict firewall rules between zones.

Testing each scenario separately helped me isolate the problem. I wasted time trying to fix the ACL policy when the real issue was routing and firewall configuration.

About Me

Vipin PG

Tech Expert & Vibe Coder
