Tech Expert & Vibe Coder

With 14+ years of experience, I specialize in self-hosting, AI automation, and Vibe Coding – building applications using AI-powered tools like Google Antigravity, Dyad, and Cline. From homelabs to enterprise solutions.

Debugging MTU black holes in Tailscale subnets causing silent packet drops for containerized services behind NAT

Why I Had to Debug This

I run several containerized services on my home network behind NAT, with Tailscale handling remote access. Everything worked fine locally, but when I accessed these services through Tailscale from outside my network, some requests would just hang. Not timeout—hang. No errors, no logs, nothing.

The pattern was specific: small requests worked. Large responses (file downloads, API responses with data) would start, then freeze. Sometimes they'd complete after minutes. Sometimes they'd never finish.

This wasn't a bandwidth issue. My connection could handle the load. It wasn't a timeout—connections stayed open. It was silent packet loss, and it took me days to realize I was dealing with an MTU black hole.

My Setup and Where It Broke

Here's what I was running:

  • Proxmox host advertising subnet routes via Tailscale
  • Docker containers behind NAT on that host
  • Tailscale clients on my phone and laptop accessing these services remotely
  • Services included n8n, a custom API, and a file server

The Tailscale interface on my Proxmox host had an MTU of 1280 by default. My physical interface was 1500. Docker's bridge network was also 1500. This mismatch, combined with NAT and containerization, created a perfect storm for MTU black holes.
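A quick way to see the mismatch for yourself is to read the MTU off each interface involved. The interface names below are the usual defaults and may differ on your host:

ip -o link show tailscale0   # Tailscale's tunnel interface, 1280 by default
ip -o link show eth0         # physical NIC; the name varies (eno1, enp3s0, ...)
ip -o link show docker0      # Docker's default bridge, usually 1500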

What happens: when a container sends a large packet (say, 1500 bytes), it goes through Docker's bridge, hits NAT, then reaches the Tailscale interface. If that packet has the "Don't Fragment" (DF) flag set (which modern TCP stacks use for Path MTU Discovery) and it's too big for the Tailscale MTU, the sending side should get back an ICMP "Fragmentation Needed" message (the ICMPv6 equivalent is "Packet Too Big") telling it to use smaller packets.

But in my case, those ICMP messages weren't making it back to the container. Maybe NAT was dropping them. Maybe Docker's network stack wasn't handling them correctly. Maybe both. The result: packets vanished silently, TCP retransmissions eventually kicked in, and everything became painfully slow or stalled completely.
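You can watch the resulting stall from the host side too. This is a generic Linux check rather than anything Tailscale-specific: the kernel's socket statistics show retransmission counters climbing on the affected connections.

ss -tin   # per-connection TCP details; stalled flows show a growing retrans counter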

How I Confirmed the Problem

I started with tcpdump on the Tailscale interface:

tcpdump -i tailscale0 -n icmp
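The broad icmp filter catches everything. If you want to watch specifically for the PMTUD replies that should be coming back (ICMPv4 type 3, code 4, "fragmentation needed"), a narrower filter works too:

tcpdump -i tailscale0 -n 'icmp[icmptype] = icmp-unreach and icmp[icmpcode] = 4'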

No ICMP messages. That was suspicious. Then I tested MTU discovery manually from a client:

ping -M do -s 1472 [tailscale-ip]

This sends a 1500-byte packet (1472 bytes of payload plus 28 bytes of IP and ICMP headers) with the DF flag. It failed. Lowering the payload to 1252 bytes (the 1280 MTU minus those same 28 bytes) worked fine.
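If you'd rather not guess sizes one at a time, a small sweep narrows the ceiling down quickly. This is just a convenience sketch; swap in your own target for the placeholder:

# Probe a few payload sizes with DF set; the largest one that gets a reply marks the path MTU ceiling.
for size in 1472 1452 1400 1350 1300 1252; do
  if ping -M do -c 1 -W 2 -s "$size" [tailscale-ip] >/dev/null 2>&1; then
    echo "payload $size: ok"
  else
    echo "payload $size: blocked"
  fi
done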

That confirmed the MTU ceiling was real. But I needed to see if the problem was consistent across different paths. I tested:

  • Direct Tailscale connections (worked)
  • DERP-relayed connections (also worked, surprisingly)
  • Connections to containers behind NAT (broke)

The issue only appeared when traffic went through Docker NAT before hitting Tailscale. That narrowed it down.
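Tailscale's own CLI can tell you which path a peer is using, which makes this kind of matrix much faster to build. Both commands are standard; the peer name here is a placeholder for one of your own hosts:

tailscale status            # lists peers and whether each is on a direct endpoint or a relay
tailscale ping my-proxmox   # reports "via DERP(...)" or a direct ip:port for that peer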

What Didn't Work

My first attempt was to increase the Tailscale MTU to 1500 to match everything else:

ip link set dev tailscale0 mtu 1500

This fixed the problem immediately—until the Tailscale daemon restarted. Then it reset to 1280. I needed persistence.

I tried setting it in Tailscale's configuration files, but there's no native option for MTU in the current version I'm running. I considered patching the daemon, but that felt fragile and would break on updates.

I also tried adjusting the MTU on Docker's bridge network:

docker network create --opt com.docker.network.driver.mtu=1280 custom-bridge

This helped slightly by preventing oversized packets from leaving Docker in the first place, but it didn't solve the root issue. It just shifted where fragmentation would happen, and some applications didn't respect the bridge MTU anyway.
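For completeness, that option only applies to containers you explicitly attach to the custom bridge, and you can confirm the setting on the network itself. The container and image names below are placeholders:

# Confirm the MTU option took on the custom bridge (should print 1280):
docker network inspect custom-bridge --format '{{index .Options "com.docker.network.driver.mtu"}}'

# Attach a service to it; containers on the default bridge are unaffected:
docker run -d --name my-api --network custom-bridge my-api-image:latest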

What Actually Worked

I needed the Tailscale interface MTU to be set automatically whenever the interface came up. On Linux, udev rules handle this kind of persistent network configuration.

I created a udev rule at /etc/udev/rules.d/99-tailscale-mtu.rules:

ACTION=="add", SUBSYSTEM=="net", KERNEL=="tailscale0", RUN+="/sbin/ip link set dev tailscale0 mtu 1500"

Then reloaded and triggered udev:

udevadm control --reload-rules
udevadm trigger --subsystem-match=net --action=add

This worked. Every time the Tailscale interface initialized, the MTU was automatically set to 1500. I verified it survived reboots and Tailscale restarts.
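A quick check after a daemon restart looks like this, assuming Tailscale runs as the standard tailscaled systemd unit:

systemctl restart tailscaled
ip -o link show tailscale0 | grep -o 'mtu [0-9]*'   # expect: mtu 1500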

After this change, all my containerized services became accessible again without hangs or stalls. File downloads completed normally. API responses came through cleanly.

The Catch I Didn't Expect

This fix worked for my setup, but it introduced a new constraint: every device on my Tailscale network now needed to support the higher MTU. My phone and laptop were fine, but I have a few IoT devices that don't allow MTU adjustments.

For those devices, I had to choose: either leave them on the default 1280 and accept some fragmentation, or exclude them from accessing the containerized services. I went with the former, and it's been mostly fine—those devices don't transfer large amounts of data anyway.

I also tested this on DERP-relayed connections (when direct connections aren't possible). Interestingly, the higher MTU caused packet loss in that scenario. DERP seems to have its own MTU constraints, and exceeding them led to dropped packets and degraded performance.

So my final rule: use 1500 MTU only for direct connections where all devices support it. For DERP or mixed environments, stick with 1280.

Key Takeaways

  • MTU black holes are silent. No errors, no logs—just hanging connections.
  • Containerization and NAT can block ICMP Path MTU Discovery messages, making the problem worse.
  • Tailscale defaults to 1280 MTU for compatibility, but that can cause issues if the rest of your network runs at 1500.
  • Increasing MTU only works if every device in the path supports it. Mixed environments need careful testing.
  • DERP-relayed connections may not handle higher MTUs well. Test before deploying.
  • udev rules on Linux provide a clean way to persist network interface settings without patching daemons.

This problem taught me that network layers interact in ways that aren't always obvious. A setting that works at one level can break things at another, especially when virtualization, containers, and VPNs are all in the mix.