Why I Worked on This
I run a multi-node Docker Swarm cluster at home across three Proxmox hosts. One day, a distributed application I was testing started dropping packets randomly. Pings would succeed, then fail, then succeed again. Services would time out intermittently. The logs showed nothing obvious, and CPU and memory usage were normal.
After ruling out hardware issues and DNS problems, I suspected the overlay network. What I found was an MTU mismatch between my physical network interfaces and the Docker overlay network's VXLAN encapsulation. This wasn't immediately obvious because most traffic worked fine—only certain packet sizes triggered the issue.
My Real Setup
I have three Proxmox nodes running Docker in Swarm mode:
- Node 1: Dell OptiPlex 7050 (manager node)
- Node 2: HP EliteDesk 800 G3 (worker)
- Node 3: Lenovo ThinkCentre M720q (worker)
All nodes connect through a Ubiquiti EdgeRouter X with gigabit ethernet. The physical interfaces use the default MTU of 1500 bytes. Docker Swarm creates overlay networks using VXLAN, which adds 50 bytes of overhead for encapsulation.
I was running a distributed Redis cluster and a custom Python application that processed data between nodes. Both used an overlay network I created called app-net.
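For orientation, the swarm layout and the overlay networks involved can be confirmed quickly (output naturally differs per cluster):

docker node ls                              # one manager, two workers in my case
docker network ls --filter driver=overlay   # app-net plus the built-in ingress network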
How I Found the Problem
The first clue was inconsistency. Small requests worked perfectly. Larger payloads sometimes succeeded, sometimes failed. I started by checking basic connectivity:
docker exec -it redis-node-1 ping redis-node-2
Ping worked, but with occasional packet loss—around 5-10%. That ruled out complete network failure but pointed to something specific about packet handling.
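A longer ping run makes the loss rate concrete; this is a minimal sketch using the same container names:

# 200 probes at 5 per second; the summary line reports the packet loss percentage
docker exec -it redis-node-1 ping -c 200 -i 0.2 -q redis-node-2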
I checked the overlay network details:
docker network inspect app-net
The network existed and showed all containers properly attached. No errors in the output. I moved to the host level and ran tcpdump on the physical interface while generating traffic:
sudo tcpdump -i enp0s31f6 -n host 192.168.1.12
I saw fragmented packets and retransmissions. That was the smoking gun. Fragmentation meant packets were too large for the network path, and retransmissions meant they were being dropped somewhere.
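If you want to narrow the capture, a few filters are useful (the interface name is from my setup). VXLAN rides UDP port 4789, IPv4 fragments have a non-zero fragment field, and ICMP type 3 code 4 is a router saying a packet needed fragmentation but the DF bit forbade it:

sudo tcpdump -i enp0s31f6 -n udp port 4789                     # the encapsulated overlay traffic itself
sudo tcpdump -i enp0s31f6 -n '(ip[6:2] & 0x3fff) != 0'         # only IPv4 fragments
sudo tcpdump -i enp0s31f6 -n 'icmp[0] == 3 and icmp[1] == 4'   # ICMP "fragmentation needed"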
Understanding the MTU Problem
Docker's overlay network uses VXLAN to encapsulate traffic between nodes. VXLAN adds 50 bytes of headers to each packet. If your physical network has an MTU of 1500 bytes, the effective MTU inside the overlay network is 1450 bytes.
When a container sends a 1500-byte packet through the overlay network, Docker tries to encapsulate it, resulting in a 1550-byte packet. This exceeds the physical interface's MTU, causing the packet to be fragmented or dropped, depending on the DF (Don't Fragment) flag.
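The 50 bytes are the standard VXLAN encapsulation overhead; the container's own Ethernet frame is carried whole inside an outer UDP packet:

inner Ethernet header   14 bytes
VXLAN header             8 bytes
outer UDP header         8 bytes
outer IPv4 header       20 bytes
total overhead          50 bytes

1500 (physical MTU) - 50 = 1450 bytes left for the container's IP packet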
I confirmed this by checking the MTU on the overlay network interface inside a container:
docker exec -it redis-node-1 ip link show eth0
The output showed mtu 1450, which was correct. But my application wasn't respecting this—it was trying to send larger packets, likely because it was configured with a default MTU of 1500.
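To compare both sides at a glance, the physical NIC, the docker_gwbridge that Swarm creates on each node, and the container's overlay interface can all be checked with the same pattern (interface and container names are from my setup):

ip link show enp0s31f6 | grep -o 'mtu [0-9]*'                      # physical NIC, expect 1500
ip link show docker_gwbridge | grep -o 'mtu [0-9]*'                # Swarm's per-node gateway bridge
docker exec redis-node-1 ip link show eth0 | grep -o 'mtu [0-9]*'  # overlay side, expect 1450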
What Worked
I fixed this in two steps: adjusting the Docker overlay network MTU and verifying application behavior.
Step 1: Set the Correct MTU on the Overlay Network
I recreated the overlay network with an explicit MTU setting:
docker network rm app-net
docker network create \
  --driver overlay \
  --opt com.docker.network.driver.mtu=1450 \
  app-net
This ensured the overlay network explicitly used 1450 bytes, accounting for VXLAN overhead. I redeployed the services:
docker service update --network-rm app-net --network-add app-net redis-cluster
docker service update --network-rm app-net --network-add app-net python-app
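A quick way to confirm the option stuck is to pull it straight out of the network's driver options with a Go template:

docker network inspect app-net --format '{{ index .Options "com.docker.network.driver.mtu" }}'
# prints 1450 if the option was applied; empty output means it was not set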
Step 2: Verify with iperf3
I deployed two test containers on different nodes and ran iperf3 to measure throughput and packet behavior:
docker run -d --name iperf-server --network app-net networkstatic/iperf3 -s
docker run --rm --network app-net networkstatic/iperf3 -c iperf-server -t 30
After the MTU fix, throughput stabilized and packet loss dropped to zero. Before the fix, I saw 5-10% packet loss and significantly lower throughput.
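TCP tends to hide MTU problems because the MSS is derived from the interface MTU, so a UDP run with datagrams sized around the suspected boundary makes loss explicit. A rough sketch with the same image; note that standalone docker run containers can only join an overlay network created with the --attachable flag:

# -u: UDP, -b: offered bandwidth, -l: datagram payload size; vary -l around the boundary
docker run --rm --network app-net networkstatic/iperf3 -c iperf-server -u -b 200M -l 1400 -t 10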
Step 3: Persistent Configuration
To avoid recreating the network manually every time, I added the MTU setting to my Docker Compose file:
networks:
app-net:
driver: overlay
driver_opts:
com.docker.network.driver.mtu: 1450
This ensured the setting persisted across deployments.
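Deploying is then the usual stack command; the stack name here is arbitrary, and if your Compose version is strict about option types, quoting the MTU value as "1450" may help:

docker stack deploy -c docker-compose.yml myapp
# note: unless the network is marked external, Swarm creates it as <stack>_app-net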
What Didn't Work
Before finding the MTU issue, I tried several things that didn't help:
- Increasing kernel buffer sizes: I adjusted net.core.rmem_max and net.core.wmem_max (roughly the sketch after this list), thinking it was a buffering problem. It made no difference because the issue wasn't buffer exhaustion; it was packet size.
- Switching to host networking: I tested running the Redis cluster with --network host to bypass the overlay network entirely. This worked but broke the distributed setup because services couldn't discover each other across nodes.
- Disabling IPv6: I read somewhere that IPv6 can cause issues with Docker networking. I disabled it on all nodes. No change. The problem was specific to packet size, not IP version.
- Restarting Docker services: I restarted the Docker daemon on all nodes multiple times. This temporarily cleared some state but didn't fix the underlying MTU mismatch.
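For reference, the buffer-size change from the first bullet looked roughly like this; the values are illustrative, and again, it is harmless but irrelevant to packet size:

# raise the maximum socket receive and send buffer sizes (made no difference here)
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216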
Debugging Tools I Used
These tools were essential for diagnosing the issue:
- tcpdump: Captured packets on the physical interface to see fragmentation and retransmissions.
- iperf3: Measured throughput and packet loss between containers.
- docker network inspect: Verified overlay network configuration and container attachments.
- ip link show: Checked MTU settings on both physical and virtual interfaces.
- ping with specific packet sizes: ping -M do -s 1450 192.168.1.12 tested whether packets of a specific size could traverse the network without fragmentation (see the sizing note below).
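One sizing note on that last command: -s sets only the ICMP payload, and the IPv4 and ICMP headers add another 28 bytes, so the probes that bracket the two MTUs are:

ping -M do -s 1472 192.168.1.12   # 1472 + 28 = 1500, the largest probe the physical network should pass
ping -M do -s 1422 192.168.1.12   # 1422 + 28 = 1450, the size that matters once VXLAN overhead is added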
Key Takeaways
MTU mismatches are subtle. Most traffic works fine, so the problem doesn't show up in basic tests. You only notice it when applications send larger packets or when network conditions change.
Docker's overlay network uses VXLAN, which adds 50 bytes of overhead. If your physical network has an MTU of 1500, your overlay network should use 1450. Always set this explicitly when creating overlay networks.
Packet loss and fragmentation are clear signs of MTU issues. Use tcpdump and iperf3 to confirm before changing configurations.
Host networking bypasses overlay issues but breaks service discovery in multi-node setups. It's not a real solution for distributed applications.
Always test network changes with real traffic, not just pings. Pings use small packets and won't reveal MTU problems.