Why I worked on this
I run a single-node Proxmox box at home that hosts about 40 Docker containers. Most are hobby stuff, but one stack is a Traefik reverse proxy that fronts a handful of public services. One night I noticed the logs were full of 502s and sporadic “no route to host” errors. Inside the containers everything looked fine: health checks passed, CPU was idle, memory usage was normal. The failures felt random: one request would work, the next would time out, then three more would succeed. After an hour of tailing logs I SSH’d to the host, ran dmesg, and saw the line:
nf_conntrack: table full, dropping packet
I hadn’t touched the firewall in months, so I didn’t expect nftables to be the culprit. But once I saw that message, the weird symptoms made sense: the kernel was silently discarding new connections before Traefik or the containers ever saw them.
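If you want to check for the same condition quickly, these two commands cover it (conntrack -S comes from the conntrack-tools package and is optional):
# the tell-tale kernel message
dmesg -T | grep -i nf_conntrack
# per-CPU conntrack statistics; the drop counter climbs while the table is full
conntrack -S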
My real setup
- Proxmox 8.1 on a 2018-era Xeon E-2174G, 64 GB RAM, 1 Gbps symmetric fibre.
- One VM (Debian 12, 5.15 kernel) that runs everything: Docker 24, Traefik 3.0, and about 40 micro-services.
- nftables ruleset I built myself (no firewalld, no ufw). It’s only 30 lines: allow established, SSH, HTTP(S), then drop the rest (a simplified sketch follows this list).
- Traefik listens on 80/443 and routes >1 000 requests/minute to 15 different containers. Some of those containers call each other through Traefik as well (I know, not ideal, but it keeps TLS simple).
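For context, the ruleset is shaped roughly like this. It’s a simplified sketch, not the literal file, but it matches the description above:
# /etc/nftables.conf (simplified sketch)
table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;
    iif "lo" accept
    ct state established,related accept
    ct state invalid drop
    tcp dport { 22, 80, 443 } accept
  }
}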
What worked (and why)
1. Confirm the table was actually full
cat /proc/sys/net/netfilter/nf_conntrack_count
131060
cat /proc/sys/net/netfilter/nf_conntrack_max
131072
12 entries left, i.e. basically zero headroom. I also checked the breakdown:
awk '$6 == "TIME_WAIT" {tw++} $6 == "ESTABLISHED" {est++} END {print "TW:",tw,"EST:",est}' /proc/net/nf_conntrack
TW: 86542 EST: 28432
TIME_WAIT dominated. That told me connections were closing properly but the default 120 s timeout was keeping them in the table too long for this traffic pattern.
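If conntrack-tools is installed, the same breakdown is a bit quicker to eyeball; this one-liner is a sketch of the idea rather than what I actually ran:
# count entries per TCP state ($4 is the state column in `conntrack -L` output for tcp flows)
conntrack -L 2>/dev/null | awk '$1 == "tcp" {print $4}' | sort | uniq -c | sort -rn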
2. Bump the limit immediately
I doubled it first to survive the night:
sysctl -w net.netfilter.nf_conntrack_max=262144
Packet drops stopped within seconds. Errors in Traefik dropped to zero. That confirmed the diagnosis, but 262 k still felt tight for a box that might see 5–10 k new connections per minute during peak cron jobs.
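If you want to watch the headroom at a glance while picking a final number, any variation of this throwaway sketch does the job:
# watch current usage against the limit every 5 seconds
watch -n5 'echo "$(cat /proc/sys/net/netfilter/nf_conntrack_count) / $(cat /proc/sys/net/netfilter/nf_conntrack_max)"'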
3. Make the change persistent and add timeouts
I added a drop-in file /etc/sysctl.d/90-conntrack.conf:
net.netfilter.nf_conntrack_max = 524288
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 15
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 15
Then sysctl --system. I picked 512 k because memory is cheap on this box (each entry ≈ 320 bytes, so ~160 MB worst case) and I never want to think about it again.
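One caveat I’ll flag as an assumption rather than something I benchmarked on this box: raising nf_conntrack_max alone leaves the hash table (nf_conntrack_buckets) at its old size, so lookups walk longer chains. It’s worth a look; on kernels where the sysctl is read-only, the module parameter is the knob.
# check the current hash table size
sysctl net.netfilter.nf_conntrack_buckets
# if it's read-only on your kernel, resize via the module parameter instead
# (131072 here is just an example value, not a recommendation)
echo 131072 > /sys/module/nf_conntrack/parameters/hashsize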
4. Enable HTTP keep-alive inside the Docker network
Most of my services speak HTTP/1.1 but I’d never enabled keep-alive in their reverse-proxy configs. I edited the three busiest services to reuse connections:
# whoami service (nginx) snippet
upstream traefik {
    server traefik:80;
    keepalive 32;   # pool of idle connections kept open to the upstream
}
# keepalive only takes effect when proxied requests use HTTP/1.1 with the
# Connection header cleared, so the proxy_pass location also needs:
#   proxy_http_version 1.1;
#   proxy_set_header Connection "";
Connection churn dropped by ~40 %, which lowered the rate at which new entries were inserted into the table.
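If you want to put a number on the churn instead of guessing, watching new-flow events works; a rough sketch that assumes conntrack-tools and pv are installed:
# stream NEW conntrack events and print the rate (lines per second)
conntrack -E -e NEW | pv -l -r -i 10 > /dev/null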
5. Add a tiny Prometheus alert
I already run node-exporter in a container. I added a small alerting rule:
- alert: ConntrackTableUsage
  expr: node_nf_conntrack_entries / node_nf_conntrack_entries_limit > 0.8
It fired once during the next night’s backup job; headroom was still fine, but now I’ll know before users do.
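For completeness, this is how the rule sits in a Prometheus rule file; the group name, the for: window, and the labels are illustrative choices, nothing load-bearing:
groups:
  - name: conntrack
    rules:
      - alert: ConntrackTableUsage
        expr: node_nf_conntrack_entries / node_nf_conntrack_entries_limit > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "conntrack table above 80% on {{ $labels.instance }}"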
What didn’t work
- Shortening nf_conntrack_tcp_timeout_established below a day. I tried 12 h first; next morning a long-running websocket kept reconnecting every few minutes. Going back to 24 h keeps my Home Assistant socket happy.
- Switching to nftables “flow offload” objects. My Debian 12 VM runs a 5.15 kernel; the flowtable module is there, but I couldn’t get it to attach to my bridge interface without breaking Docker’s userland proxy. I gave up after two reboots; raising the limit was easier.
- Disabling conntrack entirely. I tried a notrack rule for the internal Docker bridge (roughly the rule sketched after this list). Traffic between containers stopped cold because Docker relies on NAT for the userland-proxy hair-pinning, and NAT can’t work without conntrack. I reverted in ten minutes.
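For reference, a notrack rule for a Docker bridge looks roughly like this; treat the bridge name as a placeholder, and don’t actually do this on a Docker host for the reason above:
# notrack has to live in raw-priority chains, which run before conntrack
table inet raw {
  chain prerouting {
    type filter hook prerouting priority raw; policy accept;
    iifname "docker0" notrack
  }
  chain output {
    type filter hook output priority raw; policy accept;
    oifname "docker0" notrack
  }
}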
Key takeaways
- Random 502/time-out errors that don’t appear in application logs often originate one layer below; check dmesg first.
- On a box that terminates thousands of short-lived HTTP calls, the default 128 k conntrack table is too small. A single evening of cron scripts plus health checks can chew through it.
- TIME_WAIT entries are the biggest consumer in my workload; cutting that timeout from 120 s to 30 s freed half the table with no side effects.
- Keep-alive between internal services is worth the five-minute config change—it halves connection churn and buys headroom.
- Make the limit big once, monitor it, and move on. Conntrack tuning isn’t glamorous, but it beats 3 a.m. pages.