Why I Set This Up
I run multiple homelab nodes—some Proxmox VMs, a few LXC containers, and a Synology NAS—all hosting duplicate services for redundancy and experimentation. The problem was simple: I needed a single entry point for SSH access and database connections without manually tracking which node was up or which IP to use. Hard-coding IPs into scripts and configs was brittle. When I moved a service or a node went down, everything broke.
I’d used Nginx for HTTP load balancing before, but I wanted something similar for raw TCP traffic—SSH on port 22, PostgreSQL on 5432, and MySQL on 3306. The Nginx stream module does exactly this: Layer 4 load balancing without touching HTTP at all.
My Real Setup
I run Nginx in an LXC container on Proxmox. The container has a static IP on my homelab network and acts as the gateway for all TCP services I want to distribute. Behind it, I have:
- Three Ubuntu VMs running identical PostgreSQL instances (for testing replication and failover)
- Two Debian containers running SSH servers (one for backups, one for general access)
- A MySQL instance on my Synology NAS
The Nginx container doesn’t terminate TLS or inspect traffic. It just forwards TCP streams to backend servers based on which port a client connects to.
Configuration I Actually Used
I added a new file under /etc/nginx/stream.conf.d/ and included it from the main nginx.conf with a top-level `stream { include /etc/nginx/stream.conf.d/*.conf; }` block. Note that the stream block sits alongside the http block, not inside it.
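For reference, the wiring in nginx.conf looks roughly like this (a minimal sketch; your distribution's nginx.conf will have more in it, and the stream-module package must be installed or compiled in):

```nginx
# /etc/nginx/nginx.conf (excerpt)

events {
    worker_connections 1024;
}

http {
    # ... existing HTTP config, if any ...
}

# Layer 4 proxying lives in its own top-level block, a sibling of http.
stream {
    include /etc/nginx/stream.conf.d/*.conf;
}
```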
Here’s the SSH load balancer config I wrote:
```nginx
upstream ssh_backends {
    least_conn;
    server 192.168.1.101:22 max_fails=2 fail_timeout=10s;
    server 192.168.1.102:22 max_fails=2 fail_timeout=10s;
}

server {
    listen 2222;
    proxy_pass ssh_backends;
    proxy_connect_timeout 5s;
}
```
I used least_conn because SSH sessions vary in duration—some are quick commands, others are long-running tunnels. Round-robin would send new connections to a node already handling a heavy session.
For PostgreSQL, I did something similar but kept it on port 5432:
```nginx
upstream postgres_backends {
    server 192.168.1.111:5432 max_fails=3 fail_timeout=15s;
    server 192.168.1.112:5432 max_fails=3 fail_timeout=15s;
    server 192.168.1.113:5432 max_fails=3 fail_timeout=15s;
}

server {
    listen 5432;
    proxy_pass postgres_backends;
}
```
I didn’t use least_conn here because my database clients are short-lived connection pools from n8n and Cronicle. Round-robin was fine.
Why I Didn’t Use Active Health Checks
Nginx Open Source doesn’t have built-in active health checks for the stream module. It only marks a backend as down after `max_fails` connection failures within `fail_timeout`. I considered Nginx Plus for this, but I didn’t want to pay for a license just to get active probes.
Instead, I wrote a small script that runs every minute via cron. It attempts a TCP connection to each backend and removes dead nodes from the upstream block by regenerating the config and reloading Nginx. It’s hacky but works for my scale.
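A sketch of that cron script, not the exact one I run, assuming bash (for `/dev/tcp`) and coreutils `timeout`. The backend IPs match the SSH upstream above; `OUT` is hypothetical and defaults to /tmp here so the sketch is safe to try, but for real use you'd point it at /etc/nginx/stream.conf.d/:

```shell
#!/usr/bin/env bash
# Regenerate the SSH upstream file from whichever backends answer on TCP,
# then reload Nginx only if the file changed and the config validates.
set -u

OUT="${OUT:-/tmp/ssh_upstream.conf}"          # real path: /etc/nginx/stream.conf.d/ssh_upstream.conf
BACKENDS=("192.168.1.101:22" "192.168.1.102:22")

# Return 0 if a TCP connect to host:port succeeds within 1 second.
tcp_alive() {
    local host=$1 port=$2
    timeout 1 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

main() {
    local tmp b host port
    tmp=$(mktemp)
    {
        echo "upstream ssh_backends {"
        echo "    least_conn;"
        for b in "${BACKENDS[@]}"; do
            host=${b%:*}
            port=${b#*:}
            if tcp_alive "$host" "$port"; then
                echo "    server ${b} max_fails=2 fail_timeout=10s;"
            fi
        done
        echo "}"
    } > "$tmp"

    # Reload only on change. If every backend is down, the generated upstream
    # is empty and `nginx -t` fails, so the running config is left untouched.
    if ! cmp -s "$tmp" "$OUT" 2>/dev/null; then
        mv "$tmp" "$OUT"
        if command -v nginx >/dev/null; then
            nginx -t && nginx -s reload
        fi
    else
        rm -f "$tmp"
    fi
}

main
```

A crontab line like `* * * * * /usr/local/bin/check-backends.sh` (path hypothetical) runs it every minute; the change-detection keeps it from reloading Nginx on every tick.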
What Worked
Once configured, I pointed my SSH client to nginx-lb.local:2222 and it connected to whichever backend had fewer active sessions. When I killed one of the SSH containers, the next connection went to the other one automatically. No manual intervention.
For PostgreSQL, my n8n workflows now connect to a single IP instead of three different ones. If one database node goes down during testing, the connection pool retries and hits a healthy node. This removed a lot of fragility from my automation setup.
The proxy_connect_timeout setting was important. Without it, Nginx would wait too long when trying a dead backend, and clients would time out. Setting it to 5 seconds made failover feel instant.
Preserving Client IPs
I didn’t need this for SSH, but for PostgreSQL logs I wanted to see the real client IP, not the Nginx container’s IP. To send a PROXY protocol header to the backends, the directive is `proxy_protocol on;` in the server block (a `proxy_protocol` parameter on `listen` would instead make Nginx expect the header from clients, which is not what I wanted here):

```nginx
server {
    listen 5432;
    proxy_pass postgres_backends;
    proxy_protocol on;  # prepend a PROXY protocol header to each backend connection
}
```
The catch is on the backend side: stock PostgreSQL doesn’t understand the PROXY protocol, so something proxy-aware has to sit in front of each database instance and strip the header before Postgres sees the connection. That extra hop worked for me, but only because I control both sides. If you’re load balancing a service that doesn’t understand PROXY protocol and can’t be fronted by something that does, this won’t help.
What Didn’t Work
UDP for DNS
I tried load balancing my Pi-hole instances over UDP using the same approach. It half-worked—queries were distributed, but DNS is stateless and retries are built into clients. When a backend was slow or down, clients would retry and sometimes hit the same dead node again before Nginx marked it as failed. The experience was inconsistent.
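For reference, the UDP variant I tried looked roughly like this (the Pi-hole IPs are illustrative; `proxy_responses 1` tells Nginx to expect one reply datagram per query, and `proxy_timeout` bounds how long a UDP session entry lives):

```nginx
upstream dns_backends {
    server 192.168.1.53:53;   # Pi-hole #1 (illustrative IP)
    server 192.168.1.54:53;   # Pi-hole #2 (illustrative IP)
}

server {
    listen 53 udp;
    proxy_pass dns_backends;
    proxy_responses 1;   # one response datagram per query
    proxy_timeout 2s;    # drop the session entry quickly
}
```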
I ended up using round-robin DNS in my router instead. Not as elegant, but more reliable for my use case.
Session Stickiness for MySQL
I assumed I could use `hash $remote_addr consistent;` to pin clients to a specific MySQL backend. This worked, but it caused problems when I restarted a backend. The hash would still send traffic there until `max_fails` was hit, and my application saw connection errors during that window.
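The sticky config I tried looked like this (backend IPs are illustrative; the `consistent` flag uses ketama hashing so that removing one server only remaps that server’s clients):

```nginx
upstream mysql_backends {
    hash $remote_addr consistent;  # pin each client IP to one backend
    server 192.168.1.121:3306 max_fails=2 fail_timeout=10s;
    server 192.168.1.122:3306 max_fails=2 fail_timeout=10s;
}

server {
    listen 3306;
    proxy_pass mysql_backends;
    proxy_connect_timeout 5s;
}
```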
I switched back to round-robin and accepted that MySQL connections aren’t sticky. My apps use short-lived connections anyway, so it didn’t matter in practice.
Reloading Configs Without Dropping Connections
When I run `nginx -s reload`, existing TCP streams stay open, but new connections briefly queue while the config reloads. For SSH, this was fine—worst case, a connection attempt took an extra second. For database connections under load, it caused occasional timeouts.
I worked around this by scheduling config reloads during low-traffic windows and keeping the changes small. There’s no perfect solution here without something like a control plane that can do zero-downtime updates.
Key Takeaways
- The Nginx stream module is straightforward for TCP load balancing if you don’t need HTTP-level features.
- Passive health checks (`max_fails` and `fail_timeout`) are good enough for small setups. Active checks require Nginx Plus or external tooling.
- `least_conn` is better for long-lived or variable-duration connections like SSH. Round-robin works fine for short-lived database queries.
- PROXY protocol is useful if you control both the load balancer and the backend, but many services don’t support it.
- UDP load balancing is tricky because of retries and statelessness. Use it cautiously.
- Config reloads are not zero-downtime for TCP streams. Plan accordingly.
This setup removed a lot of manual IP management from my homelab and made failover automatic. It’s not enterprise-grade, but it’s reliable enough for my needs and taught me how Layer 4 load balancing actually behaves under real conditions.