Building a Multi-Node Jellyfin Cluster with HAProxy Session Persistence to Handle 4K HDR Transcoding Spikes

Why I Built This (And Why I Stopped)

I wanted to solve a real problem: my single Jellyfin instance would choke when multiple users tried to transcode 4K HDR content at the same time. The CPU would spike, playback would stutter, and the whole server would become unresponsive. I thought the solution was obvious—build a multi-node cluster with HAProxy handling session persistence so each user's stream would stick to one node.

I spent weeks on this. I got it working. And then I realized it was the wrong approach entirely.

My Real Setup

I run Jellyfin in an LXC container on Proxmox, backed by a Synology NAS for media storage. The initial setup was straightforward:

  • Single Jellyfin instance (Ubuntu 18.04 LXC)
  • Intel CPU with QuickSync for hardware transcoding
  • HAProxy on pfSense for reverse proxy
  • NFS mounts from Synology for media access (example entry below)
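
The NFS side is just a standard read-only mount of the Synology share; whether it lives on the Proxmox host (bind-mounted into the LXC) or inside the container depends on your setup. The IP, export, and mount point here are placeholders rather than my actual values:

# fstab entry for the media share (placeholder IP/export/mount point)
192.168.50.20:/volume1/media  /mnt/media  nfs  ro,vers=4.1,noatime  0  0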

When I decided to build the cluster, I spun up three identical Jellyfin nodes, each with access to the same media library via NFS. The architecture looked clean on paper:

  • HAProxy frontend listening on ports 80/443
  • Three Jellyfin backends (192.168.50.111, .112, .113)
  • Session persistence using source IP hashing
  • Health checks via /health endpoint

I configured HAProxy with this backend setup:

backend jellyfin
    balance source                       # hash the client source IP to pick a node
    hash-type consistent                 # keep most client-to-node mappings stable if a node drops out
    option httpchk
    option forwardfor                    # pass the real client IP through to Jellyfin
    http-check send meth GET uri /health
    http-check expect string Healthy     # Jellyfin's /health endpoint responds with "Healthy"
    server jellyfin1 192.168.50.111:8096 check
    server jellyfin2 192.168.50.112:8096 check
    server jellyfin3 192.168.50.113:8096 check

The balance source directive was supposed to ensure that each client IP always hit the same backend node. This mattered because Jellyfin's transcoding sessions are stateful—if a client switches nodes mid-stream, the transcode job dies and playback fails.
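
On pfSense the frontend is assembled through the HAProxy package GUI rather than written by hand, but the generated configuration is roughly equivalent to this sketch (the certificate path and bind options are assumptions, not copied from my box):

frontend jellyfin_front
    mode http
    bind :80
    bind :443 ssl crt /etc/ssl/private/jellyfin.pem
    http-request redirect scheme https unless { ssl_fc }
    default_backend jellyfin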

What Actually Happened

The cluster worked, technically. HAProxy distributed connections. Session persistence mostly held. Each node could transcode independently. But the problems started immediately:

Database Synchronization Was a Nightmare

Jellyfin stores its database in SQLite by default. I tried running three nodes against the same NFS-mounted database file. This was a mistake. SQLite locks the entire database for writes, and over NFS, the locking behavior became unpredictable. Sometimes writes would succeed. Sometimes they'd hang. Sometimes the database would corrupt.
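
When I suspected corruption, a read-only integrity check against the library database was the quickest way to confirm it. A minimal sketch using Python's standard library, assuming the default data path on my nodes (adjust for your install):

import sqlite3

DB_PATH = "/var/lib/jellyfin/data/library.db"  # default path on Debian/Ubuntu packages; adjust as needed

# Open read-only so the check itself can't take a write lock over NFS.
con = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)
try:
    result = con.execute("PRAGMA integrity_check;").fetchone()[0]
    print("integrity_check:", result)  # "ok" means the file is structurally sound
finally:
    con.close()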

I switched to having each node maintain its own local database, with a script to sync them every few minutes. This created new problems: watch status would drift between nodes, new library items wouldn't appear consistently, and user settings would occasionally revert.
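
The sync was essentially last-writer-wins: a cron job along the lines of the sketch below (a reconstruction of the approach, not the exact script) pushed one node's data directory over the other two, which is exactly why writes on the secondaries kept getting clobbered.

import subprocess

DATA_DIR = "/var/lib/jellyfin/data/"                 # trailing slash: sync the contents, not the dir itself
SECONDARIES = ["192.168.50.112", "192.168.50.113"]   # the other two Jellyfin nodes

# Naive one-way copy of live SQLite files -- anything written on a
# secondary since the last run is silently overwritten.
for host in SECONDARIES:
    subprocess.run(
        ["rsync", "-a", "--delete", DATA_DIR, f"root@{host}:{DATA_DIR}"],
        check=True,
    )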

Transcode Cache Collisions

Each Jellyfin node creates temporary transcode files. I initially pointed all nodes at a shared transcode directory on NFS. Bad idea. Nodes would occasionally try to write to the same temp file, causing transcode failures. I moved to local transcode directories, but this meant wasted disk space and CPU cycles—if two users on different nodes watched the same 4K file, both nodes would transcode it separately.
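
Pointing each node at local disk is a small change: the transcode path is configurable in the web UI's transcoding settings and ends up in encoding.xml on disk. The fragment below is a sketch; the file location and element name match my install, but verify them against your Jellyfin version:

<!-- /etc/jellyfin/encoding.xml (fragment) -->
<EncodingOptions>
  <!-- any local, non-NFS path works; this one is illustrative -->
  <TranscodingTempPath>/var/lib/jellyfin/transcodes</TranscodingTempPath>
</EncodingOptions>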

Session Persistence Wasn't Reliable Enough

HAProxy's source IP hashing worked most of the time, but not always. If a client's IP changed (mobile users switching between WiFi and cellular), the session would break. If HAProxy detected a node as unhealthy and failed over, the client would lose their transcode session. If a user paused for too long and the connection dropped, resuming would sometimes hit a different node.

I tried cookie-based persistence, but Jellyfin's web client and mobile apps didn't always respect the cookies consistently.
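
For reference, the cookie-based variant looked something like this sketch (SRVID is an arbitrary cookie name; the rest mirrors the backend above):

backend jellyfin
    balance roundrobin
    cookie SRVID insert indirect nocache     # HAProxy sets the cookie; clients have to echo it back
    option httpchk
    http-check send meth GET uri /health
    http-check expect string Healthy
    server jellyfin1 192.168.50.111:8096 check cookie j1
    server jellyfin2 192.168.50.112:8096 check cookie j2
    server jellyfin3 192.168.50.113:8096 check cookie j3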

Hardware Transcoding Didn't Scale

Each node had access to the same Intel QuickSync device via VA-API. But QuickSync has limits—it can handle maybe 10-15 simultaneous transcodes depending on resolution and codec. Spreading users across three nodes didn't increase this limit. It just meant each node was underutilized most of the time, and when transcoding demand spiked, all three nodes would hit their limits simultaneously.
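
For context on how all three containers saw one device: the iGPU's /dev/dri node is simply bind-mounted into each LXC from the Proxmox host, roughly like this (the cgroup key is lxc.cgroup2.* on current Proxmox and lxc.cgroup.* on older releases; unprivileged containers also need the device permissions sorted out):

# /etc/pve/lxc/<vmid>.conf on the Proxmox host
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir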

What I Should Have Done Instead

After months of fighting this setup, I tore it down and went back to a single Jellyfin instance with these changes:

  • Upgraded to a CPU with better QuickSync support (11th gen Intel)
  • Moved transcoding to a dedicated NVMe drive instead of NFS
  • Set strict transcode limits per user
  • Encouraged direct play wherever possible by optimizing media files upfront (see the sketch after this list)
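
The "optimize upfront" part is mostly an audit: find the files your clients can't direct play and re-encode them once, instead of transcoding them live forever. A minimal sketch using ffprobe; the codec whitelist and media path are assumptions, so match them to what your devices actually handle:

import pathlib
import subprocess

DIRECT_PLAY_VIDEO = {"h264", "hevc"}            # codecs my clients play natively (assumption)
MEDIA_ROOT = pathlib.Path("/mnt/media/movies")  # illustrative path

def video_codec(path: pathlib.Path) -> str:
    """Return the codec name of the first video stream, via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

for f in MEDIA_ROOT.rglob("*.mkv"):             # extend the glob for .mp4 etc.
    codec = video_codec(f)
    if codec not in DIRECT_PLAY_VIDEO:
        print(f"{codec:>6}  {f}")               # candidates for a one-time re-encode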

The single-node setup has been far more reliable. No database sync issues. No session persistence problems. No wasted transcoding work. And when the server does hit its limits, I know exactly where the bottleneck is.

Why Multi-Node Jellyfin Doesn't Make Sense

Jellyfin wasn't designed to run as a distributed system. Its architecture assumes a single server with local storage and a local database. You can force it into a cluster, but you're fighting the design at every step.

The real problems with transcoding aren't solved by horizontal scaling:

  • Hardware acceleration is device-limited, not CPU-limited
  • Storage I/O is often the bottleneck, and NFS doesn't help
  • Session state makes load balancing fragile
  • Most users don't actually need transcoding if you optimize your media library

If you truly need more capacity, you're better off running multiple independent Jellyfin servers with separate libraries, or investing in better hardware for a single powerful node.

What I Learned

This project taught me to question my assumptions. I assumed that "more nodes = more capacity" because that's true for stateless web applications. But Jellyfin isn't stateless, and transcoding isn't a problem you can solve by throwing more servers at it.

I also learned that just because you can make something work doesn't mean you should. The multi-node setup technically functioned, but it was fragile, hard to maintain, and didn't actually solve the underlying problem.

The simplest solution—a single well-configured server—turned out to be the right one. Sometimes the boring answer is the correct answer.