Tech Expert & Vibe Coder

With 14+ years of experience, I specialize in self-hosting, AI automation, and Vibe Coding – building applications using AI-powered tools like Google Antigravity, Dyad, and Cline. From homelabs to enterprise solutions.

Configuring Nginx QUIC/HTTP3 reverse proxy with zero-downtime certificate rotation for self-hosted AI inference endpoints

Why I Worked on This

I run several self-hosted AI inference endpoints behind Nginx — mostly Ollama instances and a couple of custom models served through vLLM. These services sit on different VMs in my Proxmox cluster, and I need them accessible from outside my network without exposing the internal infrastructure directly.

The problem wasn't just routing traffic. I wanted:

  • HTTP/3 support because some of my client applications can take advantage of QUIC's multiplexing, especially when streaming token responses from LLMs
  • Zero-downtime certificate rotation because I use Let's Encrypt with 90-day cycles, and I got tired of brief interruptions during renewals
  • A setup I could actually maintain without constantly checking logs or worrying about expired certs breaking inference calls at 2 AM

This wasn't about chasing performance benchmarks. It was about building something stable that I could forget about.

My Real Setup

I'm running Nginx 1.25.3 compiled with --with-http_v3_module on a dedicated Ubuntu 22.04 VM. This VM acts as the single entry point for all external HTTPS traffic to my AI services.

Behind it:

  • Two Ollama instances (one on GPU, one CPU-only for lighter models)
  • One vLLM container serving a fine-tuned Mistral variant
  • An n8n instance that occasionally calls these endpoints for workflow automation

All backend services use plain HTTP internally. The Nginx proxy handles TLS termination, including HTTP/3 over UDP port 443.

Certificates come from Let's Encrypt via Certbot, renewed automatically every 60 days. The challenge was making Nginx pick up new certificates without dropping active connections or requiring a full reload that could interrupt long-running inference requests.
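For context on the cadence: Certbot's packaged timer attempts renewal twice a day, but only actually renews once the certificate is within 30 days of expiry, which works out to roughly every 60 days. A hand-rolled cron equivalent would look like this (a sketch; the packaged systemd timer is what I actually rely on):

```
# /etc/cron.d/certbot-renew (sketch)
# Runs twice daily; certbot is a no-op unless the cert is within 30 days of expiry.
0 */12 * * * root certbot -q renew
```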

Nginx Configuration

Here's the core server block I use for one of the Ollama endpoints:

server {
    listen 443 ssl;
    listen 443 quic reuseport;
    http2 on;
    http3 on;

    server_name ollama.vipinpg.com;

    ssl_certificate /etc/letsencrypt/live/ollama.vipinpg.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ollama.vipinpg.com/privkey.pem;
    ssl_protocols TLSv1.3;
    
    ssl_early_data on;
    quic_retry on;

    add_header Alt-Svc 'h3=":443"; ma=86400' always;
    add_header X-QUIC-Status $http3 always;

    location / {
        proxy_pass http://10.0.10.15:11434;
        proxy_http_version 1.1;
        
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        
        proxy_buffering off;
        proxy_request_buffering off;
        
        proxy_read_timeout 300s;
        proxy_connect_timeout 75s;
    }
}

The reuseport flag on the QUIC listener is important — it allows Nginx to bind multiple worker processes to the same UDP port without conflicts.

I disabled buffering because streaming responses from LLMs need to flow through immediately. Without this, tokens would queue up and the client would see stuttering output.
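If you'd rather not disable buffering for everything behind the proxy, the same directives can be scoped to just the streaming path. A sketch using Ollama's generate endpoint and the same upstream address (adjust paths to your setup):

```nginx
# Buffering stays on elsewhere; only the token stream bypasses it.
location /api/generate {
    proxy_pass http://10.0.10.15:11434;
    proxy_http_version 1.1;
    proxy_buffering off;            # flush tokens to the client as they arrive
    proxy_request_buffering off;    # pass the prompt upstream immediately
}
```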

Zero-Downtime Certificate Rotation

The default Certbot renewal process works, and nginx -s reload is nominally graceful, but if the reload is wired up carelessly (as a full service restart, or with draining workers forced down on a short timeout) long-lived connections get cut. For short HTTP requests that's barely noticeable. For a 60-second streaming inference response, it's a problem.

My solution uses Nginx's ability to reload configuration without killing worker processes that are handling active requests. The trick is separating the reload trigger from the certificate renewal itself.

Certbot Post-Renewal Hook

I added a deploy hook at /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh (it needs to be executable):

#!/bin/bash
# Validate the updated configuration before asking Nginx to reload.
nginx -t && nginx -s reload

This runs only after successful renewal. The nginx -t test catches configuration errors before attempting the reload.

But the real trick is in how Nginx handles -s reload:

  • Old worker processes continue serving existing requests
  • New worker processes start with the updated certificate
  • Old workers shut down only after their connections close naturally

This means an ongoing inference stream stays on the old worker until completion, while new requests immediately get the fresh certificate.

What I Changed from Default Behavior

By default, Nginx's worker_shutdown_timeout is undefined, meaning old workers wait indefinitely for connections to close. I set it explicitly:

worker_shutdown_timeout 300s;

This matches my longest expected inference time. If a request somehow runs longer than 5 minutes, the worker will force-close it during reload. In practice, I've never hit this limit — most of my inference calls finish in 10-60 seconds.
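One placement note: worker_shutdown_timeout is a main-context directive, so it sits at the top level of nginx.conf rather than inside http or server blocks. Sketch:

```nginx
# /etc/nginx/nginx.conf (top level / main context)
worker_processes auto;
worker_shutdown_timeout 300s;  # force-close draining workers after 5 minutes

events {
    worker_connections 1024;
}
```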

HTTP/3 Behavior in Practice

Clients need to discover HTTP/3 support through the Alt-Svc header. The first request arrives over TCP (HTTP/1.1 or HTTP/2), and subsequent requests upgrade to QUIC if the client supports it and has cached the advertisement.
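The Alt-Svc value this config advertises ('h3=":443"; ma=86400') carries a protocol, an alternative authority (same host, port 443), and a max-age in seconds for how long clients may cache the hint. A minimal parser sketch just to make the structure concrete (the helper name is mine, not a real library API):

```python
def parse_alt_svc(header: str) -> dict:
    """Parse an Alt-Svc value like 'h3=":443"; ma=86400' into
    {protocol: (authority, max_age_seconds)}."""
    services = {}
    for entry in header.split(","):
        parts = [p.strip() for p in entry.split(";")]
        proto, authority = parts[0].split("=", 1)
        max_age = 86400  # RFC 7838 default: 24 hours
        for param in parts[1:]:
            if param.startswith("ma="):
                max_age = int(param[3:])
        services[proto.strip()] = (authority.strip('"'), max_age)
    return services

print(parse_alt_svc('h3=":443"; ma=86400'))  # {'h3': (':443', 86400)}
```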

I added X-QUIC-Status to the response headers so I could verify which protocol was actually being used:

add_header X-QUIC-Status $http3 always;

When I tested with curl:

curl -I --http3-only https://ollama.vipinpg.com

The header showed X-QUIC-Status: h3, confirming QUIC was active.

For my Python clients calling Ollama through the API, protocol support depends on the HTTP library. The standard requests library only speaks HTTP/1.1. I switched to httpx, which can negotiate HTTP/2:

import httpx

client = httpx.Client(http2=True)
response = client.post(
    "https://ollama.vipinpg.com/api/generate",
    json={"model": "mistral", "prompt": "Explain HTTP/3"},
    timeout=120.0
)

One caveat: no mainstream Python HTTP library speaks HTTP/3 yet, httpx included (it tops out at HTTP/2). So these clients stay on h2 while browsers and curl upgrade to h3 via Alt-Svc; going further would mean building on aioquic, which hasn't been worth it for my workloads.

What Didn't Work

Early Data Without Replay Protection

I initially enabled ssl_early_data on without considering replay attacks. Early data allows clients to send requests in the first round-trip, reducing latency. But if an attacker captures and replays that early data, the server processes the same inference request twice.

For read-only inference endpoints, this isn't catastrophic — just wasted compute. But I have one endpoint that logs usage metrics, and replayed requests would corrupt those counts.

I added replay protection by checking the $ssl_early_data variable and rejecting replayed requests:

location / {
    if ($ssl_early_data = "1") {
        return 425;
    }
    proxy_pass http://10.0.10.15:11434;
}

This rejected every early-data request, legitimate ones included: clients that honor 425 Too Early simply retry after the full handshake, which cancels the latency win. After some testing, I realized my use case doesn't benefit enough from early data to justify the complexity. I kept ssl_early_data on in the config but stopped relying on it for critical paths.
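For the mixed case (most endpoints replay-tolerant, one metrics endpoint not), the pattern the Nginx docs show is forwarding the flag and letting the backend return 425 itself. Sketch, with the same upstream address as above:

```nginx
location / {
    # $ssl_early_data is "1" while the request arrived in TLS early data.
    proxy_set_header Early-Data $ssl_early_data;
    proxy_pass http://10.0.10.15:11434;
}
```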

UDP Packet Loss on Cheap Routers

QUIC runs over UDP, and my ISP-provided router occasionally dropped UDP packets under load. This caused random connection resets during inference requests.

I confirmed this by running tcpdump on the Nginx VM and watching the UDP flow. QUIC payloads are encrypted, so you can't inspect individual frames, but bursts of duplicated packets and gaps in the stream are visible:

tcpdump -vv -i ens18 udp port 443

The loss and retransmission bursts I saw correlated with connection errors in my client logs.

The fix was switching to a better router (UniFi Dream Machine) that handles UDP traffic properly. After the swap, retransmits dropped to near zero.

Certificate Chain Issues with Some Clients

Let's Encrypt provides two certificate chains: a short one terminating at their own ISRG Root X1 root, and a longer one cross-signed by the expired DST Root CA X3 root for compatibility with older clients (notably old Android, which ignores root expiry). My Certbot renewals were picking up the shorter chain.

One of my older Android devices failed to validate the certificate because it didn't trust Let's Encrypt's root. The error was vague — just "SSL handshake failed."

I switched to the longer chain by telling Certbot which chain to prefer (--preferred-chain matches the issuer of the topmost certificate in each candidate chain):

certbot certonly --preferred-chain "DST Root CA X3" -d ollama.vipinpg.com

This added an extra certificate to the chain, increasing handshake size slightly, but fixed compatibility with the older clients.
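A quick way to see which chain Nginx is actually serving is to count the certificates in fullchain.pem: the short chain has two (leaf plus intermediate), the long one three. Shown here on an inline dummy file so the command is self-contained; in practice point the grep at /etc/letsencrypt/live/<domain>/fullchain.pem:

```shell
# Build a dummy two-certificate chain file just to demonstrate the check.
cat > /tmp/sample-chain.pem <<'EOF'
-----BEGIN CERTIFICATE-----
...leaf...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
...intermediate...
-----END CERTIFICATE-----
EOF

grep -c -- "-----BEGIN CERTIFICATE-----" /tmp/sample-chain.pem
```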

Monitoring and Debugging

I monitor certificate expiration and renewal status through a simple script that checks the certificate's notAfter date:

#!/bin/bash
# Warn when the certificate is within two weeks of expiry.
CERT="/etc/letsencrypt/live/ollama.vipinpg.com/cert.pem"
EXPIRY=$(openssl x509 -enddate -noout -in "$CERT" | cut -d= -f2)
EXPIRY_EPOCH=$(date -d "$EXPIRY" +%s)   # GNU date syntax
NOW_EPOCH=$(date +%s)
DAYS_LEFT=$(( (EXPIRY_EPOCH - NOW_EPOCH) / 86400 ))

if [ "$DAYS_LEFT" -lt 14 ]; then
    echo "Certificate expires in $DAYS_LEFT days"
fi

I run this daily via cron and send alerts to my n8n instance if expiration is within 14 days. This catches any renewal failures before they become outages.
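The same date arithmetic works in Python if the check is easier to run from an existing automation stack. A sketch assuming openssl's default notAfter format (the function name is mine):

```python
from datetime import datetime, timezone

def days_until_expiry(not_after: str, now: datetime = None) -> int:
    """Whole days until an openssl-style notAfter date,
    e.g. 'Dec 31 23:59:59 2030 GMT'."""
    expiry = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    expiry = expiry.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expiry - now).days

# Fixed 'now' so the example is reproducible:
print(days_until_expiry("Dec 31 23:59:59 2030 GMT",
                        now=datetime(2030, 12, 1, tzinfo=timezone.utc)))  # 30
```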

For HTTP/3 connection issues, I tail the Nginx error log filtered for QUIC-related messages:

tail -f /var/log/nginx/error.log | grep -i quic

Most issues show up as quic packet too small or quic connection timed out, which usually point to network problems rather than configuration errors.

Key Takeaways

  • HTTP/3 works well for streaming AI responses, but the benefit is subtle unless you're dealing with high latency or packet loss
  • Zero-downtime certificate rotation is built into Nginx's reload mechanism — you just need to set appropriate timeouts
  • UDP transport requires better network hardware than TCP; cheap routers will cause problems
  • Early data is a trap unless you explicitly handle replay attacks
  • Certificate chain selection matters more than I expected for client compatibility
  • Monitoring certificate expiration separately from Certbot's built-in checks gives you advance warning of renewal issues

This setup has been running for about eight months now. I've had zero unplanned outages related to certificate renewals, and HTTP/3 adoption among my clients has gradually increased as they upgrade their libraries. The configuration is stable enough that I rarely think about it, which was the original goal.