Debugging Caddy reverse proxy websocket timeouts when proxying Ollama streaming responses through Cloudflare tunnels

Why I Worked on This

I run Ollama on my home Proxmox server for local AI inference. I wanted to access it remotely without exposing ports directly, so I set up a Cloudflare tunnel with Caddy as the reverse proxy. Everything worked fine for basic API calls until I tried streaming responses—the kind where tokens arrive one by one as the model generates them.

The stream would start, then abruptly cut off after 30-60 seconds. No error in Ollama's logs. No clear failure in Caddy. Just silent timeouts that made streaming unusable.

I needed to figure out where the timeout was happening and how to fix it without breaking other services or creating security holes.

My Real Setup

Here's what I was running:

  • Ollama container on Proxmox, listening on localhost:11434
  • Caddy v2.8.4 as reverse proxy, also in a container
  • Cloudflare tunnel (cloudflared) connecting Caddy to the internet
  • Custom subdomain pointing to the tunnel

My initial Caddyfile looked like this:

ollama.mydomain.com {
    reverse_proxy localhost:11434
}

Simple. Clean. Completely broken for streaming.

What Didn't Work

First Assumption: Cloudflare Was Timing Out

I suspected Cloudflare's proxy layer was killing long-running connections. I checked their documentation and found that WebSocket connections through tunnels have a 100-second idle timeout by default. That seemed to match the behavior I was seeing.

But when I tested with curl directly against Caddy (bypassing Cloudflare), the timeout still happened. So it wasn't Cloudflare.
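
To rule Cloudflare out, one way to hit Caddy directly is to pin the public hostname to the box's LAN address with curl's --resolve flag. This is a sketch of that kind of check rather than my exact command; the 192.168.1.50 address is a placeholder, and -k is only needed if Caddy serves an internally signed certificate:

# Resolve the public hostname to Caddy's LAN IP (placeholder) so the
# request never touches Cloudflare; -N disables curl's output buffering
curl -sNk --resolve ollama.mydomain.com:443:192.168.1.50 \
  -X POST https://ollama.mydomain.com/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2:3b", "prompt": "Count to 200", "stream": true}'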

Second Try: Increasing Caddy's Timeouts

I added explicit timeout settings to the reverse proxy block:

ollama.mydomain.com {
    reverse_proxy localhost:11434 {
        transport http {
            read_timeout 300s
            write_timeout 300s
        }
    }
}

Still failed. The stream would die at the same point.

Third Try: Buffering

I thought maybe Caddy was buffering the output and timing out while it waited for the full response to arrive. I disabled buffering:

ollama.mydomain.com {
    reverse_proxy localhost:11434 {
        flush_interval -1
    }
}

This made it worse. Responses became choppy and still timed out.

What Actually Worked

The problem was how Caddy handled Ollama's streaming output. On /api/generate, Ollama streams newline-delimited JSON over a long-lived chunked HTTP response (the same pattern as server-sent events): the connection stays open while data arrives in bursts. Caddy's default behavior was treating these as idle connections and closing them.

The fix required three specific settings:

ollama.mydomain.com {
    reverse_proxy localhost:11434 {
        flush_interval 0
        transport http {
            read_timeout 0
            write_timeout 0
        }
    }
}

Why These Settings Matter

flush_interval 0: This tells Caddy to immediately forward any data it receives from the upstream, without waiting to accumulate a buffer. The default behavior would hold data briefly, which breaks the streaming experience.

read_timeout 0 and write_timeout 0: These disable the connection timeouts entirely. Normally, Caddy expects requests to complete within a reasonable time. But streaming responses can legitimately stay open for minutes while the model generates tokens.

Setting these to 0 means "no timeout." This is safe in my case because Ollama itself will close the connection when the response completes, and I'm not exposing this to untrusted clients.
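
If I wanted to keep Caddy's normal timeouts for everything except the streaming endpoints, a handle block scoped to the API paths would do it. This is a hypothetical variant, not the config I actually run:

ollama.mydomain.com {
    # Unlimited timeouts only for the streaming API paths
    handle /api/* {
        reverse_proxy localhost:11434 {
            flush_interval 0
            transport http {
                read_timeout 0
                write_timeout 0
            }
        }
    }
    # Everything else keeps Caddy's default timeouts
    handle {
        reverse_proxy localhost:11434
    }
}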

The Cloudflare Tunnel Part

Once Caddy was configured correctly, the Cloudflare tunnel handled the connection fine. The 100-second idle timeout I worried about doesn't apply when data is actively flowing—and with proper flushing, Ollama sends tokens frequently enough to keep the connection alive.

I didn't need to change anything in the tunnel configuration. The default settings worked once Caddy stopped killing the connection prematurely.
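
For context, the tunnel side only needs an ingress rule that points the hostname at Caddy. The snippet below is a sketch rather than my exact file; the tunnel ID, credentials path, and origin service are placeholders:

# cloudflared config.yml sketch; tunnel ID, credentials path, and origin
# service are placeholders rather than real values
tunnel: <tunnel-id>
credentials-file: /etc/cloudflared/<tunnel-id>.json
ingress:
  - hostname: ollama.mydomain.com
    service: https://localhost:443
    originRequest:
      originServerName: ollama.mydomain.com
  - service: http_status:404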

Testing the Fix

I tested with a simple streaming request:

curl -X POST https://ollama.mydomain.com/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "prompt": "Write a long story about debugging network issues",
    "stream": true
  }'

The response now streams continuously until completion, even for responses that take several minutes. No more mid-stream disconnections.
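
A quick way to double-check that tokens really are arriving incrementally, rather than in one buffered burst, is to timestamp each streamed line. A small sketch using the same request:

# Prefix every streamed JSON line with a wall-clock timestamp; steadily
# increasing times confirm the proxy is flushing instead of buffering
curl -sN -X POST https://ollama.mydomain.com/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2:3b", "prompt": "Write a long story", "stream": true}' \
  | while IFS= read -r line; do printf '%s %s\n' "$(date +%T)" "$line"; done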

What I Learned About Streaming Through Proxies

Buffering is usually good, but not for SSE: Caddy's default buffering improves performance for normal HTTP responses, but it breaks streaming protocols that expect immediate delivery.

Timeouts need context: A 30-second timeout makes sense for most web requests. But streaming AI responses are fundamentally different—they're long-lived by design. Disabling timeouts for specific endpoints is the right choice when you control both ends of the connection.

Cloudflare tunnels don't break everything: I was ready to blame Cloudflare's proxy layer, but the tunnel itself handled long-running connections fine. The problem was in my reverse proxy configuration.

Test without layers first: When I bypassed Cloudflare and tested directly against Caddy, I immediately narrowed down where the problem was. Adding layers back one at a time confirmed the fix.

Current Configuration

My working Caddyfile for Ollama looks like this:

ollama.mydomain.com {
    reverse_proxy localhost:11434 {
        flush_interval 0
        transport http {
            read_timeout 0
            write_timeout 0
        }
    }
}

It's been stable for weeks now. I use it daily for streaming completions, and the connection stays open as long as needed.

Trade-offs I'm Aware Of

Disabling timeouts means a misbehaving client could hold a connection open indefinitely. In my case, this is acceptable because:

  • I'm the only user
  • Ollama itself will close the connection when done
  • The endpoint is behind Cloudflare Access for authentication

If I were exposing this to untrusted users, I'd need to add connection limits or idle detection at a different layer.
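
One option at that point would be a finite cap at the server level instead of per route, so no single response can be held open forever while multi-minute streams still fit comfortably. A sketch using Caddy's global options; the 15m value is an arbitrary example, not a recommendation:

{
    servers {
        timeouts {
            # Hard upper bound on how long any single response can stay open
            write 15m
        }
    }
}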

Key Takeaways

Streaming responses require explicit configuration in reverse proxies. The defaults are designed for request-response patterns, not long-lived connections.

When debugging timeouts, test each layer independently. Don't assume the most complex part (like Cloudflare) is the problem.

SSE and WebSocket protocols need immediate flushing and disabled timeouts. This isn't a hack—it's the correct configuration for these protocols.

Document your timeout decisions. Future me (or anyone else reading the config) needs to understand why certain values are set to zero.