Why I Built Circuit Breakers for My LLM Workflows
I run several n8n workflows that call self-hosted LLM APIs—mostly Ollama on my Proxmox cluster and occasionally OpenAI for specific tasks. These workflows handle everything from document summarization to automated content processing. The problem I kept hitting was cascading failures: one slow or unresponsive LLM call would jam up the entire workflow, causing timeouts that rippled through dependent steps.
I needed a way to fail fast when an API was struggling, rather than letting every request wait 30+ seconds before timing out. That’s what pushed me to implement circuit breakers specifically for LLM API calls in my n8n setup.
What Circuit Breakers Actually Do
A circuit breaker sits between your workflow and the API. It monitors call success rates and response times. When failures cross a threshold, it “opens” and blocks subsequent requests for a cooldown period. This prevents your workflow from hammering a failing service and gives the API time to recover.
The pattern has three states:
- Closed: Normal operation. All requests go through.
- Open: Too many failures detected. Requests are blocked immediately.
- Half-Open: Testing if the service has recovered. A few requests are allowed through.
For LLM APIs specifically, this matters because these services have unpredictable response times, enforce rate limits, and cost money per request. A circuit breaker prevents wasted calls to a service that’s already struggling.
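In miniature, the state machine looks like this. This is a toy sketch with hypothetical names, no locking, and no HTTP layer, just enough to show the three transitions:

```python
import time
from enum import Enum

class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class ToyBreaker:
    """Illustrative three-state breaker; not the production version."""
    def __init__(self, failure_threshold=3, cooldown=1.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown  # seconds to stay open before probing
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func):
        if self.state is State.OPEN:
            if time.time() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: request blocked")
            self.state = State.HALF_OPEN  # cooldown over, allow a probe

        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
                self.state = State.OPEN
                self.opened_at = time.time()
            raise

        # Any success closes the circuit again
        self.state = State.CLOSED
        self.failures = 0
        return result
```

Real implementations add thread safety and a success threshold for leaving Half-Open, which is what my actual service does below.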
My Implementation in n8n
I initially tried to build this directly in n8n using error workflows and conditional logic, but it became messy fast. The problem is that n8n doesn’t have native circuit breaker support, and managing state across workflow executions is awkward.
Instead, I created a small Python service that wraps my LLM API calls and exposes a simple HTTP endpoint. My n8n workflows call this service instead of hitting the LLM APIs directly. The service handles all the circuit breaker logic.
The Core Service
Here’s the basic structure I’m using. This runs in a Docker container on the same Proxmox node as my n8n instance:
from flask import Flask, request, jsonify
import time
from enum import Enum
from threading import Lock

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class LLMCircuitBreaker:
    def __init__(self, failure_threshold=3, timeout_duration=60, success_threshold=2):
        self.failure_threshold = failure_threshold
        self.timeout_duration = timeout_duration
        self.success_threshold = success_threshold
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.success_count = 0
        self.last_failure_time = None
        self.lock = Lock()

    def call(self, func):
        with self.lock:
            if self.state == CircuitState.OPEN:
                # Cooldown elapsed? Allow a trial request through.
                if time.time() - self.last_failure_time >= self.timeout_duration:
                    self.state = CircuitState.HALF_OPEN
                    self.success_count = 0
                else:
                    raise Exception("Circuit is open; request blocked")
        try:
            result = func()
        except Exception:
            with self.lock:
                self._on_failure()
            raise
        with self.lock:
            self._on_success()
        return result

    def _on_success(self):
        if self.state == CircuitState.HALF_OPEN:
            self.success_count += 1
            if self.success_count >= self.success_threshold:
                self.state = CircuitState.CLOSED
                self.failure_count = 0
        elif self.state == CircuitState.CLOSED:
            self.failure_count = 0

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN
            self.success_count = 0

app = Flask(__name__)
breaker = LLMCircuitBreaker(failure_threshold=3, timeout_duration=60)

@app.route('/llm', methods=['POST'])
def call_llm():
    data = request.json
    try:
        result = breaker.call(lambda: make_llm_call(data))
        return jsonify({"success": True, "result": result, "state": breaker.state.value})
    except Exception as e:
        return jsonify({"success": False, "error": str(e), "state": breaker.state.value}), 503
The make_llm_call function handles the actual API request to Ollama or OpenAI, with its own timeout handling.
Timeout Handling
I set individual request timeouts at 30 seconds for most tasks and 60 seconds for complex generation work. If a request exceeds the timeout, it counts as a failure. This is important because LLM APIs can hang indefinitely on certain inputs, and I’d rather fail fast than wait forever.
import requests

def make_llm_call(data, timeout=30):
    response = requests.post(
        "http://ollama:11434/api/generate",
        json=data,
        timeout=timeout
    )
    response.raise_for_status()
    return response.json()
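The 30/60-second split is just a lookup on task type before the request goes out. A sketch of how I think about it (the task names and helper function are hypothetical, not part of the service above):

```python
# Hypothetical mapping of task type to request timeout in seconds,
# reflecting the 30s/60s split described above
TASK_TIMEOUTS = {
    "summarize": 30,   # most tasks: fail fast
    "generate": 60,    # complex generation work gets longer
}

def timeout_for(task_type: str) -> int:
    """Unknown task types fall back to the aggressive 30s default."""
    return TASK_TIMEOUTS.get(task_type, 30)
```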
Integration with n8n
In my n8n workflows, I use an HTTP Request node to call the circuit breaker service. The workflow looks like this:
- HTTP Request node calls my circuit breaker service
- If the response status is 503, the circuit is open—workflow branches to a fallback path
- If the response succeeds, the workflow continues normally
- If there’s a timeout or error, n8n’s error workflow logs it for monitoring
The fallback path typically either queues the task for retry later (using Cronicle) or returns a cached response if one exists.
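The branching above is simple enough to mirror in a few lines. A sketch of the decision logic (the function name and branch labels are mine, for illustration only):

```python
def route_response(status_code: int, body: dict) -> str:
    """Map the circuit breaker service's response to a workflow branch."""
    if status_code == 503:
        # Circuit is open: queue for retry or serve a cached response
        return "fallback"
    if 200 <= status_code < 300:
        return "continue"
    # Anything else goes to the error workflow for logging
    return "error_workflow"
```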
What Worked
This setup has been running for about four months now. The main benefits I’ve seen:
- Faster failure detection: Instead of waiting for multiple 30-second timeouts, the circuit opens after 2-3 failures and blocks subsequent requests immediately.
- Reduced load on struggling services: When Ollama hits resource limits on my GPU, the circuit breaker prevents my workflows from piling on more requests.
- Better workflow reliability: n8n workflows no longer hang indefinitely. They either succeed or fail fast and move to fallback logic.
- Visibility: I added basic metrics to the circuit breaker service. I can see when circuits open and how long they stay open, which helps me identify underlying issues.
What Didn’t Work
The first version of this used a global circuit breaker for all LLM calls. That was a mistake. When one workflow triggered failures (usually due to malformed prompts), it would open the circuit for all other workflows. I had to switch to per-endpoint circuit breakers so failures are isolated.
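Switching to per-endpoint breakers amounted to keeping a registry keyed by endpoint name instead of one global instance. A sketch, using a stripped-down stand-in for the breaker class from earlier:

```python
from threading import Lock

class LLMCircuitBreaker:
    """Stand-in for the full breaker class; only what the registry needs."""
    def __init__(self, failure_threshold=3, timeout_duration=60):
        self.failure_threshold = failure_threshold
        self.timeout_duration = timeout_duration

_breakers: dict = {}
_registry_lock = Lock()

def get_breaker(endpoint: str) -> LLMCircuitBreaker:
    """One breaker per endpoint, so failures stay isolated."""
    with _registry_lock:
        if endpoint not in _breakers:
            _breakers[endpoint] = LLMCircuitBreaker()
        return _breakers[endpoint]
```

With this, a burst of malformed prompts against one endpoint opens only that endpoint’s circuit; the others keep serving.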
I also initially set the failure threshold too low (2 failures). This caused the circuit to open too aggressively during normal operation, especially during high-load periods when occasional timeouts are expected. I bumped it to 3 failures, which feels more reasonable.
The timeout duration is still something I’m tuning. 60 seconds works for most cases, but for workflows that run during off-peak hours, I’ve extended it to 120 seconds since there’s less urgency.
State Management Issues
Because the circuit breaker service runs as a single container, restarting it resets all circuit state. This isn’t a huge problem in practice, but it means circuits don’t “remember” their state across restarts. I considered adding Redis for persistent state but decided it wasn’t worth the complexity for my use case.
Monitoring Gaps
I’m not currently tracking detailed metrics about which specific prompts or workflow steps trigger failures most often. I log circuit state changes, but I don’t have granular visibility into what causes the circuit to open. This is something I need to improve.
Key Takeaways
Circuit breakers are worth implementing if you’re running LLM APIs in production workflows. The main value is preventing cascading failures and failing fast when services are struggling.
For self-hosted setups, the circuit breaker doesn’t need to be complex. A simple state machine with failure counting and timeout logic is enough. The hard part is tuning the thresholds to match your actual usage patterns.
If you’re using n8n, don’t try to build circuit breaker logic directly in workflows. It’s much cleaner to wrap your API calls in a small service that handles this logic externally.
The biggest benefit isn’t just preventing failures—it’s making failures predictable and fast. When something goes wrong, my workflows now fail in seconds instead of minutes, which makes debugging much easier.