Why I Built Circuit Breakers for My LLM Workflows
I run several n8n workflows that call self-hosted LLM APIs—mostly Ollama on my Proxmox cluster and occasionally OpenAI for specific tasks. These workflows handle everything from document summarization to automated content processing. The problem I kept hitting was cascading failures: one slow or unresponsive LLM call would jam up the entire workflow, causing timeouts that rippled through dependent steps.
I needed a way to fail fast when an API was struggling, rather than letting every request wait 30+ seconds before timing out. That’s what pushed me to implement circuit breakers specifically for LLM API calls in my n8n setup.
What Circuit Breakers Actually Do
A circuit breaker sits between your workflow and the API. It monitors call success rates and response times. When failures cross a threshold, it “opens” and blocks subsequent requests for a cooldown period. This prevents your workflow from hammering a failing service and gives the API time to recover.
The pattern has three states:
- Closed: Normal operation. All requests go through.
- Open: Too many failures detected. Requests are blocked immediately.
- Half-Open: Testing if the service has recovered. A few requests are allowed through.
For LLM APIs specifically, this matters because these services have unpredictable response times, enforce rate limits, and cost money per request. A circuit breaker prevents wasted calls to a service that’s already struggling.
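In miniature, the state machine looks like this. This is a toy sketch with hypothetical names, no locking, and no HTTP layer, just enough to show the three transitions:

```python
import time
from enum import Enum

class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class ToyBreaker:
    """Illustrative three-state breaker; not the production version."""
    def __init__(self, failure_threshold=3, cooldown=1.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown  # seconds to stay open before probing
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func):
        if self.state is State.OPEN:
            if time.time() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: request blocked")
            self.state = State.HALF_OPEN  # cooldown over, allow a probe

        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
                self.state = State.OPEN
                self.opened_at = time.time()
            raise

        # Any success closes the circuit again
        self.state = State.CLOSED
        self.failures = 0
        return result
```

Real implementations add thread safety and a success threshold for leaving Half-Open, which is what my actual service does below.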
My Implementation in n8n
I initially tried to build this directly in n8n using error workflows and conditional logic, but it became messy fast. The problem is that n8n doesn’t have native circuit breaker support, and managing state across workflow executions is awkward.
Instead, I created a small Python service that wraps my LLM API calls and exposes a simple HTTP endpoint. My n8n workflows call this service instead of hitting the LLM APIs directly. The service handles all the circuit breaker logic.
The Core Service
Here’s the basic structure I’m using. This runs in a Docker container on the same Proxmox node as my n8n instance:
from flask import Flask, request, jsonify
import time
from enum import Enum
from threading import Lock

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class LLMCircuitBreaker:
    def __init__(self, failure_threshold=3, timeout_duration=60, success_threshold=2):
        self.failure_threshold = failure_threshold
        self.timeout_duration = timeout_duration
        self.success_threshold = success_threshold
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.success_count = 0
        self.last_failure_time = None
        self.lock = Lock()

    def call(self, func):
        with self.lock:
            if self.state == CircuitState.OPEN:
                # Cooldown elapsed? Allow a trial request through.
                if time.time() - self.last_failure_time >= self.timeout_duration:
                    self.state = CircuitState.HALF_OPEN
                    self.success_count = 0
                else:
                    raise Exception("Circuit is open; request blocked")
        try:
            result = func()
        except Exception:
            with self.lock:
                self._on_failure()
            raise
        with self.lock:
            self._on_success()
        return result

    def _on_success(self):
        if self.state == CircuitState.HALF_OPEN:
            self.success_count += 1
            if self.success_count >= self.success_threshold:
                self.state = CircuitState.CLOSED
                self.failure_count = 0
        elif self.state == CircuitState.CLOSED:
            self.failure_count = 0

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN
            self.success_count = 0

app = Flask(__name__)
breaker = LLMCircuitBreaker(failure_threshold=3, timeout_duration=60)

@app.route('/llm', methods=['POST'])
def call_llm():
    data = request.json
    try:
        result = breaker.call(lambda: make_llm_call(data))
        return jsonify({"success": True, "result": result, "state": breaker.state.value})
    except Exception as e:
        return jsonify({"success": False, "error": str(e), "state": breaker.state.value}), 503
The make_llm_call function handles the actual API request to Ollama or OpenAI, with its own timeout handling.
Timeout Handling
I set individual request timeouts at 30 seconds for most tasks and 60 seconds for complex generation work. If a request exceeds the timeout, it counts as a failure. This is important because LLM APIs can hang indefinitely on certain inputs, and I’d rather fail fast than wait forever.
import requests

def make_llm_call(data, timeout=30):
    response = requests.post(
        "http://ollama:11434/api/generate",
        json=data,
        timeout=timeout
    )
    response.raise_for_status()
    return response.json()
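The 30/60-second split is just a lookup on task type before the request goes out. A sketch of how I think about it (the task names and helper function are hypothetical, not part of the service above):

```python
# Hypothetical mapping of task type to request timeout in seconds,
# reflecting the 30s/60s split described above
TASK_TIMEOUTS = {
    "summarize": 30,   # most tasks: fail fast
    "generate": 60,    # complex generation work gets longer
}

def timeout_for(task_type: str) -> int:
    """Unknown task types fall back to the aggressive 30s default."""
    return TASK_TIMEOUTS.get(task_type, 30)
```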
Integration with n8n
In my n8n workflows, I use an HTTP Request node to call the circuit breaker service. The workflow looks like this:
- HTTP Request node calls my circuit breaker service
- If the response status is 503, the circuit is open—workflow branches to a fallback path
- If the response succeeds, the workflow continues normally
- If there’s a timeout or error, n8n’s error workflow logs it for monitoring
The fallback path typically either queues the task for retry later (using Cronicle) or returns a cached response if one exists.
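The branching above is simple enough to mirror in a few lines. A sketch of the decision logic (the function name and branch labels are mine, for illustration only):

```python
def route_response(status_code: int, body: dict) -> str:
    """Map the circuit breaker service's response to a workflow branch."""
    if status_code == 503:
        # Circuit is open: queue for retry or serve a cached response
        return "fallback"
    if 200 <= status_code < 300:
        return "continue"
    # Anything else goes to the error workflow for logging
    return "error_workflow"
```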
What Worked
This setup has been running for about four months now. The main benefits I’ve seen:
- Faster failure detection: Instead of waiting for multiple 30-second timeouts, the circuit opens after 2-3 failures and blocks subsequent requests immediately.
- Reduced load on struggling services: When Ollama hits resource limits on my GPU, the circuit breaker prevents my workflows from piling on more requests.
- Better workflow reliability: n8n workflows no longer hang indefinitely. They either succeed or fail fast and move to fallback logic.
- Visibility: I added basic metrics to the circuit breaker service. I can see when circuits open and how long they stay open, which helps me identify underlying issues.
What Didn’t Work
The first version of this used a global circuit breaker for all LLM calls. That was a mistake. When one workflow triggered failures (usually due to malformed prompts), it would open the circuit for all other workflows. I had to switch to per-endpoint circuit breakers so failures are isolated.
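Switching to per-endpoint breakers amounted to keeping a registry keyed by endpoint name instead of one global instance. A sketch, using a stripped-down stand-in for the breaker class from earlier:

```python
from threading import Lock

class LLMCircuitBreaker:
    """Stand-in for the full breaker class; only what the registry needs."""
    def __init__(self, failure_threshold=3, timeout_duration=60):
        self.failure_threshold = failure_threshold
        self.timeout_duration = timeout_duration

_breakers: dict = {}
_registry_lock = Lock()

def get_breaker(endpoint: str) -> LLMCircuitBreaker:
    """One breaker per endpoint, so failures stay isolated."""
    with _registry_lock:
        if endpoint not in _breakers:
            _breakers[endpoint] = LLMCircuitBreaker()
        return _breakers[endpoint]
```

With this, a burst of malformed prompts against one endpoint opens only that endpoint’s circuit; the others keep serving.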
I also initially set the failure threshold too low (2 failures). This caused the circuit to open too aggressively during normal operation, especially during high-load periods when occasional timeouts are expected. I bumped it to 3 failures, which feels more reasonable.
The timeout duration is still something I’m tuning. 60 seconds works for most cases, but for workflows that run during off-peak hours, I’ve extended it to 120 seconds since there’s less urgency.
State Management Issues
Because the circuit breaker service runs as a single container, restarting it resets all circuit state. This isn’t a huge problem in practice, but it means circuits don’t “remember” their state across restarts. I considered adding Redis for persistent state but decided it wasn’t worth the complexity for my use case.
Monitoring Gaps
I’m not currently tracking detailed metrics about which specific prompts or workflow steps trigger failures most often. I log circuit state changes, but I don’t have granular visibility into what causes the circuit to open. This is something I need to improve.
Key Takeaways
Circuit breakers are worth implementing if you’re running LLM APIs in production workflows. The main value is preventing cascading failures and failing fast when services are struggling.
For self-hosted setups, the circuit breaker doesn’t need to be complex. A simple state machine with failure counting and timeout logic is enough. The hard part is tuning the thresholds to match your actual usage patterns.
If you’re using n8n, don’t try to build circuit breaker logic directly in workflows. It’s much cleaner to wrap your API calls in a small service that handles this logic externally.
The biggest benefit isn’t just preventing failures—it’s making failures predictable and fast. When something goes wrong, my workflows now fail in seconds instead of minutes, which makes debugging much easier.