Creating an n8n Workflow to Monitor My Automations with Claude: Debugging Failed Executions Through a Custom MCP Server

Why I Built This

I run several n8n instances that handle everything from event notifications to data processing workflows. These automations run 24/7, and when something breaks at 3 AM, I need to know what happened without spending 20 minutes clicking through execution logs.

The problem isn't just finding failures—it's understanding them quickly. n8n's interface shows you what failed, but figuring out why means reading through JSON error messages, comparing timestamps across executions, and mentally piecing together what went wrong.

I wanted to ask plain questions like "What failed last night?" and get actual answers, not just raw logs.

My Real Setup

Here's what I'm working with:

  • A self-hosted n8n instance running on Proxmox
  • Multiple workflows that fetch event data from APIs and send Telegram notifications
  • Claude Desktop installed locally
  • An MCP (Model Context Protocol) server I wrote to connect Claude to my n8n API

The core idea: expose n8n's execution data through a webhook, then let Claude query it through natural language.

The Webhook Workflow

I created an n8n workflow that acts as an API endpoint. It has three main functions:

  1. Get Active Workflows – Returns a list of all running workflows
  2. Get Last Executions – Fetches the most recent n executions with summary stats
  3. Get Execution Details (Errors Only) – Pulls full error logs from failed runs

The workflow listens on a webhook URL and responds with JSON. Nothing fancy—just a clean interface to my instance's data.
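
To pin down that contract, here's roughly what the webhook accepts and returns, sketched as TypeScript types. The action names and field names are my own shorthand for this post, not an official n8n schema:

```typescript
// Rough shape of the webhook contract. The action and field names are
// my own conventions, not anything defined by n8n itself.
type Action = "active_workflows" | "last_executions" | "execution_details";

interface ExecutionSummary {
  id: string;                 // n8n execution ID
  workflowId: string;
  workflowName: string;
  status: "success" | "error";
  startedAt: string;          // ISO 8601 timestamps
  stoppedAt: string;
  mode: "webhook" | "trigger" | "manual";
}

interface ErrorDetail {
  executionId: string;
  failedNode: string;         // name of the node that threw
  message: string;            // root error message
  stackExcerpt: string[];     // first few stack trace lines only
}
```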

The MCP Server

I wrote a simple MCP server in TypeScript that connects Claude Desktop to this webhook. The server defines three tools that Claude can call:

  • get_active_workflows
  • get_workflow_executions
  • get_execution_details

Each tool makes an HTTP request to my n8n webhook and returns structured data. Claude decides which tool to use based on what I ask.
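
For reference, here's a trimmed-down sketch of what that server looks like with the official TypeScript SDK. The webhook URL, query parameters, and tool wiring reflect my setup and are placeholders to adapt, not a canonical layout:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Local n8n webhook that fronts the execution data (placeholder URL).
const WEBHOOK_URL = "http://n8n.local:5678/webhook/monitor";

const server = new McpServer({ name: "n8n-monitor", version: "1.0.0" });

// Helper: call the webhook and hand the JSON back to Claude as text.
// Uses the global fetch available in Node 18+.
async function callWebhook(params: Record<string, string>) {
  const res = await fetch(`${WEBHOOK_URL}?${new URLSearchParams(params)}`);
  return { content: [{ type: "text" as const, text: await res.text() }] };
}

server.tool("get_active_workflows", async () =>
  callWebhook({ action: "active_workflows" })
);

server.tool(
  "get_workflow_executions",
  { limit: z.number().int().positive().default(25) },
  async ({ limit }) =>
    callWebhook({ action: "last_executions", limit: String(limit) })
);

server.tool(
  "get_execution_details",
  { executionId: z.string() },
  async ({ executionId }) =>
    callWebhook({ action: "execution_details", id: executionId })
);

await server.connect(new StdioServerTransport());
```

Claude Desktop finds the server through an entry in its claude_desktop_config.json (under mcpServers) pointing at the compiled script.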

What Worked

I tested this on a real failure scenario. One of my workflows, "Bangkok Meetup", failed three times in four hours. Here's what happened when I asked Claude about it:

Step 1: Listing Active Workflows

I asked: "How many workflows are active?"

Claude automatically called get_active_workflows and returned a categorized list:

  • 8 workflows fetching event data
  • 3 work-in-progress workflows

It even grouped them by purpose without me asking. This gave me context before diving into failures.

Step 2: Analyzing Recent Executions

Next: "Analyze the last 25 executions."

Claude called get_workflow_executions and came back with:

  • Total executions: 25
  • Success rate: 88%
  • Failure rate: 12%
  • Only 1 workflow affected: "Bangkok Meetup" (ID: 7uvA2XQPMB5l4kI5)

It listed the three failed execution IDs with timestamps:

  • 11:00 AM
  • 9:00 AM
  • 7:00 AM

This immediately told me the workflow was failing every two hours during its scheduled runs.

Step 3: Root Cause Analysis

I didn't even need to ask for details. Claude saw the failures and automatically called get_execution_details for those three execution IDs.

It returned the actual error:

"API rate limit exceeded. The upstream service returned a 429 status code."

Claude explained in plain language:

  • The workflow runs every hour
  • The external API has rate limits
  • Three consecutive failures suggest the limit was hit repeatedly
  • The workflow succeeded at 6:00 AM and 8:00 AM, confirming it's not a permanent issue

This took about 30 seconds. Without the agent, I would have opened the n8n UI, clicked through each execution, read the error JSON, and manually connected the dots.

What Didn't Work

Initial Authentication Issues

My first attempt used n8n's API key authentication directly from the MCP server. This failed because:

  • The API key needed to be stored in the MCP server config
  • Claude Desktop's MCP implementation doesn't support environment variables cleanly
  • Hardcoding credentials felt wrong

I switched to using a webhook with no authentication (since it's only accessible on my local network). For production, I'd add a simple bearer token.
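
If I were hardening this, the change would be small. On the MCP side it's one header; on the n8n side, the Webhook node's header-auth option (or an IF node) can reject anything that doesn't match. A minimal sketch, with the token obviously a placeholder you'd load from config:

```typescript
// Placeholder token; in practice, load it from the MCP server's config
// rather than hardcoding it like this.
const TOKEN = "replace-me";

async function callWebhook(params: Record<string, string>) {
  const res = await fetch(`${WEBHOOK_URL}?${new URLSearchParams(params)}`, {
    headers: { Authorization: `Bearer ${TOKEN}` }, // verified by the webhook
  });
  if (!res.ok) throw new Error(`Webhook rejected the request: ${res.status}`);
  return { content: [{ type: "text" as const, text: await res.text() }] };
}
```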

Too Much Data at Once

My first version of get_workflow_executions returned every field from the n8n API response. This overwhelmed Claude with unnecessary details like internal node IDs and execution paths.

I stripped it down to:

  • Execution ID
  • Workflow ID
  • Status (success/error)
  • Start and stop times
  • Execution mode (webhook/trigger/manual)

This made the responses faster and the analysis clearer.
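
The trimming lives in a Code node inside the webhook workflow, roughly like this. The input field names match what my instance's API returns, so verify them against your n8n version:

```typescript
// Reduce each execution returned by the n8n API to the handful of
// fields Claude actually needs. Field names match my instance's
// responses; they may differ across n8n versions.
interface SlimExecution {
  id: string;
  workflowId: string;
  status: "success" | "error";
  startedAt: string;
  stoppedAt: string;
  mode: string; // webhook | trigger | manual
}

function slim(executions: any[]): SlimExecution[] {
  return executions.map((e) => ({
    id: e.id,
    workflowId: e.workflowId,
    status: e.finished ? "success" : "error",
    startedAt: e.startedAt,
    stoppedAt: e.stoppedAt,
    mode: e.mode,
  }));
}
```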

Error Details Formatting

n8n's error objects are deeply nested JSON with lots of metadata. Claude struggled to extract the actual error message from this structure.

I added a preprocessing step in the webhook that:

  • Extracts the root error message
  • Identifies which node failed
  • Includes only the relevant stack trace lines

Now Claude gets clean, readable error summaries instead of raw dumps.
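
The preprocessing boils down to one small function. In the execution payloads my instance returns, the useful pieces live under data.resultData; treat those paths as an assumption to check against your n8n version:

```typescript
// Flatten n8n's nested error payload into a short, readable summary.
// The resultData paths below match my instance's payloads and may
// differ across n8n versions.
function summarizeError(execution: any): string {
  const result = execution?.data?.resultData ?? {};
  const error = result.error ?? {};
  const failedNode =
    error.node?.name ?? result.lastNodeExecuted ?? "unknown node";
  const message = error.message ?? "no error message recorded";
  // Keep only the first few stack frames; the rest is noise for triage.
  const stack = (error.stack ?? "").split("\n").slice(0, 3).join("\n");
  return `Node "${failedNode}" failed: ${message}\n${stack}`;
}
```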

No Historical Context

The current setup only looks at recent executions. If a workflow fails intermittently over days, Claude can't see the pattern unless I manually ask for multiple time windows.

I considered adding a database to store execution history, but that felt like overengineering for my use case. For now, I just query different time ranges when needed.

Key Takeaways

1. MCP Makes This Trivial

The Model Context Protocol is exactly what I needed. No complex API integrations, no prompt engineering gymnastics—just define tools and let Claude figure out when to use them.

The entire MCP server is about 150 lines of TypeScript.

2. Webhooks Are Underrated

I could have connected directly to n8n's REST API, but the webhook approach gave me:

  • Full control over data formatting
  • Easy preprocessing of errors
  • No need to manage API keys in multiple places

Plus, I can add custom logic in the webhook workflow without touching the MCP server.

3. Claude Is Genuinely Good at This

I expected to need explicit instructions for every step. Instead, Claude:

  • Inferred which tool to call based on my questions
  • Automatically fetched additional details when it spotted failures
  • Categorized workflows by purpose without prompting
  • Explained technical errors in plain language

It felt less like querying a database and more like asking a colleague who knows the system.

4. The Real Value Is Speed

This isn't about replacing the n8n UI. It's about getting answers fast when something breaks.

Before: Open browser → log into n8n → find workflow → check executions → read errors → repeat for other workflows.

Now: "What failed last night?" → Get a summary in 30 seconds.

5. Start Simple

My first version only had get_active_workflows. I added the other functions as I realized what questions I actually wanted to ask.

Don't try to expose every API endpoint upfront. Build for the questions you need answered today.

What I'd Change

If I were starting over, I'd:

  • Add a simple bearer token to the webhook for basic security
  • Include a function to restart failed workflows directly from Claude
  • Store a rolling 7-day execution history in SQLite for pattern detection (see the sketch after this list)
  • Add a notification workflow that pings me in Telegram when Claude detects critical failures
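
For the execution history, a minimal sketch of what I have in mind, reusing the slim execution shape from earlier with better-sqlite3 (untested, since I haven't built this yet):

```typescript
import Database from "better-sqlite3";

// Rolling 7-day history of slim executions for pattern detection.
const db = new Database("executions.db");
db.exec(`
  CREATE TABLE IF NOT EXISTS executions (
    id TEXT PRIMARY KEY,
    workflow_id TEXT NOT NULL,
    status TEXT NOT NULL,
    started_at TEXT NOT NULL
  )
`);

const insert = db.prepare(
  "INSERT OR IGNORE INTO executions (id, workflow_id, status, started_at) VALUES (?, ?, ?, ?)"
);

export function record(e: {
  id: string;
  workflowId: string;
  status: string;
  startedAt: string;
}): void {
  insert.run(e.id, e.workflowId, e.status, e.startedAt);
  // Evict anything older than seven days to keep the file small.
  db.prepare(
    "DELETE FROM executions WHERE started_at < datetime('now', '-7 days')"
  ).run();
}
```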

But honestly, what I have now solves the problem I had. The rest would be nice-to-haves.

The Honest Limits

This setup works for my n8n instance with my workflows. If you're running hundreds of workflows or need compliance-grade audit logs, you'd need something more robust.

Also, Claude can only see what the webhook exposes. If you need to inspect workflow definitions or modify nodes, you'd need additional tools.

And obviously, this requires Claude Desktop running locally. It's not a web service you can share with a team.

Final Thoughts

I built this because I was tired of debugging workflows manually. The fact that it works this well with so little code is surprising.

The combination of n8n's API, MCP's simplicity, and Claude's reasoning capabilities turned what could have been a weekend project into something I actually use daily.

If you run n8n and you've ever wished you could just ask what went wrong instead of hunting through logs, this approach is worth trying.

It won't replace proper monitoring, but it makes troubleshooting feel less like detective work and more like a conversation.