Why I Built This Workflow
I needed a way to generate summaries of my blog posts without relying entirely on external APIs. My self-hosted setup includes n8n running on Proxmox, and I wanted to keep AI processing local when possible to reduce costs and maintain control over my data.
The challenge was simple: I write technical articles that need accurate summaries for social media, RSS feeds, and preview cards. Manually writing these summaries takes time, and I wanted automation that could handle this reliably.
I also wanted a fallback system. Local models are great when they work, but sometimes they fail due to resource constraints or model limitations. Having OpenRouter as a backup meant the workflow wouldn't break completely if my local setup had issues.
My Actual Setup
Here's what I'm working with:
- n8n instance running in a Docker container on Proxmox
- Ollama installed on the same Proxmox host, serving Mistral models locally
- OpenRouter account with API access as a fallback
- Ghost CMS where my blog posts are stored and published
The workflow triggers when a new post is published in Ghost. It fetches the full article content, sends it to my local Mistral model for summarization, and if that fails, falls back to OpenRouter's API.
Setting Up the Local Mistral Model
I installed Ollama directly on my Proxmox host because running it in Docker added unnecessary complexity for my use case. The installation was straightforward:
curl -fsSL https://ollama.com/install.sh | sh
After installation, I pulled the Mistral model I wanted to use:
ollama pull mistral:7b-instruct
I chose the 7B instruct variant because it's small enough to run on my hardware without causing memory issues, and it handles summarization tasks well enough for my needs.
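Before wiring anything into n8n, a quick sanity check on the Proxmox host confirms the model loads and responds; ollama list shows what's installed and ollama run fires a one-off prompt:
ollama list
ollama run mistral:7b-instruct "Summarize in one sentence: Ollama serves local models over a simple HTTP API."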
To make Ollama accessible to my n8n container, I configured it to listen on all interfaces instead of just localhost. I edited the systemd service file:
sudo systemctl edit ollama
And added this environment variable:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
After restarting the service, Ollama was accessible from my n8n container at http://proxmox-host-ip:11434.
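For completeness, the restart and a quick reachability check from another machine look like this; /api/tags simply lists the installed models:
sudo systemctl restart ollama
curl http://proxmox-host-ip:11434/api/tags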
Building the n8n Workflow
Triggering on New Posts
I set up a webhook trigger in n8n that Ghost calls whenever a post is published. In Ghost's admin panel, I created a custom integration and configured the webhook URL to point to my n8n instance.
The webhook sends post data including the title, content, and metadata. I extract the full HTML content from this payload.
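For reference, the post.published webhook body has roughly this shape (trimmed to the fields this workflow cares about; the exact set varies a little between Ghost versions):
{
  "post": {
    "current": {
      "id": "...",
      "title": "...",
      "html": "<p>...</p>",
      "updated_at": "..."
    }
  }
}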
Preparing the Content
Before sending content to the AI model, I strip out HTML tags because they add noise and increase token usage. I used n8n's HTML Extract node with a simple configuration to pull just the text content.
I also added a length check. If the article is shorter than 500 characters, I skip summarization entirely since there's not much to summarize. This prevents unnecessary API calls.
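In n8n this check is just an IF node with a boolean expression on the extracted text; assuming the HTML Extract node writes its output to a field called text, the condition is:
{{ $json.text.length < 500 }}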
Calling the Local Mistral Model
I used n8n's HTTP Request node to call Ollama's API. The endpoint is /api/generate, and I send a POST request with this structure:
{
  "model": "mistral:7b-instruct",
  "prompt": "Summarize this article in 2-3 sentences, focusing on the main technical points: {{ $json.content }}",
  "stream": false
}
The stream: false parameter is important because I need the full response at once, not streamed chunks.
I set a timeout of 60 seconds on this request. Local inference can be slow depending on article length and system load.
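A direct curl against the same endpoint is a handy way to separate Ollama problems from workflow problems, since it mirrors the request n8n sends:
curl http://proxmox-host-ip:11434/api/generate \
  -d '{
    "model": "mistral:7b-instruct",
    "prompt": "Summarize this article in 2-3 sentences: ...",
    "stream": false
  }'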
Handling Failures
This is where the fallback logic comes in. I connected an error trigger to the HTTP Request node that catches any failures—timeouts, connection errors, or model errors.
When the local model fails, the workflow automatically routes to an OpenRouter node. I configured this with my OpenRouter API key and set it to use a similar prompt structure:
{
  "model": "mistralai/mistral-7b-instruct",
  "messages": [
    {
      "role": "user",
      "content": "Summarize this article in 2-3 sentences: {{ $json.content }}"
    }
  ]
}
OpenRouter's API is more reliable but costs money per request, which is why I only use it as a backup.
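For reference, the equivalent raw request goes to OpenRouter's chat completions endpoint with the API key sent as a Bearer token; a quick test from the command line looks roughly like this:
# OPENROUTER_API_KEY is a placeholder for your own key
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/mistral-7b-instruct",
    "messages": [{"role": "user", "content": "Summarize this article in 2-3 sentences: ..."}]
  }'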
Storing the Summary
Once I have a summary from either source, I update the Ghost post with the generated text. I use Ghost's Admin API to update the post record, writing the summary into the post's built-in meta_description field. Two details matter here: the Admin API authenticates with a short-lived JWT signed with the Admin API key rather than the raw key, and it expects the post's current updated_at value in the request body so it can detect edit collisions (the webhook payload already carries this, so it can be passed straight through).
The HTTP Request node for this looks like:
PUT https://blog.vipinpg.com/ghost/api/admin/posts/{{ $json.post_id }}/
Headers:
Authorization: Ghost [admin-api-jwt]
Content-Type: application/json
Body:
{
  "posts": [{
    "meta_description": "{{ $json.summary }}",
    "updated_at": "{{ $json.updated_at }}"
  }]
}
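The Authorization header is the fiddly part. The Admin API key from Ghost's integration page is an id:secret pair, and what goes in the header is a short-lived JWT signed with that secret, not the key itself. A rough bash sketch of the token generation, assuming a current Ghost install (the aud claim differs slightly across versions, so check the Admin API docs for yours):
# split the Admin API key (format "id:hexsecret") from the Ghost integration page
KEY="[admin-api-key]"
ID="${KEY%%:*}"
SECRET="${KEY#*:}"
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }
NOW=$(date +%s)
HEADER=$(printf '{"alg":"HS256","typ":"JWT","kid":"%s"}' "$ID" | b64url)
# tokens are only valid for a few minutes; "/admin/" is the audience on Ghost v5
CLAIMS=$(printf '{"iat":%d,"exp":%d,"aud":"/admin/"}' "$NOW" "$((NOW + 300))" | b64url)
SIG=$(printf '%s.%s' "$HEADER" "$CLAIMS" | openssl dgst -binary -sha256 -mac HMAC -macopt hexkey:"$SECRET" | b64url)
TOKEN="$HEADER.$CLAIMS.$SIG"
# the header then becomes: Authorization: Ghost $TOKEN
n8n also ships a dedicated Ghost node whose Admin API credential handles this token generation for you, which is a reasonable alternative to a raw HTTP Request node.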
What Worked
The local Mistral model handles most summarization tasks without issues. For standard technical articles between 1,000 and 3,000 words, it generates decent summaries in 10-30 seconds.
The fallback system works exactly as intended. When my Proxmox host is under heavy load or if Ollama crashes, the workflow seamlessly switches to OpenRouter without manual intervention.
Using Ollama instead of running models directly in Docker simplified my setup. I don't have to manage GPU passthrough or deal with container-specific CUDA issues.
What Didn't Work
The first version of my prompt was too generic. It produced summaries that were technically accurate but boring. I had to iterate several times to get prompts that captured the practical focus of my articles.
I initially tried a larger 13B-parameter model, but it was too slow on my hardware. Inference times exceeded 90 seconds for longer articles, which made the workflow feel broken even though it technically worked.
Error handling was messier than expected. n8n's error triggers don't always catch every type of failure cleanly, especially network timeouts. I had to add explicit timeout values and test various failure scenarios manually.
The HTML extraction doesn't always work perfectly. Some of my older posts have embedded code blocks or complex formatting that confuses the text extraction. I haven't found a clean solution for this yet.
Key Takeaways
Running local AI models is practical for simple tasks like summarization, but you need realistic expectations about speed and reliability.
Always have a fallback. Local models will fail at some point, and having an API backup prevents your entire workflow from breaking.
Prompt engineering matters more than model size for focused tasks. A well-crafted prompt on a smaller model often beats a generic prompt on a larger one.
Test your workflow under different conditions—high system load, network issues, long articles, short articles. Real-world usage will expose edge cases you didn't consider.
Monitor your OpenRouter usage if you're using it as a fallback. I check my API costs monthly to make sure the fallback isn't triggering more than expected, which would indicate problems with my local setup.