Implementing AI-powered log analysis with local Llama models to detect Docker container anomalies before Grafana alerts trigger

Why I Built This

I run about 30 Docker containers on my Proxmox homelab. Each one generates logs. Some are quiet. Some are chatty. A few are downright noisy.

For months, I relied on Grafana alerts to tell me when something broke. The problem? By the time Grafana fired an alert, the issue had already cascaded. A failed health check meant the container was already down. A memory spike alert came after the OOM killer had done its work.

I wanted to catch problems earlier—when logs started showing weird patterns but before metrics crossed alert thresholds. I needed something that could read context, not just count errors.

That's when I started experimenting with local Llama models to analyze logs in real time.

My Real Setup

I'm running this on a Proxmox VM with 16GB RAM and 8 cores. The VM sits on the same network as my Docker host, so it can pull logs directly via SSH or read from shared volumes.

The core stack:

  • Ollama running locally with the llama3.2:3b model (later switched to qwen2.5:7b for better reasoning)
  • Python script that tails Docker container logs in real time
  • Simple prompt engineering to ask the model: "Is anything unusual happening here?"
  • Webhook to n8n when the model flags something

No vector databases. No embeddings. No RAG pipeline. Just streaming logs to a language model and asking it to think.

Why Local Models?

I wanted this to run without internet dependencies. My homelab is behind CGNAT, and I don't want to rely on cloud APIs for something that should work offline. Plus, I'm sending raw logs—some contain internal IPs, service names, and error traces I'd rather not send to OpenAI.

Ollama made this trivial. Install, pull a model, hit the API. Done.
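
To give a sense of how little is involved, here is a minimal sketch of a single non-streaming request to Ollama's /api/generate endpoint from Python. It assumes Ollama is listening on its default port and the model has already been pulled; the model name is just whatever you're running.

```python
# Minimal sketch: one non-streaming request to a local Ollama instance.
# Assumes Ollama is on its default port and the model has already been
# pulled (e.g. `ollama pull qwen2.5:7b`).
import requests

def ask_ollama(prompt: str, model: str = "qwen2.5:7b") -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```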

What Worked

The Basic Flow

I wrote a Python script (a stripped-down sketch follows the list) that:

  1. Connects to the Docker host via SSH
  2. Runs docker logs --follow --tail 100 [container_name]
  3. Buffers the last 50 lines in memory
  4. Every 30 seconds, sends those lines to Ollama with a prompt
  5. If the model response contains keywords like "anomaly," "unusual," or "concern," it triggers a webhook
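
Here's a stripped-down sketch of steps 1 through 4. The host and container names are placeholders, it assumes SSH key auth to the Docker host, and analyze_window() is the hypothetical helper shown after the prompt below.

```python
# Sketch: tail one container's logs over SSH, keep a rolling 50-line window,
# and hand that window to the model every 30 seconds.
import subprocess
import threading
import time
from collections import deque

DOCKER_HOST = "user@docker-host"   # placeholder
CONTAINER = "reverse-proxy"        # placeholder
window = deque(maxlen=50)          # last 50 log lines

def tail_logs():
    cmd = ["ssh", DOCKER_HOST, "docker", "logs",
           "--follow", "--tail", "100", CONTAINER]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    for line in proc.stdout:
        window.append(line.rstrip())

threading.Thread(target=tail_logs, daemon=True).start()

while True:
    time.sleep(30)
    if window:
        analyze_window(list(window), container=CONTAINER)  # defined below
```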

The prompt I settled on after a few iterations:

You are analyzing Docker container logs in real time.
Your job is to detect unusual patterns, errors, or behavior that might indicate a problem.

Here are the last 50 log lines:
{log_lines}

Respond in one of two ways:
1. If everything looks normal: "NORMAL"
2. If something is unusual: "ANOMALY: [brief explanation]"

Focus on:
- Repeated errors
- Sudden changes in log frequency
- Stack traces
- Connection failures
- Memory or resource warnings

This worked better than I expected. The model doesn't need perfect accuracy—it just needs to flag things that deserve a closer look.
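
Wired together, the analysis step looks something like the sketch below: fill the prompt template, ask the model, check the reply for flag words, and hit the webhook. The n8n URL is a placeholder, and the prompt here is a condensed stand-in for the full template above.

```python
# Sketch: send the log window to a local Ollama model and fire the n8n
# webhook if the reply looks like an anomaly. URLs and names are placeholders.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
N8N_WEBHOOK = "http://n8n.local:5678/webhook/log-anomaly"  # placeholder
FLAG_WORDS = ("anomaly", "unusual", "concern")

# Condensed version of the prompt shown above.
PROMPT_TEMPLATE = (
    "You are analyzing Docker container logs in real time.\n"
    "Respond with NORMAL if everything looks fine, or with\n"
    "ANOMALY: [brief explanation] if something is unusual.\n\n"
    "Here are the last 50 log lines:\n{log_lines}\n"
)

def analyze_window(lines, container="reverse-proxy", model="qwen2.5:7b"):
    prompt = PROMPT_TEMPLATE.format(log_lines="\n".join(lines))
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    reply = resp.json()["response"]
    if any(word in reply.lower() for word in FLAG_WORDS):
        requests.post(N8N_WEBHOOK,
                      json={"container": container, "analysis": reply},
                      timeout=10)
    return reply
```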

Switching Models

I started with llama3.2:3b because it's fast and uses less RAM. It caught obvious errors but missed subtle patterns like a gradual increase in retry attempts.

I switched to qwen2.5:7b, which is slower but more thoughtful. It started catching things like:

  • A container logging the same INFO message 10 times in a row (turned out to be a loop bug)
  • Gradually increasing response times in an API container (memory leak)
  • A service retrying a connection every second for 2 minutes straight (upstream issue)

The 7B model uses about 6GB of RAM when loaded. On my 16GB VM, that's fine. If you're tighter on resources, the 3B model is still useful for basic anomaly detection.

Integration with n8n

When the model flags something, my script sends a POST request to an n8n webhook (an example payload follows the list). The workflow:

  1. Logs the anomaly to a SQLite database
  2. Sends a Telegram message to me with the container name and the model's explanation
  3. Optionally triggers a Grafana snapshot of the relevant dashboard
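
For reference, the payload is deliberately small. The field names below are just my own convention for illustration, not anything n8n requires; the webhook node simply receives whatever JSON gets posted.

```python
# Illustrative payload only; field names and URL are placeholders.
import datetime
import requests

payload = {
    "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "container": "reverse-proxy",                        # placeholder name
    "analysis": "ANOMALY: repeated connection refused",  # the model's reply
    "model": "qwen2.5:7b",
}
requests.post("http://n8n.local:5678/webhook/log-anomaly",
              json=payload, timeout=10)
```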

This gives me a heads-up before Grafana alerts fire. Sometimes I can fix the issue before it becomes an outage. Other times, I just know what to expect when the alert does trigger.

What Didn't Work

Trying to Analyze Everything

My first version tried to monitor all 30 containers at once. Bad idea. The model got overwhelmed with context switching, and I got spammed with false positives.

I narrowed it down to 5 critical containers: my reverse proxy, database, n8n instance, and two API services. Everything else still goes to Grafana, but these five get AI analysis.
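
In the script that's nothing fancier than an allow-list; the names below are placeholders for my actual container names.

```python
# Only these containers get AI log analysis; the rest stay on plain
# Grafana alerting. Names are placeholders.
MONITORED_CONTAINERS = [
    "reverse-proxy",
    "postgres",
    "n8n",
    "api-service-1",
    "api-service-2",
]
```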

Asking for Too Much Detail

I tried prompting the model to explain root causes and suggest fixes. It hallucinated. A lot.

For example, it once told me a PostgreSQL connection error was caused by "insufficient shared memory buffers" when the real issue was a typo in my connection string.

I learned to keep the model's job simple: flag unusual behavior. Let me do the root cause analysis.

Real-Time Processing with Large Models

I experimented with llama3.1:70b for deeper analysis. It was too slow. By the time it finished processing a batch of logs, new issues had already appeared.

For real-time monitoring, smaller models (3B to 7B) are the sweet spot. You can run larger models on a schedule for deeper post-mortem analysis, but not in the hot path.

Log Formats

Some containers log in JSON. Some use plain text. Some mix both. I thought the model would handle this automatically. It didn't.

I added a preprocessing step to parse JSON logs and extract the message field. For plain text, I just pass it through. This made the model's job easier and reduced noise.
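
A sketch of that preprocessing step, assuming JSON logs keep the text under a message, msg, or log key (field names vary between loggers):

```python
# Sketch: normalize a raw log line. If it parses as JSON, keep only the
# message text; otherwise pass the plain-text line through unchanged.
import json

def extract_message(raw_line: str) -> str:
    line = raw_line.strip()
    if line.startswith("{"):
        try:
            entry = json.loads(line)
            for key in ("message", "msg", "log"):  # common field names
                if key in entry:
                    return str(entry[key])
        except json.JSONDecodeError:
            pass
    return line
```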

Limitations and Trade-Offs

This approach has clear boundaries:

  • Not a replacement for structured logging. If your logs are a mess, the model will struggle. Garbage in, garbage out.
  • False positives happen. About 10-15% of anomalies flagged by the model turn out to be harmless. I'm okay with that—it's better than missing real issues.
  • Resource usage. Running a 7B model continuously uses CPU and RAM. On my setup, it's fine. On a Raspberry Pi, probably not.
  • No historical analysis. This system only looks at recent logs. If you need to search back through days of logs, you'd need a different approach, such as a RAG pipeline over archived logs, which I haven't built yet.

Key Takeaways

Local LLMs are surprisingly good at spotting patterns in logs—not perfect, but good enough to be useful.

You don't need embeddings, vector databases, or complex RAG pipelines for real-time monitoring. A simple prompt and a streaming log feed can catch issues before traditional metrics do.

Start small. Pick your most critical containers. Tune the prompt. Iterate.

If I were to expand this, I'd add:

  • A feedback loop where I mark false positives, and the system learns over time
  • Periodic deep analysis using a larger model on stored logs
  • Correlation with metrics from Grafana to reduce false positives

But for now, this setup has already saved me from two outages and helped me catch three bugs before they hit production. That's enough to keep running it.