Why I Started Looking at Network Traffic in My Homelab
I run a Proxmox homelab with multiple VMs, Docker containers, and services exposed through reverse proxies. Over time, I noticed odd patterns: sudden spikes in outbound connections from containers that should be idle, DNS queries to domains I didn't recognize, and occasional SSH brute-force attempts logged in fail2ban. Nothing catastrophic, but enough to make me uncomfortable.
I wanted visibility into what was actually happening on my network. Not just firewall logs or basic packet captures, but something that could identify patterns that looked wrong. The problem was that traditional intrusion detection systems either required expensive hardware, relied on signature databases that went stale, or generated so many false positives that I'd ignore them.
I started wondering if I could use a local LLM to analyze network traffic patterns in real time. Not to replace proper security tools, but to add another layer of detection that could learn what "normal" looked like for my specific setup.
My Initial Setup and Reality Check
I'm running Proxmox on a Dell R720 with 128GB RAM. I already had Ollama running locally with llama3.2 for other automation tasks. My network stack includes pfSense for routing, Traefik as a reverse proxy, and various Docker containers on different VLANs.
The first thing I tried was feeding raw tcpdump output directly to the LLM. This failed immediately. The model couldn't make sense of packet-level data at scale, and the context window filled up in seconds. I was asking it to do something it wasn't designed for.
I stepped back and realized I needed to pre-process the traffic into features that an LLM could actually reason about: connection counts per IP, protocol distributions, timing patterns, DNS query volumes, and failed connection attempts. This meant building a pipeline, not just piping tcpdump into a prompt.
What I Actually Built
I created a three-stage system:
Stage 1: Traffic Aggregation
I configured pfSense to send flow data (NetFlow) to a collector running in a Docker container. I didn't use full packet captures because the volume was too high and I didn't need payload data. Flow records gave me source/destination IPs, ports, protocols, byte counts, and timing without storing everything.
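For a sense of what the collector actually receives, here's a stripped-down NetFlow v5 listener in Python. It's a sketch, not the collector container I actually run, and it assumes v5 export; v9 or IPFIX exports would need template handling that this ignores.

```python
# Stripped-down NetFlow v5 listener (sketch only). Assumes the router exports
# v5 flows to UDP port 2055; v9/IPFIX would need template parsing instead.
import socket
import struct

HEADER_FMT = "!HHIIIIBBH"              # version, count, uptime, secs, nsecs, seq, engine, sampling
RECORD_FMT = "!IIIHHIIIIHHBBBBHHBBH"   # standard 48-byte v5 flow record
HEADER_LEN = struct.calcsize(HEADER_FMT)
RECORD_LEN = struct.calcsize(RECORD_FMT)

def parse_v5(datagram):
    """Yield one dict per flow record in a NetFlow v5 datagram."""
    version, count = struct.unpack("!HH", datagram[:4])
    if version != 5:
        return
    for i in range(count):
        off = HEADER_LEN + i * RECORD_LEN
        f = struct.unpack(RECORD_FMT, datagram[off:off + RECORD_LEN])
        yield {
            "src": socket.inet_ntoa(struct.pack("!I", f[0])),
            "dst": socket.inet_ntoa(struct.pack("!I", f[1])),
            "packets": f[5],
            "bytes": f[6],
            "src_port": f[9],
            "dst_port": f[10],
            "tcp_flags": f[12],
            "proto": f[13],
        }

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 2055))
    while True:
        data, _ = sock.recvfrom(8192)
        for flow in parse_v5(data):
            print(flow)
```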
I wrote a Python script that aggregated these flows into 5-minute windows and extracted features (a condensed sketch follows the list below):
- Unique destination IPs per source
- Connection attempts vs. established connections ratio
- Port scan indicators (many ports, few packets)
- DNS query patterns (volume, failed lookups, unusual TLDs)
- Traffic timing (sudden bursts, regular intervals suggesting beaconing)
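The aggregation script boils down to bucketing flows by source and turning each window into a handful of numbers. A condensed sketch follows; the field names match the listener sketch above, and the heuristics (ACK bit means established, ports-per-packet as a scan score) are illustrative rather than my exact thresholds.

```python
# Condensed version of the window aggregation. Field names follow the listener
# sketch above; heuristics are illustrative, not my exact thresholds.
from collections import defaultdict

WINDOW_SECONDS = 300

def aggregate_window(flows):
    """Reduce one 5-minute window of flow records into per-source features."""
    buckets = defaultdict(lambda: {"dsts": set(), "ports": set(),
                                   "attempts": 0, "established": 0,
                                   "dns": 0, "packets": 0})
    for f in flows:
        b = buckets[f["src"]]
        b["dsts"].add(f["dst"])
        b["ports"].add(f["dst_port"])
        b["packets"] += f["packets"]
        b["attempts"] += 1
        if f["proto"] == 6 and f["tcp_flags"] & 0x10:  # ACK seen -> likely established
            b["established"] += 1
        if f["dst_port"] == 53:   # flows only give DNS volume; failed lookups and
            b["dns"] += 1         # TLD patterns have to come from resolver logs

    features = {}
    for src, b in buckets.items():
        features[src] = {
            "unique_dsts": len(b["dsts"]),
            "unique_ports": len(b["ports"]),
            "established_ratio": b["established"] / b["attempts"],
            "dns_queries": b["dns"],
            # crude port-scan indicator: many ports touched, few packets each
            "scan_score": len(b["ports"]) / max(b["packets"], 1),
            # timing/beaconing features are computed across windows, not shown here
        }
    return features
```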
Stage 2: Baseline Learning
Before feeding anything to the LLM, I needed to establish what normal looked like. I collected a week of traffic data during regular usage: backups running, media streaming, containers updating, my own SSH sessions.
I stored these aggregated features in a simple SQLite database. The LLM wasn't doing the baseline detection—I used basic statistical methods (mean, standard deviation, percentiles) to flag anomalies. The LLM's job was to interpret whether flagged anomalies were actually suspicious or just unusual but legitimate.
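The flagging itself is just a per-metric comparison against stored history. Roughly like this; the table layout and the 3-sigma cutoff are illustrative, not my exact schema.

```python
# Sketch of the baseline check: compare a window's feature value against the
# stored history for that source/metric and flag large deviations.
# Table layout and the 3-sigma cutoff are illustrative.
import sqlite3
import statistics

def flag_anomalies(db_path, src, window_features, min_samples=50, z_cutoff=3.0):
    conn = sqlite3.connect(db_path)
    anomalies = []
    for metric, value in window_features.items():
        rows = conn.execute(
            "SELECT value FROM feature_history WHERE src = ? AND metric = ?",
            (src, metric),
        ).fetchall()
        history = [r[0] for r in rows]
        if len(history) < min_samples:
            continue  # not enough baseline data for this source/metric yet
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history) or 1e-9
        z = (value - mean) / stdev
        if abs(z) > z_cutoff:
            anomalies.append({"src": src, "metric": metric, "value": value,
                              "baseline_mean": round(mean, 2), "z": round(z, 1)})
    conn.close()
    return anomalies
```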
Stage 3: LLM Analysis
When the system detected statistical anomalies, it generated a structured summary and sent it to Ollama with llama3.2. The prompt included:
- The anomaly details (which metrics spiked, by how much)
- Recent baseline context (what's normal for this time of day)
- Historical similar events (if any were previously classified)
The LLM's output was a simple classification: "likely benign", "investigate", or "suspicious". It also provided reasoning, which helped me tune the system over time.
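The whole "analysis" step is one HTTP call to the local Ollama instance with that summary pasted into a prompt. A sketch, assuming Ollama's default endpoint on port 11434 and illustrative prompt wording:

```python
# Sketch of the analysis call. Assumes Ollama's default HTTP endpoint on
# localhost:11434; the prompt text and output handling are illustrative.
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

PROMPT_TEMPLATE = """You are reviewing network anomalies from a homelab.

Anomaly details:
{anomalies}

Baseline context (typical values for this time of day):
{baseline}

Previously classified similar events:
{history}

Classify the anomaly as one of: "likely benign", "investigate", "suspicious".
Respond as JSON: {{"classification": "...", "reasoning": "..."}}
"""

def classify(anomalies, baseline, history):
    prompt = PROMPT_TEMPLATE.format(
        anomalies=json.dumps(anomalies, indent=2),
        baseline=json.dumps(baseline, indent=2),
        history=json.dumps(history, indent=2),
    )
    resp = requests.post(OLLAMA_URL, json={
        "model": "llama3.2",
        "prompt": prompt,
        "stream": False,
        "format": "json",   # constrain the model to JSON output
    }, timeout=120)
    resp.raise_for_status()
    # a fallback parse is still wise in practice; kept simple here
    return json.loads(resp.json()["response"])
```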
What Actually Worked
The system caught things I would have missed:
A container I'd forgotten about was making regular connections to an IP in a range I didn't recognize. The LLM flagged it as "investigate" because the timing was too regular (every 6 hours) and the destination wasn't in my usual traffic patterns. Turns out it was a health check endpoint for a service I'd set up months ago and forgotten. Not malicious, but good to know.
I had a Raspberry Pi running Pi-hole that suddenly started making hundreds of DNS queries to random subdomains under a single domain. The LLM correctly identified this as suspicious because the query pattern didn't match normal recursive DNS behavior. I found that a script I'd written for monitoring had a bug that sent it into a DNS query loop.
The most useful aspect was the LLM's ability to correlate multiple weak signals. A single anomaly might be noise, but when it saw: increased outbound connections + unusual port usage + failed authentication attempts in the same window, it correctly flagged that as worth investigating.
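That only works because flagged anomalies from the same window get bundled into a single prompt instead of being sent one at a time; the model can't correlate what it never sees together. The grouping itself is trivial (field names here are hypothetical):

```python
# Group flagged anomalies by time window and source before prompting, so the
# model sees co-occurring signals together. Field names are hypothetical.
from collections import defaultdict

def bundle_by_window(flagged, window_seconds=300):
    bundles = defaultdict(list)
    for a in flagged:
        window_start = a["timestamp"] - (a["timestamp"] % window_seconds)
        bundles[(window_start, a["src"])].append(a)
    # multi-signal windows get sent to the LLM first; lone weak signals can wait
    return sorted(bundles.items(), key=lambda kv: len(kv[1]), reverse=True)
```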
What Didn't Work
Real-time analysis was a fantasy. Even with aggregated features, running inference on every 5-minute window created too much latency. I had to batch anomalies and run analysis every 15-30 minutes. For actual attacks, this would be too slow. The system is better suited for post-incident analysis or catching slow, persistent threats.
False positives were still a problem. The LLM would flag legitimate but unusual activity—like when I spun up a new VM and it immediately started updating packages, generating a spike in outbound connections. I had to add context about known events (backup windows, update schedules) to reduce noise.
The model had no memory between runs. Each analysis was independent, so it couldn't learn from my feedback over time. I ended up maintaining a manual "known patterns" file that got included in the prompt, but this was clunky.
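For what it's worth, the known-patterns file is nothing clever: a small JSON list that gets rendered into a prompt section on every run. The structure below is a sketch of the idea; the fields are illustrative, not my actual format.

```python
# Sketch: render the manual "known patterns" file into a prompt section.
# The JSON structure and fields here are illustrative.
import json

def known_patterns_section(path="known_patterns.json"):
    with open(path) as fh:
        patterns = json.load(fh)
    # example entry: {"match": "backup VM -> NAS, nightly 02:00-03:00",
    #                 "verdict": "benign", "note": "scheduled backup job"}
    lines = [f'- {p["match"]}: {p["verdict"]} ({p["note"]})' for p in patterns]
    return "Known recurring patterns on this network:\n" + "\n".join(lines)
```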
Resource usage was higher than expected. Running llama3.2 inference every 15 minutes added noticeable CPU load on my Proxmox host. I had to allocate dedicated cores to the Ollama VM to prevent impact on other services.
Key Limitations I Discovered
This approach only works if you have stable baseline traffic. If your homelab is constantly changing (new services, different usage patterns), the anomaly detection becomes unreliable. You need at least a week of consistent data to start.
The LLM doesn't replace proper security tools. It can't detect zero-day exploits, analyze packet payloads for malware, or block attacks in real-time. It's a supplementary analysis layer, not a firewall or IDS.
Privacy matters here. I'm sending network metadata to a local LLM, which is fine because it never leaves my network. If you were using a cloud-based model, you'd be exposing your network topology and traffic patterns to a third party. That's a non-starter for me.
The system requires ongoing tuning. As my network usage evolves, I have to update baselines and refine what gets flagged as anomalous. It's not a "set and forget" solution.
What I Learned
Local LLMs can add value to network monitoring, but only when used correctly. They're good at pattern interpretation and correlating multiple signals, not at raw data processing or real-time detection.
Pre-processing is everything. The LLM needs structured, aggregated data with context, not raw packet dumps. The more work you do before the inference step, the better the results.
This approach works best for detecting slow, persistent anomalies—things that traditional signature-based systems miss because they don't match known attack patterns. It's terrible at stopping fast-moving threats.
Running AI models locally for security analysis is feasible on homelab hardware, but you need to be realistic about resource requirements and latency. This isn't production-grade security; it's an experimental layer for learning and gaining visibility.
The biggest benefit wasn't catching attacks—it was understanding my own network better. Seeing what the system flagged forced me to document services, clean up forgotten containers, and tighten firewall rules. The security improvement came more from that process than from the detection itself.