Why I Built This
I run multiple services across my home lab—Proxmox containers, Docker stacks, Synology apps—and I got tired of discovering failures hours after they happened. SSH-ing into each machine to check logs or spotting issues only when something breaks downstream felt reactive and sloppy.
I already had n8n running for workflow automation and Home Assistant managing notifications. I wanted failed systemd services and high server load to trigger immediate alerts through Home Assistant, which could then ping my phone or turn on a specific light in my office.
This isn't about monitoring dashboards I never look at. It's about getting actionable alerts only when something actually needs my attention.
My Setup
Here's what I was working with:
- n8n: Self-hosted in a Docker container on my main Proxmox node
- Home Assistant: Running in a separate LXC container, accessible via local network
- Target servers: Mix of Debian-based LXC containers and Ubuntu VMs running various services
- Systemd services I care about: Docker daemon, Nginx, Syncthing, Cronicle scheduler, Adguard Home
I needed a way for these servers to report failures to n8n, which would then decide whether to trigger Home Assistant automations.
Setting Up n8n Webhooks
I created a new workflow in n8n with a Webhook node as the trigger. Under the webhook settings:
- HTTP Method: POST
- Path: /system-alert
- Authentication: Header Auth with a simple token I generated
The webhook URL ended up looking like: http://192.168.1.50:5678/webhook/system-alert
I added the auth token in the workflow settings so any incoming POST request without the correct X-Auth-Token header would be rejected. This isn't enterprise-grade security, but it's enough to prevent accidental triggers from other scripts on my network.
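Before wiring anything else up, it's worth poking the webhook by hand to confirm the token check behaves as expected. A minimal curl sketch (token and payload are just placeholders):

# This request should be accepted because the header matches the configured token
curl -s -X POST "http://192.168.1.50:5678/webhook/system-alert" \
  -H "Content-Type: application/json" \
  -H "X-Auth-Token: your-secret-token-here" \
  -d '{"alert_type": "test"}'

# This one should be rejected because the auth header is missing
curl -s -X POST "http://192.168.1.50:5678/webhook/system-alert" \
  -H "Content-Type: application/json" \
  -d '{"alert_type": "test"}'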
Webhook Payload Structure
I decided on a simple JSON payload format that my scripts would send:
{
"alert_type": "service_failure",
"hostname": "proxmox-lxc-01",
"service_name": "docker",
"status": "failed",
"timestamp": "2024-01-15T14:23:00Z"
}
For load alerts:
{
"alert_type": "high_load",
"hostname": "synology-nas",
"load_avg": "8.45",
"threshold": "5.0",
"timestamp": "2024-01-15T14:30:00Z"
}
Keeping it flat and predictable made the n8n logic easier to write.
Monitoring Systemd Service Failures
I wrote a small bash script that runs on each server via cron. It checks specific systemd services and posts to the n8n webhook if any are failed.
#!/bin/bash
WEBHOOK_URL="http://192.168.1.50:5678/webhook/system-alert"
AUTH_TOKEN="your-secret-token-here"
HOSTNAME=$(hostname)
SERVICES=("docker" "nginx" "syncthing@vipin" "adguardhome")
for SERVICE in "${SERVICES[@]}"; do
    STATUS=$(systemctl is-active "$SERVICE")
    if [ "$STATUS" != "active" ]; then
        # Build the JSON payload for this failed service
        PAYLOAD=$(cat <<EOF
{"alert_type": "service_failure", "hostname": "$HOSTNAME", "service_name": "$SERVICE", "status": "$STATUS", "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"}
EOF
)
        # Post the alert to the n8n webhook with the auth header
        curl -s -X POST "$WEBHOOK_URL" \
            -H "Content-Type: application/json" \
            -H "X-Auth-Token: $AUTH_TOKEN" \
            -d "$PAYLOAD"
    fi
done
I saved this as /usr/local/bin/check-services.sh, made it executable, and added a cron job to run it every 5 minutes:
*/5 * * * * /usr/local/bin/check-services.sh
This approach is simple and doesn't require installing monitoring agents. The script only sends data when something is wrong, so n8n isn't flooded with "everything is fine" messages.
Monitoring Server Load
For load monitoring, I wrote a separate script that checks the 5-minute load average and compares it to a threshold based on CPU count.
#!/bin/bash
WEBHOOK_URL="http://192.168.1.50:5678/webhook/system-alert"
AUTH_TOKEN="your-secret-token-here"
HOSTNAME=$(hostname)
CPU_COUNT=$(nproc)
THRESHOLD=$(echo "$CPU_COUNT * 0.8" | bc)
LOAD_AVG=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $2}' | tr -d ',')
if (( $(echo "$LOAD_AVG > $THRESHOLD" | bc -l) )); then
    PAYLOAD=$(cat <<EOF
{"alert_type": "high_load", "hostname": "$HOSTNAME", "load_avg": "$LOAD_AVG", "threshold": "$THRESHOLD", "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"}
EOF
)
    # Post the alert to the n8n webhook with the auth header
    curl -s -X POST "$WEBHOOK_URL" \
        -H "Content-Type: application/json" \
        -H "X-Auth-Token: $AUTH_TOKEN" \
        -d "$PAYLOAD"
fi
I set this to run every 10 minutes. The threshold is 80% of available CPU cores, which works well for my setup. On a 4-core machine, it alerts if load exceeds 3.2.
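With the load script saved as, say, /usr/local/bin/check-load.sh (the path is just an example), the 10-minute schedule is one more cron entry:

*/10 * * * * /usr/local/bin/check-load.sh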
Processing Alerts in n8n
Back in the n8n workflow, after the Webhook trigger, I added a Switch node to route different alert types:
- Route 1: alert_type equals service_failure
- Route 2: alert_type equals high_load
For service failures, I added a Function node to format the alert message:
const hostname = $input.item.json.hostname;
const service = $input.item.json.service_name;
const status = $input.item.json.status;
return {
json: {
message: `Service ${service} on ${hostname} is ${status}`,
entity_id: "input_boolean.system_alert"
}
};
For high load alerts, similar logic:
const hostname = $input.item.json.hostname;
const load = $input.item.json.load_avg;
const threshold = $input.item.json.threshold;
return {
json: {
message: `High load on ${hostname}: ${load} (threshold: ${threshold})`,
entity_id: "input_boolean.system_alert"
}
};
Triggering Home Assistant Automations
I created an input_boolean in Home Assistant's configuration.yaml:
input_boolean:
  system_alert:
    name: System Alert Trigger
    initial: off
In n8n, after the Function node, I added a Home Assistant node configured to call the input_boolean.turn_on service. The node settings:
- Credentials: Home Assistant API token (long-lived access token from HA)
- Resource: services/input_boolean/turn_on
- Entity ID: {{ $json.entity_id }}
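If the node ever misbehaves, the same call can be made directly against Home Assistant's REST API to figure out whether the problem is in n8n or in HA. A minimal sketch, with the HA address and the long-lived token as placeholders:

# Turn the alert boolean on via HA's REST API (address and token are placeholders)
curl -s -X POST "http://homeassistant.local:8123/api/services/input_boolean/turn_on" \
  -H "Authorization: Bearer YOUR_LONG_LIVED_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"entity_id": "input_boolean.system_alert"}'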
I also added a second Home Assistant node to send a persistent notification with the formatted message:
- Resource: services/persistent_notification/create
- Message: {{ $json.message }}
- Title: System Alert
Home Assistant Automation
In Home Assistant, I created an automation that watches for the input_boolean state change:
automation:
  - alias: "System Alert Handler"
    trigger:
      - platform: state
        entity_id: input_boolean.system_alert
        to: "on"
    action:
      - service: notify.mobile_app_my_phone
        data:
          message: "System alert triggered - check Home Assistant"
          title: "Server Alert"
      - service: light.turn_on
        entity_id: light.office_desk_lamp
        data:
          rgb_color: [255, 0, 0]
          brightness: 255
      - delay: "00:00:05"
      - service: input_boolean.turn_off
        entity_id: input_boolean.system_alert
The light turning red is my physical indicator that something needs attention. The 5-second delay before resetting the input_boolean prevents rapid re-triggers if multiple alerts come in quick succession.
What Didn't Work
Polling systemd via n8n directly: I initially tried using n8n's SSH node to poll systemd status on each server every few minutes. This worked but felt clunky—n8n became a bottleneck, and managing SSH keys across multiple nodes was annoying. The push-based webhook approach is cleaner.
Using systemd's OnFailure directive: I experimented with OnFailure= in service unit files to trigger a script on failure. This worked for some services but not others, especially third-party services where I didn't want to modify unit files. The cron-based check is less elegant but more reliable across different service types.
Load average alone isn't enough: I learned that load average can spike temporarily during normal operations (backups, media transcoding). The fix is to only alert if load stays high across two consecutive checks. I haven't implemented this yet, but I'm considering a simple state file that tracks previous load values, roughly like the sketch below.
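One possible shape for that, sketched here rather than taken from the live script: a state file remembers whether the previous run was already over the threshold, and the alert only fires on the second consecutive high reading (send_alert stands in for the existing curl call).

STATE_FILE="/var/tmp/load-alert.state"

if (( $(echo "$LOAD_AVG > $THRESHOLD" | bc -l) )); then
    if [ -f "$STATE_FILE" ]; then
        # Second consecutive high reading: alert, then reset the marker
        # so a sustained incident only re-alerts every other run
        send_alert
        rm -f "$STATE_FILE"
    else
        # First high reading: remember it, but don't alert yet
        touch "$STATE_FILE"
    fi
else
    # Load is back under the threshold: clear any remembered spike
    rm -f "$STATE_FILE"
fi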
Home Assistant webhook vs API: I first tried using Home Assistant's webhook automation trigger directly from the bash scripts. This worked but meant duplicating logic between n8n and Home Assistant. Keeping all the routing and formatting in n8n made it easier to adjust without restarting Home Assistant.
Limitations and Trade-offs
This setup has some clear boundaries:
- No historical data: I'm not storing metrics over time. If I want trends or graphs, I'd need to add something like InfluxDB or Prometheus. Right now, I only care about immediate alerts.
- Single point of failure: If n8n goes down, alerts stop. I could add a fallback that sends email directly from the scripts if the webhook fails, but I haven't done that yet (a rough sketch of that fallback follows this list).
- Network dependency: Everything relies on local network connectivity. If my router dies, nothing works. That's acceptable for my use case.
- Cron timing: A 5-minute check interval means I might not know about a failure for up to 5 minutes. For critical services, I could reduce this, but it increases load on n8n.
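On the single point of failure: a minimal sketch of what the email fallback could look like, assuming a working local MTA so mail(1) can actually deliver (the address is a placeholder):

# Try the webhook first; fall back to email only if the POST fails outright
if ! curl -sf -X POST "$WEBHOOK_URL" \
        -H "Content-Type: application/json" \
        -H "X-Auth-Token: $AUTH_TOKEN" \
        -d "$PAYLOAD"; then
    echo "$PAYLOAD" | mail -s "Alert (n8n webhook unreachable) from $HOSTNAME" admin@example.com
fi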
What I'd Do Differently
If I were starting over, I'd add a simple state management layer to prevent alert spam. Right now, if a service stays failed, I get an alert every 5 minutes until I fix it. A state file that tracks "already alerted" status would help.
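A minimal sketch of what that could look like inside the service-check loop, with one marker file per service (send_alert again stands in for the existing curl call):

STATE_DIR="/var/tmp/service-alerts"
mkdir -p "$STATE_DIR"

if [ "$STATUS" != "active" ]; then
    if [ ! -f "$STATE_DIR/$SERVICE" ]; then
        send_alert
        touch "$STATE_DIR/$SERVICE"   # remember that we already alerted for this one
    fi
else
    rm -f "$STATE_DIR/$SERVICE"       # service recovered, allow future alerts
fi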
I'd also consider using systemd timers instead of cron for the monitoring scripts. Timers give better logging and can handle missed runs more gracefully if a server is down during a scheduled check.
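Roughly what that would look like, with unit names that are just placeholders of mine; OnCalendar replaces the cron schedule, and Persistent=true runs a missed check once the machine is back up:

# /etc/systemd/system/check-services.service (sketch)
[Unit]
Description=Check systemd services and report failures to n8n

[Service]
Type=oneshot
ExecStart=/usr/local/bin/check-services.sh

# /etc/systemd/system/check-services.timer (sketch)
[Unit]
Description=Run check-services.sh every 5 minutes

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target

It would be enabled with systemctl enable --now check-services.timer instead of the crontab entry.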
For load monitoring, I'd add a moving average check instead of a single threshold. Two consecutive high readings would be more reliable than one spike.
Key Takeaways
- Push beats poll: Having servers report problems to n8n is simpler than having n8n check every server constantly.
- Keep payloads simple: Flat JSON with only necessary fields made debugging easier.
- Home Assistant as the notification layer works well: It already handles mobile notifications, smart home triggers, and persistent alerts. No need to reinvent that.
- Authentication matters even on local networks: A simple token prevents accidental triggers from other scripts.
- Test failure scenarios: I manually stopped services and artificially increased load to make sure alerts fired correctly (a quick way to fake both is sketched below). Assumptions about systemd behavior were wrong more than once.
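Nothing fancy is needed to fake either condition; a rough sketch of the kind of thing I mean:

# Fake a failed service (pick something non-critical, and start it again afterwards)
sudo systemctl stop nginx

# Fake high load without installing anything: one busy loop per core,
# left running long enough for the 10-minute load check to fire
for i in $(seq "$(nproc)"); do yes > /dev/null & done
sleep 900
kill $(jobs -p)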
This setup isn't perfect, but it's been running for several months now and has caught real problems—Docker daemon crashes, Nginx misconfigurations, and a VM that was thrashing due to a runaway process. The red light in my office has become a reliable signal that something needs my attention, which was the whole point.