Why I Set This Up
I run local LLM inference on consumer GPUs in Docker containers on my Proxmox host. These aren't datacenter cards—they're gaming GPUs repurposed for AI work. The problem I kept hitting was thermal throttling during long inference runs. The card would spike to 85°C+, performance would drop, and I wouldn't know until the model started responding slowly or the container logs showed thermal warnings.
I needed automated alerts before things got critical. Not enterprise monitoring with a full observability stack—just reliable temperature tracking that would notify me when a GPU was running too hot.
My Setup
Here's what I'm working with:
- Proxmox host running Docker containers for LLM inference (Ollama, text-generation-webui)
- NVIDIA RTX 3090 passed through to containers via nvidia-container-toolkit
- Prometheus already running in a container for other host metrics
- No Kubernetes—this is plain Docker on a single node
The goal was simple: scrape GPU temperature metrics and send alerts when thresholds are crossed.
DCGM Exporter for GPU Metrics
NVIDIA's DCGM (Data Center GPU Manager) includes an exporter that exposes GPU metrics in Prometheus format. Even though my setup isn't a datacenter, DCGM works fine on consumer cards.
I deployed it as a Docker container alongside my LLM containers:
docker run -d \
  --name dcgm-exporter \
  --gpus all \
  --restart unless-stopped \
  -p 9400:9400 \
  nvcr.io/nvidia/k8s/dcgm-exporter:3.1.8-3.1.5-ubuntu22.04
Key points from my experience:
- The `--gpus all` flag is required to expose GPU access to the container (a quick way to confirm the runtime can see the card at all is shown after this list)
- Port 9400 is the default metrics endpoint
- The image version matters: older versions had issues with RTX 30-series cards
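If the exporter starts but reports nothing, the first thing I'd rule out is GPU visibility from containers in general. NVIDIA's standard smoke test works here (the CUDA image tag below is just an example; any recent tag will do):

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi -L

If that doesn't list the RTX 3090, the problem is in nvidia-container-toolkit, not in DCGM.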
Once running, I verified metrics were available:
curl http://localhost:9400/metrics | grep temperature
This returned several temperature metrics. The one I needed was DCGM_FI_DEV_GPU_TEMP, which reports the current GPU temperature in Celsius.
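For reference, the series looks roughly like this on my setup (labels abbreviated here; the exact label set varies by exporter version and driver):

DCGM_FI_DEV_GPU_TEMP{gpu="0",UUID="GPU-...",modelName="NVIDIA GeForce RTX 3090",Hostname="..."} 64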
Configuring Prometheus Scraping
I added the DCGM exporter as a scrape target in my Prometheus config. Since I run Prometheus in Docker too, I edited the prometheus.yml mounted into the container:
scrape_configs:
- job_name: 'gpu-metrics'
scrape_interval: 5s
static_configs:
- targets: ['host.docker.internal:9400']
labels:
instance: 'proxmox-gpu-node'
Notes on this config:
- `scrape_interval: 5s` checks temperature every 5 seconds. I tried 1s initially, but it was overkill and added unnecessary load
- `host.docker.internal` is how Docker containers reference the host's localhost. On Linux, I had to add `--add-host=host.docker.internal:host-gateway` to the Prometheus container's run command (a sketch of that command follows this list)
- The `instance` label helps when I eventually add more nodes
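For reference, the Prometheus run command with that mapping looks roughly like this (the container name, mount path, and image tag are illustrative; adjust to your own layout):

docker run -d \
  --name prometheus \
  --restart unless-stopped \
  --add-host=host.docker.internal:host-gateway \
  -p 9090:9090 \
  -v /opt/prometheus:/etc/prometheus \
  prom/prometheus:latest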
After restarting Prometheus, I confirmed the target was up in the Prometheus UI at http://localhost:9090/targets.
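Two quick queries in that same UI confirm data is actually flowing: the scrape job should report as up, and the temperature series should return a value per GPU.

up{job="gpu-metrics"}
DCGM_FI_DEV_GPU_TEMP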
Creating Temperature Alert Rules
Prometheus alert rules are defined in a separate file. I created gpu_alerts.yml and mounted it into the Prometheus container:
groups:
- name: gpu_temperature
interval: 10s
rules:
- alert: GPUTemperatureHigh
expr: DCGM_FI_DEV_GPU_TEMP > 80
for: 2m
labels:
severity: warning
annotations:
summary: "GPU temperature high on {{ $labels.instance }}"
description: "GPU temperature is {{ $value }}°C (threshold: 80°C)"
- alert: GPUTemperatureCritical
expr: DCGM_FI_DEV_GPU_TEMP > 85
for: 30s
labels:
severity: critical
annotations:
summary: "GPU temperature critical on {{ $labels.instance }}"
description: "GPU temperature is {{ $value }}°C (threshold: 85°C)"
Why these thresholds:
- 80°C warning: My RTX 3090 starts thermal throttling around 82-83°C. I wanted advance notice
- 85°C critical: At this point, performance is degraded and the card is close to its thermal limit
- `for: 2m` on the warning prevents alerts during brief spikes (like model loading)
- `for: 30s` on the critical alert, because if it hits 85°C I want to know immediately
I referenced this file in the main Prometheus config:
rule_files:
  - '/etc/prometheus/gpu_alerts.yml'
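Before restarting Prometheus, I validate the rule file with promtool, which ships in the official image (the container name here assumes you called it prometheus):

docker exec prometheus promtool check rules /etc/prometheus/gpu_alerts.yml

After the restart, the built-in ALERTS metric in the query UI lists any rules currently in pending or firing state.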
Setting Up Alertmanager
Prometheus detects alert conditions, but Alertmanager handles notifications. I run Alertmanager in another Docker container:
docker run -d \
  --name alertmanager \
  --restart unless-stopped \
  -p 9093:9093 \
  -v /opt/alertmanager/config.yml:/etc/alertmanager/config.yml \
  prom/alertmanager:latest
My Alertmanager config sends alerts to a Discord webhook (I tried email first, but relay setup was a pain):
global:
resolve_timeout: 5m
route:
group_by: ['alertname', 'instance']
group_wait: 10s
group_interval: 10s
repeat_interval: 1h
receiver: 'discord'
receivers:
- name: 'discord'
webhook_configs:
- url: 'http://host.docker.internal:9094/webhook'
Discord doesn't directly support Prometheus webhook format, so I run a small bridge service (prometheus-discord-bridge) that translates and forwards to Discord. It's a third container, but it works reliably.
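Alertmanager config mistakes tend to show up as alerts silently never arriving, so I validate the file before restarting the container. amtool is bundled in the prom/alertmanager image, and the path below is just where I mounted the config:

docker exec alertmanager amtool check-config /etc/alertmanager/config.yml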
In my Prometheus config, I pointed to Alertmanager:
alerting:
alertmanagers:
- static_configs:
- targets: ['host.docker.internal:9093']
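To exercise the whole notification path without waiting for the card to actually overheat, I can push a synthetic alert straight to Alertmanager's v2 API; if the Discord ping arrives, the Alertmanager-to-bridge-to-Discord leg works. The alert name and labels below are just placeholders:

curl -XPOST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels":{"alertname":"ManualTestAlert","severity":"warning","instance":"proxmox-gpu-node"},"annotations":{"summary":"manual test alert"}}]'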
What Worked
After running this setup for several weeks:
- Alerts fire consistently when temperature crosses thresholds
- The 2-minute delay on warnings eliminated false positives from brief load spikes
- I caught two cases where my LLM container's cooling wasn't adequate—the alerts let me adjust fan curves before thermal throttling became a pattern
- DCGM exporter has been stable with no restarts needed
The Discord notifications work well for my use case. I get a ping on my phone, can check Grafana if needed, and decide whether to intervene.
What Didn't Work
Initial attempts had issues:
Wrong DCGM version: I first tried version 2.x of the exporter. It wouldn't recognize my RTX 3090 correctly and reported zero for most metrics. Upgrading to 3.x fixed this.
Scrape interval too aggressive: I started with 1-second scraping. Prometheus CPU usage jumped noticeably, and I was generating far more data than I needed. 5 seconds is plenty for temperature monitoring.
Alert fatigue from no grouping: My first Alertmanager config didn't group alerts. During a sustained high-temp period, I got spammed with repeat notifications. Adding group_by and repeat_interval fixed this.
Host networking confusion: Getting containers to talk to each other and the host took trial and error. host.docker.internal works on Docker Desktop automatically, but on plain Docker Engine (which I use), I had to explicitly add the host-gateway mapping.
Limitations and Trade-offs
This setup has clear boundaries:
- It only monitors temperature. Other GPU metrics (power draw, memory usage, utilization) are available from DCGM but I'm not alerting on them yet
- No automatic remediation. The alert tells me there's a problem, but I have to manually check container logs, adjust cooling, or stop inference jobs
- Single point of failure: if the DCGM exporter container dies, I lose all GPU visibility. I should add a meta-alert for missing metrics (a sketch follows this list)
- Consumer GPU support in DCGM is unofficial. NVIDIA documents this for datacenter cards, and while it works on RTX, there's no guarantee future driver updates won't break something
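A minimal sketch of that meta-alert, assuming it goes into the same gpu_alerts.yml group (I haven't run this one long enough to vouch for it, which is why it's still listed as a gap):

      # fires when the exporter stops reporting temperature entirely
      - alert: GPUMetricsMissing
        expr: absent(DCGM_FI_DEV_GPU_TEMP{instance="proxmox-gpu-node"})
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "No GPU temperature metrics from {{ $labels.instance }} for 5 minutes"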
Key Takeaways
From actually running this system:
- DCGM exporter works fine on consumer GPUs despite being designed for datacenter use
- Temperature monitoring is essential for sustained LLM inference on hardware not designed for 24/7 compute loads
- Alert thresholds need tuning based on your specific GPU and workload: 80°C might be too conservative for some cards, too late for others (the query after this list helps establish a baseline)
- The `for` duration in alert rules is critical to avoid noise from transient spikes
- Prometheus + Alertmanager is overkill if you only need GPU monitoring, but if you're already running it for other metrics, adding GPU alerts is straightforward
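For threshold tuning, the recorded history answers most questions. For example, the peak temperature per GPU over the past week:

max_over_time(DCGM_FI_DEV_GPU_TEMP[7d])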
This setup gives me confidence to leave inference jobs running unattended. I know I'll get notified before thermal issues cause performance degradation or hardware stress.