Implementing Container Resource Limits to Prevent Memory Leaks from Crashing Your Docker Host

Why I Started Caring About Container Resource Limits

I run a Proxmox server at home with about a dozen Docker containers. Most are small utilities—DNS filtering, monitoring tools, a media server. For months, everything ran fine. Then one morning, I couldn't SSH into the box.

The VPN connected, but SSH just hung until it timed out. I could ping the server, but nothing else responded. I had to physically reboot it.

When it came back up, I checked the logs. One container—a web scraper I'd written for processing RSS feeds—had ballooned from 200MB to over 12GB of memory. It had a memory leak I didn't know about. The kernel's OOM Killer eventually stepped in, but by then the damage was done. SSH was unresponsive, and three other containers had crashed.

That's when I learned: Docker containers have no resource limits by default. A single runaway process can consume everything.

What Actually Happened

Docker uses Linux cgroups to manage resources, but unless you explicitly set limits, containers can take whatever they want. My scraper had been running for weeks, slowly leaking memory with each batch job. When it finally hit the physical RAM limit, the system started thrashing swap, the disk I/O spiked, and everything ground to a halt.

The Linux kernel has a mechanism called the OOM Killer (Out Of Memory Killer). When the system runs out of memory, it picks a process to terminate. It assigns each process an oom_score—higher scores get killed first. Docker is smart enough to lower its own daemon's score, so your containers usually get axed instead.
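
You can see that adjustment on the host (a quick check, assuming dockerd is running and pidof is available; the exact value depends on your Docker version and how the daemon is started):

cat /proc/$(pidof dockerd)/oom_score_adj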

When a container gets killed this way, you see Exit Code 137 in the logs. That's 128 + 9 (SIGKILL)—the kernel forcibly terminated it.
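
You can confirm whether the kernel was responsible by inspecting the container's state; Docker records both the exit code and an OOMKilled flag (the container name here is just my example):

docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' my-scraper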

In my case, the container didn't get killed fast enough. By the time the OOM Killer acted, the server was already locked up.

How I Fixed It: Memory Limits

The solution was straightforward: set a hard memory limit on every container. If a container tries to exceed that limit, the OOM Killer takes it down immediately—before it can drag the host with it.

I use the -m or --memory flag when starting containers:

docker run -m 512m my-scraper

This sets a hard cap at 512MB of RAM. If the container tries to use more than that (plus whatever swap it's allowed, more on that below), it gets killed. Not gracefully—just terminated. But that's better than crashing the entire server.

For my scraper, I ran some tests to see its normal memory usage under load. It typically sat around 150-200MB. I set the limit to 400MB to give it headroom. When the leak happened again, the container died at 400MB instead of 12GB. The server stayed up.
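
If a container is already running, you don't have to recreate it to add a limit. docker update can apply one in place (a quick sketch, reusing my scraper's name and limits):

docker update --memory 400m --memory-swap 400m my-scraper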

Understanding Swap Behavior

There's a second parameter: --memory-swap. This one confused me at first. It doesn't set the swap size—it sets the total of memory + swap.

If you run:

docker run -m 512m --memory-swap 1g my-app

The container gets 512MB of RAM and 512MB of swap (total 1GB). If you don't set --memory-swap, Docker defaults to doubling the memory value. So -m 512m actually allows up to 1GB total (512m memory + 512m swap).
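
You can verify what Docker actually applied by inspecting the container's HostConfig; both values are reported in bytes, and MemorySwap is the combined total:

docker inspect --format 'mem={{.HostConfig.Memory}} mem+swap={{.HostConfig.MemorySwap}}' my-app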

I disable swap on most of my containers now:

docker run -m 512m --memory-swap 512m my-app

This makes the memory limit stricter. Swap thrashing can kill disk performance, especially on my Synology NAS where some containers run. I'd rather have a container die cleanly than drag down the disk I/O for everything else.

CPU Limits Work Differently

CPU limits don't kill containers—they just throttle them. A container that exceeds its CPU quota gets slowed down, not terminated.

I use the --cpus flag:

docker run --cpus="1.5" my-app

This allows the container to use up to 1.5 CPU cores worth of compute time. If it tries to use more, the kernel scheduler throttles it.

I had a case where a container running n8n workflows got into an infinite loop. Without limits, it pegged all 4 cores on my Proxmox VM. SSH became sluggish, other containers slowed down. After adding --cpus="2", the runaway process could only consume 2 cores max. The server stayed responsive.
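
docker update works for CPU limits too, which is handy when something is already melting down and you don't want to restart it (the container name is just an example):

docker update --cpus 2 my-n8n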

CPU Shares vs Hard Limits

There's another parameter: --cpu-shares. This sets relative priority, not a hard cap. It only matters when CPU resources are actually constrained.

docker run --cpu-shares 2048 important-app
docker run --cpu-shares 1024 background-task

If both containers are competing for CPU, the first one gets twice as much time. But if the CPU is idle, both can run at full speed.

I use this for my monitoring stack. Prometheus gets higher shares than the Grafana renderer, so queries stay responsive even when dashboards are being generated.
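
In docker-compose terms, that prioritization looks roughly like this (a sketch with illustrative service names; cpu_shares is a service-level option in the Compose file):

services:
  prometheus:
    image: prom/prometheus:latest
    cpu_shares: 2048
  grafana-renderer:
    image: grafana/grafana-image-renderer:latest
    cpu_shares: 1024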

How I Monitor Resource Usage

Setting limits is only useful if you know what your containers actually consume. I use a few tools to track this.

docker stats

The simplest option. Just run:

docker stats

It shows real-time CPU, memory, network, and disk I/O for all running containers. I keep this open in a tmux pane when I'm testing new containers or troubleshooting.

The output looks like:

CONTAINER ID   NAME          CPU %   MEM USAGE / LIMIT   MEM %
abc123         my-scraper    12.5%   287MiB / 400MiB     71.75%

It's not persistent—just a live view. But it's enough to spot obvious problems.
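
When I want a snapshot I can grep or log rather than a live view, I use the --no-stream and --format flags (the format string below is just one I find readable):

docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.CPUPerc}}"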

cAdvisor

For longer-term tracking, I run cAdvisor in a container. It collects metrics from all other containers and exposes them via a web UI.

docker run -d \
  --name=cadvisor \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  gcr.io/cadvisor/cadvisor:latest

I access it at http://proxmox-ip:8080. It shows historical graphs for each container—CPU, memory, network, disk. I can see trends over days or weeks.

This is how I caught the memory leak in my scraper. The graph showed a steady climb over 10 days, which wouldn't have been obvious from docker stats alone.

Prometheus and Grafana

For my main server, I run Prometheus to scrape cAdvisor metrics and Grafana for dashboards. This is overkill for a home setup, but I already had it running for other monitoring.

Prometheus scrapes cAdvisor every 15 seconds. I set up alerts in Alertmanager to notify me when a container's memory usage crosses 80% of its limit. That gives me time to investigate before the OOM Killer steps in.
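
A minimal version of that alert rule looks something like this (the group and alert names are my own; the metrics come from cAdvisor, and the expression assumes the container actually has a memory limit set):

groups:
  - name: container-memory
    rules:
      - alert: ContainerMemoryNearLimit
        # memory usage as a fraction of the configured limit
        expr: container_memory_usage_bytes{name!=""} / container_spec_memory_limit_bytes{name!=""} > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.name }} is above 80% of its memory limit"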

The Grafana dashboard shows all containers on one screen. I can spot anomalies quickly—like when one container's CPU suddenly spikes or memory starts climbing.

What I Learned From Failures

I've crashed my Docker host more than once. Here's what I know now:

1. Set limits on everything

Even containers you think are safe. My scraper was supposed to be a simple utility. I didn't think it needed limits. I was wrong.

Now every container gets at least a memory limit. I use docker-compose for most services, so I add this to every service definition:

services:
  my-app:
    image: my-app:latest
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: '1.0'

2. Test limits under load

Don't guess. Run your container under realistic load and measure its resource usage. Then set limits with some headroom.

For my n8n instance, I ran a few hundred workflows in parallel to simulate heavy usage. Peak memory was around 600MB. I set the limit to 1GB.
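
To capture peak usage during a test like that, I sample docker stats in a loop and eyeball the results afterwards (a rough sketch; the container name, interval, and output file are placeholders):

while true; do
  docker stats --no-stream --format "{{.Name}},{{.MemUsage}},{{.CPUPerc}}" n8n >> usage.csv
  sleep 5
done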

3. Monitor consistently

I check cAdvisor weekly now. If I see memory usage climbing steadily, I investigate. Sometimes it's a legitimate increase (like more data being processed). Sometimes it's a leak.

4. Restart policies matter

When a container gets OOM-killed, Docker's restart policy determines what happens next. I use restart: unless-stopped for most services. The container restarts automatically, which is usually fine for stateless apps.

For databases or stateful services, I use restart: on-failure and set up proper monitoring. I don't want a database restarting in a loop because of a memory leak.
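
In compose, that split looks something like this (the service names and images are illustrative):

services:
  web-frontend:
    image: my-frontend:latest
    restart: unless-stopped
  postgres:
    image: postgres:16
    restart: on-failure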

What Still Doesn't Work Well

Resource limits aren't perfect. Here are the rough edges:

Hard to predict some workloads

My Cronicle job scheduler runs various scripts. Some use 50MB, others use 2GB. Setting a single memory limit is tricky. I ended up setting it high (4GB) and accepting the risk. I monitor it closely.

Swap is a blunt tool

Disabling swap makes limits stricter, but some apps genuinely benefit from a bit of swap. I haven't found a good middle ground. I either disable it entirely or leave the default (2x memory).

No way to limit disk I/O easily

Docker has --device-read-bps and --device-write-bps, but they're device-specific and awkward to use. I haven't found a clean way to prevent one container from saturating disk I/O. This is a problem on my Synology where disk is shared across many containers.
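
For reference, this is roughly what the device throttling looks like; the flags need the actual block device backing your storage, which is exactly what makes them awkward on layered setups like a NAS:

docker run --device-read-bps /dev/sda:10mb --device-write-bps /dev/sda:10mb my-app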

Key Takeaways

After a few hard crashes and some trial and error, here's what I do now:

  • Every container gets a memory limit. No exceptions.
  • I disable swap (--memory-swap equals --memory) unless I have a specific reason not to.
  • CPU limits go on anything that might run unchecked (scrapers, batch jobs, workflow engines).
  • I run cAdvisor and check it weekly for trends.
  • I test limits under realistic load before deploying.

The goal isn't to prevent memory leaks or runaway processes—those are application bugs. The goal is to contain the damage when they happen. A single crashed container is annoying. A crashed server is a disaster.

Resource limits are the difference between those two outcomes.