Why I Hit This Wall
I run a Proxmox cluster with several Ubuntu LXC containers hosting Docker Compose stacks. These aren't fancy—mostly n8n, monitoring tools, a few databases, and some internal services I've built over time. They've been stable for years.
When I upgraded one of my containers from Ubuntu 22.04 to 24.04, everything broke within hours. Containers wouldn't start. Some would start and then get OOM-killed immediately. Others would hang during initialization. The logs were useless—just generic memory allocation failures.
I assumed I'd misconfigured something during the upgrade. Spent two days rechecking Docker versions, Compose files, and systemd units. Nothing obvious. The same stacks worked fine on my 22.04 containers.
The real problem: Ubuntu 24.04 ships with kernel 6.x, which enforces cgroup v2 memory accounting differently than the 5.x kernels I'd been running. My Docker setup, which had worked for years, was suddenly incompatible.
What Changed Under the Hood
Ubuntu 24.04 uses kernel 6.8 by default. This kernel version tightened how cgroup v2 handles memory limits, particularly around:
- memory.high and memory.max enforcement
- How swap is accounted when memory.swap.max isn't explicitly set
- The interaction between Docker's --memory flag and systemd's cgroup slice limits
In my case, I had Docker Compose files with mem_limit set on some services but not others. Under kernel 5.x, Docker would apply those limits loosely. If a container needed more memory temporarily, the kernel would allow it as long as system memory was available.
Kernel 6.x doesn't do that. It enforces limits strictly. If you set mem_limit: 512m and the container tries to allocate 513m, it gets killed. No grace period. No soft limit behavior.
Worse, if you don't set a limit, Docker now inherits the cgroup slice limit from systemd. On my LXC containers, systemd was capping Docker's total memory usage at the container's allocated RAM. That meant all running containers had to share that pool, and there was no swap fallback because I hadn't configured memory.swap.max.
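If you want to see the hard cap in action, it's easy to reproduce with a throwaway container: give it a low --memory value and make it allocate past the limit. The python:3 image below is just a convenient allocator I'm using for illustration, not something from my stacks.

```
# Cap a throwaway container at 64 MiB, then try to allocate ~200 MiB.
docker run --name memtest -m 64m python:3 \
  python3 -c "x = bytearray(200 * 1024 * 1024); print('allocated')"

# "allocated" never prints; the kernel kills the process instead.
# Confirm it was an OOM kill rather than an application error:
docker inspect -f '{{.State.OOMKilled}}' memtest   # should print: true
docker rm memtest
```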
How I Diagnosed It
First, I checked kernel versions across my containers:
```
uname -r
```
The broken container was running 6.8.0-49-generic. The working ones were still on 5.15.0-x.
Next, I looked at cgroup version:
```
mount | grep cgroup
```
Both were using cgroup v2, but the behavior was different. I suspected memory accounting.
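A quicker check than eyeballing mount output is asking what filesystem type is mounted at /sys/fs/cgroup; cgroup2fs means a pure v2 hierarchy.

```
stat -fc %T /sys/fs/cgroup
# cgroup2fs -> unified cgroup v2
# tmpfs     -> legacy v1 or hybrid layout
```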
I then inspected a failing container's cgroup memory stats:
```
docker inspect <container_id> | grep -i memory
cat /sys/fs/cgroup/system.slice/docker-<container_id>.scope/memory.current
cat /sys/fs/cgroup/system.slice/docker-<container_id>.scope/memory.max
```
The memory.max value was lower than I expected. It wasn't coming from my Compose file—it was inherited from the parent slice.
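To pin down which level a limit actually comes from, you can walk up from the container's scope and print memory.max at every ancestor. A rough sketch, assuming the systemd cgroup driver and using <container_id> as a placeholder:

```
#!/usr/bin/env bash
# Print memory.max from a container's cgroup scope up to the cgroup root.
# The effective limit is the smallest value anywhere on this path.
CID=$(docker inspect -f '{{.Id}}' <container_id>)
CG="/sys/fs/cgroup/system.slice/docker-${CID}.scope"

while [ "$CG" != "/sys/fs/cgroup" ]; do
  printf '%-70s %s\n' "$CG" "$(cat "$CG/memory.max" 2>/dev/null || echo n/a)"
  CG=$(dirname "$CG")
done
```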
I checked systemd's slice limits:
```
systemctl show docker.service | grep Memory
```
There it was: MemoryMax was set to the LXC container's total RAM. Every Docker container was fighting for a share of that fixed pool.
What Didn't Work
My first attempt was to remove all mem_limit directives from Compose files. I thought maybe the limits were just too low. That made things worse—containers would start, consume all available memory, and crash the entire Docker daemon.
Next, I tried setting memory.swap.max manually in the cgroup:
```
echo 2G > /sys/fs/cgroup/system.slice/docker.service/memory.swap.max
```
That failed with a permission error. Turns out you can't set swap limits on a slice that's already active without reloading systemd.
I also tried downgrading Docker. Didn't help. The issue was kernel-side, not Docker-side.
What Actually Fixed It
I had to do three things:
1. Override systemd's Docker service memory limit
I created a systemd drop-in file:
```
mkdir -p /etc/systemd/system/docker.service.d
nano /etc/systemd/system/docker.service.d/memory.conf
```
Contents:
```
[Service]
MemoryMax=infinity
MemoryHigh=infinity
```
This removed the inherited cgroup limit from the LXC container. Docker could now use all available memory.
```
systemctl daemon-reload
systemctl restart docker
```
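It's worth confirming the drop-in actually took effect before moving on; both the systemd property and the cgroup file should now report no limit.

```
systemctl show docker.service -p MemoryMax -p MemoryHigh
# MemoryMax=infinity
# MemoryHigh=infinity

cat /sys/fs/cgroup/system.slice/docker.service/memory.max
# max
```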
2. Set explicit memory limits in Compose files
For every service, I added:
```
deploy:
  resources:
    limits:
      memory: 1G
    reservations:
      memory: 512M
```
I used the deploy syntax instead of the old mem_limit because it separates hard and soft limits explicitly. The reservations value is a soft floor Docker tries to keep available for the service under memory pressure; the limits value is the hard cap.
I set these based on actual usage, not guesses. I ran docker stats for a week on the working 22.04 containers and noted peak memory usage for each service.
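If you want to repeat that sizing exercise, a crude sampler is enough: snapshot docker stats on an interval, then take the per-service peak afterwards. The log path and interval below are arbitrary.

```
#!/usr/bin/env bash
# Sample per-container memory usage once a minute into a simple CSV log.
# Let it run for a representative period, then take the peak per service.
LOG=/var/log/docker-mem-samples.csv   # arbitrary location

while true; do
  docker stats --no-stream --format '{{.Name}},{{.MemUsage}}' \
    | sed "s/^/$(date -Is),/" >> "$LOG"
  sleep 60
done
```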
3. Enable swap accounting in the kernel
I added this to /etc/default/grub:
```
GRUB_CMDLINE_LINUX="swapaccount=1 cgroup_enable=memory"
```
Then:
```
update-grub
reboot
```
This allowed the kernel to track swap usage per cgroup. Without it, containers would get OOM-killed even when swap was available.
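After the reboot it's worth confirming the parameters took effect and that per-cgroup swap counters are visible (same <container_id> placeholder as earlier):

```
# The kernel command line should now include the swap accounting flags:
cat /proc/cmdline

# And each container's cgroup should expose swap counters:
cat /sys/fs/cgroup/system.slice/docker-<container_id>.scope/memory.swap.current
cat /sys/fs/cgroup/system.slice/docker-<container_id>.scope/memory.swap.max
```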
Why This Matters for LXC Containers
If you're running Docker inside Proxmox LXC containers, you're adding another layer of cgroup nesting. Proxmox sets memory limits on the LXC container. Systemd inside the container sets limits on Docker. Docker sets limits on individual containers.
Under kernel 5.x, these layers were loosely enforced. Kernel 6.x enforces them strictly, in order. If any layer is too restrictive, everything below it breaks.
The fix is to be explicit at every layer:
- Set the LXC container's memory high enough to handle peak Docker usage
- Remove or raise systemd's Docker service limit
- Set per-container limits in Compose files
Don't rely on defaults. They changed.
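A quick audit of all three layers looks something like this, run partly from the Proxmox host and partly inside the LXC container (<vmid> and <container_id> are placeholders):

```
# Layer 1 - Proxmox host: the LXC container's RAM and swap allocation
pct config <vmid> | grep -Ei 'memory|swap'

# Layer 2 - inside the LXC container: systemd's limit on the Docker service
systemctl show docker.service -p MemoryMax -p MemoryHigh

# Layer 3 - inside the LXC container: the hard cap Docker applied (0 = unlimited)
docker inspect -f '{{.Name}} {{.HostConfig.Memory}}' <container_id>
```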
What I'm Still Watching
After the fix, everything stabilized. But I'm monitoring a few things:
- Memory pressure: I added a Prometheus exporter to track cgroup memory stats. If memory.pressure spikes, I know a container is hitting its limit (a quick manual check is sketched after this list).
- OOM kills: I set up alerts in Grafana for kernel OOM events. If a container gets killed, I get notified immediately.
- Swap usage: I'm tracking how much swap Docker actually uses now that it's enabled. So far, it's minimal—most services stay within their limits.
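For the memory-pressure point above, the raw PSI numbers are also readable straight from the cgroup tree if you want a spot check without opening Grafana:

```
# Print memory pressure (PSI) for every Docker container cgroup.
# The avg10/avg60/avg300 figures are the share of time tasks stalled on memory.
for scope in /sys/fs/cgroup/system.slice/docker-*.scope; do
  echo "== $scope"
  cat "$scope/memory.pressure"
done
```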
I also haven't tested this on kernel 6.10+ yet. I'm staying on 6.8 until I see how the next LTS release handles it.
Key Takeaways
Kernel 6.x changed how cgroup v2 memory limits work. If you're running Docker on Ubuntu 24.04 or any distro with a 6.x kernel, expect breakage if you relied on loose memory enforcement.
The fix isn't one thing—it's a combination of systemd overrides, explicit Compose limits, and kernel parameters. You can't skip any of them.
If you're running Docker inside LXC containers, test the upgrade on a non-critical container first. Don't assume your old Compose files will work.
And if you see random OOM kills after upgrading, check cgroup memory limits before blaming Docker. The kernel is doing exactly what it's supposed to do—you just didn't tell it the right limits.