Why I worked on this
I run a handful of public-facing containers on the same Docker host that also hosts my internal stuff—CI, monitoring, home-automation, the works. Docker’s default “open egress” policy started to feel reckless: every container can reach the whole LAN and the internet. I don’t run Kubernetes here, so network policies aren’t an option. I wanted something that:
- works on a single-node Docker box,
- doesn’t need another overlay network or service mesh,
- can be dropped in without rebuilding images,
- and shows up in the logs so I can tune it.
eBPF can hook syscalls at the kernel level, so I decided to see if I could use it to block outbound connections from specific containers without touching Docker’s network stack.
My real setup or context
- Host: Proxmox VM, Debian 12, 6.1 kernel, cgroup v2 enabled.
- Docker: 24.0.5, rootless mode disabled (I still run the daemon as root—old habits).
- Tooling: I picked bpftool and libbpf instead of BCC—smaller, no Python runtime on the host.
- Target containers: two public nginx containers (label egress=block) and one internal Grafana container (label egress=allow).
What worked (and why)
1. Attach to the connect() syscall inside the container’s cgroup
eBPF programs can be attached to a cgroup. When any process in that cgroup calls connect(), the program runs. I compile a tiny BPF program that:
- reads the destination IP from the socket,
- checks a pinned map of “allowed” prefixes,
- rejects the connect() if the IP isn’t in the map (the caller sees the call fail with a permission error).
// connect_block.c (snippet)
struct lpm_key {
    __u32 prefixlen;   /* significant bits, e.g. 8 for 10.0.0.0/8 */
    __u32 addr;        /* IPv4 address, network byte order */
};

SEC("cgroup/connect4")
int block_egress(struct bpf_sock_addr *ctx)
{
    /* user_ip4 is already in network byte order */
    struct lpm_key key = { .prefixlen = 32, .addr = ctx->user_ip4 };

    /* 1 = allow, 0 = reject the connect() */
    return bpf_map_lookup_elem(&allow_map, &key) ? 1 : 0;
}
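The allow_map the program looks up is a longest-prefix-match trie declared alongside struct lpm_key, plus the usual includes and license tag. A minimal libbpf-style declaration looks roughly like this (the entry count is arbitrary, a sketch rather than a verbatim copy of my file):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* One entry per allowed CIDR block; LPM tries must be declared no-prealloc. */
struct {
    __uint(type, BPF_MAP_TYPE_LPM_TRIE);
    __uint(map_flags, BPF_F_NO_PREALLOC);
    __uint(max_entries, 128);
    __type(key, struct lpm_key);
    __type(value, __u32);
} allow_map SEC(".maps");

char LICENSE[] SEC("license") = "GPL";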
2. Load and pin the program once
bpftool prog load connect_block.o /sys/fs/bpf/connect_block \
type cgroup/connect4 \
pinmaps /sys/fs/bpf/maps/
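A quick sanity check that the load and the pins worked (the map path assumes the pinmaps directory above and the allow_map name from the snippet):

bpftool prog show pinned /sys/fs/bpf/connect_block
bpftool map show pinned /sys/fs/bpf/maps/allow_map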
3. Attach the program to the container’s cgroup
Docker puts every container under /sys/fs/cgroup/system.slice/docker-<container-id>.scope. I wrote a 20-line Go helper that:
- watches the Docker event API for container starts,
- reads the container’s labels,
- if egress=block, opens the cgroup path and attaches the pinned program (a sketch of the helper follows below).
# manual equivalent of what the helper does:
bpftool cgroup attach /sys/fs/cgroup/system.slice/docker-<container-id>.scope connect4 pinned /sys/fs/bpf/connect_block
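Stripped down, the helper looks roughly like this; the Docker SDK and cilium/ebpf calls are representative of the approach, not a verbatim copy of my 20 lines:

// egress-watcher.go (sketch): attach the pinned program to new containers labeled egress=block.
package main

import (
    "context"
    "fmt"
    "log"

    "github.com/cilium/ebpf"
    "github.com/cilium/ebpf/link"
    "github.com/docker/docker/api/types"
    "github.com/docker/docker/client"
)

func main() {
    cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
    if err != nil {
        log.Fatal(err)
    }

    // Re-use the program pinned by bpftool in step 2.
    prog, err := ebpf.LoadPinnedProgram("/sys/fs/bpf/connect_block", nil)
    if err != nil {
        log.Fatal(err)
    }

    links := map[string]link.Link{} // hold the links so the attachments stay alive

    msgs, errs := cli.Events(context.Background(), types.EventsOptions{})
    for {
        select {
        case err := <-errs:
            log.Fatal(err)
        case msg := <-msgs:
            // Container start events carry the container labels in Actor.Attributes.
            if msg.Type != "container" || msg.Action != "start" ||
                msg.Actor.Attributes["egress"] != "block" {
                continue
            }
            // Default systemd cgroup driver; see the takeaways for the cgroupfs caveat.
            cg := fmt.Sprintf("/sys/fs/cgroup/system.slice/docker-%s.scope", msg.Actor.ID)
            l, err := link.AttachCgroup(link.CgroupOptions{
                Path:    cg,
                Attach:  ebpf.AttachCGroupInet4Connect,
                Program: prog,
            })
            if err != nil {
                log.Printf("attach %s: %v", msg.Actor.ID, err)
                continue
            }
            links[msg.Actor.ID] = l
        }
    }
}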
4. Populate the allow-list map
I keep a simple text file:
10.0.0.0/8
172.16.0.0/12
192.168.0.0/16
1.1.1.1/32
A cron job runs every hour and rewrites the pinned map via bpftool map update. No hot-reload drama.
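Concretely, one allow-list entry becomes one bpftool map update call. With the lpm_key layout from the snippet (a 4-byte prefix length in host byte order followed by the address in network byte order), adding 10.0.0.0/8 on a little-endian box looks roughly like this; the pinned path follows from the pinmaps directory in step 2:

bpftool map update pinned /sys/fs/bpf/maps/allow_map \
    key hex 08 00 00 00 0a 00 00 00 \
    value hex 01 00 00 00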
5. Logs you can grep
The kernel prints audit: type=1400 ... denied egress ... every time the program rejects a connect(). I ship those to Loki and alert on spikes.
What didn’t work
- Trying to hook at the Docker network namespace level. You can attach to veth egress tc hooks, but the program runs after NAT, so the source IP is already SNAT-ed to the host. That broke my “per-container” rule idea.
- BCC Python tools. Pulls in 70 MB of dependencies and needs the kernel headers mounted inside the container that runs the tool. Overkill for a single-node box.
- Using iptables with the docker0 bridge. Works until Docker decides to recreate the bridge after an upgrade—learned that the hard way at 3 a.m.
- Attaching the same program to cgroup/connect6. My first compile forgot to zero the IPv6 flowlabel field; the verifier rejected the program with an opaque “invalid mem access” message. Took two evenings of single-stepping with bpftool prog load ... verif to spot it.
Key takeaways
- cgroup-bpf is the smallest choke point I found that still keeps the policy inside the kernel—no extra bridge, no iptables save/restore dance.
- Pinning maps in /sys/fs/bpf survives daemon restarts; that means you can update the allow-list without re-attaching the program.
- Docker’s cgroup path is predictable, but only if you stick to the default systemd cgroup driver. Switch to cgroupfs and the path changes—my helper script now double-checks /proc/<pid>/cgroup instead of guessing.
- IPv6 is easy to forget. If you block v4 only, a container can still tunnel out via v6. I added a second program for connect6 and duplicated the allow-list logic (sketched after this list).
- The overhead is invisible in my tests; even a 2 kB response payload fetch from inside the container still averages sub-millisecond connect times. But I only run ~30 containers—YMMV on a dense multi-tenant box.
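The connect6 program is the same trick with a wider key. A sketch, assuming an allow6_map that mirrors allow_map’s declaration but is keyed on the 16 address bytes:

/* IPv6 twin of lpm_key; allow6_map mirrors allow_map's LPM-trie declaration */
struct lpm_key6 {
    __u32 prefixlen;   /* up to 128 */
    __u32 addr[4];     /* user_ip6 words, network byte order */
};

SEC("cgroup/connect6")
int block_egress6(struct bpf_sock_addr *ctx)
{
    struct lpm_key6 key = { .prefixlen = 128 };

    /* user_ip6 has to be read in 4-byte chunks */
    key.addr[0] = ctx->user_ip6[0];
    key.addr[1] = ctx->user_ip6[1];
    key.addr[2] = ctx->user_ip6[2];
    key.addr[3] = ctx->user_ip6[3];

    /* 1 = allow, 0 = reject, same convention as the v4 program */
    return bpf_map_lookup_elem(&allow6_map, &key) ? 1 : 0;
}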
I still don’t have a slick UI to edit the allow-list—SSH and vim suffice for now. One day I’ll wrap it in a small web form, but only after I catch myself editing the file more than twice a week.