Why I Started Debugging Docker DNS
I run most of my services in Docker Compose stacks on Proxmox VMs. For months, everything worked fine—until I started adding more containers that needed to talk to each other. Suddenly, I'd get "bad address" errors or connection timeouts between containers that should have been able to reach each other by name.
The frustrating part was the inconsistency. Sometimes a container would resolve another's name perfectly. Other times, it wouldn't. I needed to understand what was actually happening inside Docker's networking layer.
How Docker's Embedded DNS Actually Works
Every container on a user-defined network gets access to Docker's embedded DNS server at 127.0.0.11. This is not a guess—I verified it by checking /etc/resolv.conf inside running containers:
docker exec my-container cat /etc/resolv.conf
It always showed nameserver 127.0.0.11. This DNS server handles container name resolution and forwards external queries to upstream resolvers.
The critical detail I learned the hard way: this only works on user-defined networks. The default bridge network does not support container name resolution at all. I wasted time trying to ping containers by name on the default bridge before realizing this limitation.
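If you want to see the difference yourself, a quick experiment like this one makes it obvious (the container and network names here are throwaway examples, not from my stacks):

# user-defined network: name resolution works
docker network create dns-test
docker run -d --name web --network dns-test nginx:alpine
docker run --rm --network dns-test alpine ping -c 1 web

# default bridge: the same lookup fails with "bad address"
docker run -d --name web2 nginx:alpine
docker run --rm alpine ping -c 1 web2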
My Real Setup
I run several Docker Compose stacks on separate VMs:
- A monitoring stack (Uptime Kuma, Grafana, Prometheus)
- An automation stack (n8n, Cronicle, PostgreSQL)
- A reverse proxy stack (Traefik, Authelia)
Each stack uses its own user-defined network. Some services need to reach others within the same stack by name. I do not use Docker Swarm or Kubernetes—just plain Docker Compose.
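A trimmed-down version of one stack looks roughly like this; the service names match what I run, but the image tags and the explicit network name are illustrative:

services:
  uptime-kuma:
    image: louislam/uptime-kuma:1   # illustrative tag
    networks:
      - monitoring
  grafana:
    image: grafana/grafana:latest   # illustrative tag
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge

Compose would create a per-stack network on its own, but naming it explicitly makes it easier to reference and inspect later.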
What Actually Failed
Problem 1: Container Names Not Resolving
I had a container trying to connect to postgres by name. It kept failing with "bad address". First, I checked if both containers were on the same network:
docker network inspect my-network | grep -A5 "Containers"
They were. Then I checked if the PostgreSQL container was actually running:
docker ps --filter "name=postgres"
It was stopped. I had restarted the VM earlier and forgot to bring the stack back up. Starting it fixed the issue immediately.
Lesson: DNS resolution only works for running containers. Stopped containers do not resolve, even if they exist in the Compose file.
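Since then, my first two checks are always the same: which containers are actually attached to the network, and whether the whole stack is up (the network name is a placeholder):

# list the names of containers currently attached to the network
docker network inspect my-network --format '{{range .Containers}}{{.Name}} {{end}}'
# confirm every service in the stack is actually running
docker compose ps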
Problem 2: External DNS Failing Inside Containers
Some containers couldn't resolve external hostnames like github.com. I tested this directly:
docker run --rm alpine ping github.com
It failed. Checking /etc/resolv.conf inside the container showed 127.0.0.11, which was correct. But Docker's embedded DNS wasn't forwarding queries properly.
I checked my host's DNS configuration. My Proxmox VM was using my local DNS server (running Pi-hole), which sometimes blocks or delays certain queries. I added explicit DNS servers to the container:
docker run --dns 8.8.8.8 --rm alpine ping github.com
That worked. For a permanent fix, I added DNS settings to my Compose file:
services:
  my-service:
    dns:
      - 8.8.8.8
      - 1.1.1.1
This bypassed my local DNS server for that container. I did not change the Docker daemon settings because I wanted to keep Pi-hole as the default for most containers.
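For reference, the daemon-wide alternative would be a "dns" entry in /etc/docker/daemon.json; I skipped it because it changes the default for every container on the host:

{
  "dns": ["8.8.8.8", "1.1.1.1"]
}

After editing that file, the daemon needs a restart (systemctl restart docker) for the change to take effect.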
Problem 3: DNS Resolved, But Connection Still Failed
I had a case where ping worked, but the application couldn't connect to the database. I tested the TCP port directly:
docker exec my-app nc -zv postgres 5432
Connection refused. The DNS resolution was fine—the issue was with the PostgreSQL container itself. I checked what ports were actually listening:
docker exec postgres ss -tlnp
PostgreSQL was listening on 127.0.0.1:5432, not 0.0.0.0:5432. This meant it only accepted connections from within its own container, not from other containers on the network.
I changed the PostgreSQL configuration to listen on all interfaces and restarted the container. That fixed it.
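How you do that depends on how the container is configured. With the official postgres image, one way to force it from the Compose file is to override the server command (a sketch, not necessarily how your stack is wired):

services:
  postgres:
    image: postgres:16   # illustrative tag
    # listen on all interfaces so other containers on the network can connect
    command: ["postgres", "-c", "listen_addresses=*"]

The official image listens on all interfaces by default, so a 127.0.0.1 bind usually points to a custom postgresql.conf overriding listen_addresses.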
Lesson: If DNS works but connections fail, the problem is not DNS. Check the service's bind address and firewall rules.
Tools I Actually Use for Debugging
I keep a small Alpine-based debugging container with network tools installed. I built it once and reuse it whenever I need to troubleshoot:
FROM alpine:3.19
RUN apk add --no-cache bind-tools curl netcat-openbsd iputils
CMD ["sleep", "infinity"]
I run it on the same network as the problem container:
docker run -d --name debug --network my-network netdebug
docker exec -it debug sh
From there, I can test DNS resolution, ping IPs, check TCP ports, and trace routes without installing tools in production containers.
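A typical session from inside that shell looks something like this (the hostnames and the IP are placeholders):

nslookup postgres        # container name resolution through 127.0.0.11
nslookup github.com      # external resolution through the embedded forwarder
nc -zv postgres 5432     # TCP reachability on the database port
ping -c 3 172.18.0.5     # raw IP connectivity, skipping DNS entirely (placeholder IP)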
Commands I Use Most
Check DNS configuration:
docker exec my-container cat /etc/resolv.conf
Test container name resolution:
docker exec my-container nslookup other-container
Test external DNS:
docker exec my-container nslookup google.com
Check what's listening on a port:
docker exec my-container ss -tlnp
Get a container's IP manually:
docker inspect my-container --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
I do not use dig often because nslookup is simpler and already present in the busybox-based images I use, like Alpine.
What I Learned About Network Configuration
I used to rely on Docker Compose's automatic networking. It creates a default network for each stack and connects all services to it. This works fine until you need more control.
For example, I have a reverse proxy that needs to reach multiple backend stacks. I created a shared external network:
docker network create traefik-public
Then I connected each backend stack to it:
services:
  my-app:
    networks:
      - default
      - traefik-public

networks:
  traefik-public:
    external: true
This lets Traefik resolve backend service names without putting everything on the same network.
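The Traefik side is the mirror image: the proxy container just joins the same external network (the image tag is illustrative, and labels and entrypoints are omitted):

services:
  traefik:
    image: traefik:v3.0   # illustrative tag
    networks:
      - traefik-public

networks:
  traefik-public:
    external: true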
I also learned that network aliases are useful when a service needs to respond to multiple names. I use this for database containers that need to be reachable by both a generic name and a specific one:
services:
  postgres:
    networks:
      default:
        aliases:
          - db
          - postgres-primary
Now other containers can reach it as postgres, db, or postgres-primary.
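A quick way to confirm the aliases are live is to resolve each name from another container on the same network (assuming that image has nslookup; otherwise use the debug container from earlier):

docker exec my-app nslookup db
docker exec my-app nslookup postgres-primary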
Mistakes I Made
I initially tried to debug DNS issues by restarting containers repeatedly. This didn't help because the problem was usually configuration, not state.
I also assumed that if ping worked, everything else would work too. That's not true. Ping uses ICMP, which doesn't test TCP connectivity or application-level issues.
I wasted time trying to use --link in Compose files. It's a legacy feature and unnecessary on user-defined networks. Container names resolve automatically—no links needed.
What Works Reliably
Use user-defined networks, not the default bridge. Always.
Name containers explicitly in Compose files. Let Docker's embedded DNS handle resolution.
If external DNS is unreliable, set explicit DNS servers in the Compose file or daemon config.
Test DNS and TCP connectivity separately. Use nslookup for DNS, nc or telnet for TCP.
Keep a debugging container image ready. Install network tools once, reuse it everywhere.
Check /etc/resolv.conf first when DNS fails. It tells you what DNS server the container is using.
Key Takeaways
Docker's embedded DNS is simple and reliable—when you use it correctly. Most DNS issues I've encountered were caused by using the default bridge network, stopped containers, or misconfigured services.
DNS resolution is not the same as connectivity. A name can resolve perfectly and still fail to connect if the service isn't listening on the right interface or port.
Debugging is faster when you separate layers: network, DNS, TCP, application. Test each one independently.
I don't use service discovery tools like Consul for my home setup. Docker's built-in DNS is sufficient for small-scale deployments. If I needed more, I'd consider it, but I haven't hit that limit yet.