Why I Built a Pi-hole Failover Cluster
I run Pi-hole as my primary DNS resolver at home. It blocks ads and trackers across every device on my network without needing per-device configuration. The problem is simple: when Pi-hole goes down—whether for updates, maintenance, or an unexpected crash—DNS resolution stops. No DNS means no internet access for anyone in the house.
I learned this the hard way when I rebooted my Pi-hole VM during a weeknight. My partner was in the middle of a video call. It did not go well.
Running two independent Pi-hole instances with separate IPs seemed like an obvious fix, but that approach has real drawbacks. Clients don't reliably fail over between DNS servers—they often wait for timeouts, causing noticeable delays. And keeping blocklists and settings synchronized between two servers manually is tedious and error-prone.
I needed a proper failover cluster: one virtual IP address that would automatically switch between two Pi-hole instances if the primary went down, with synchronized configuration between them.
My Setup
I run both Pi-hole instances as LXC containers on Proxmox. Each runs Ubuntu 22.04 with Pi-hole 5.x installed. The containers sit on the same VLAN as my client devices.
I assigned them these addresses:
- Primary Pi-hole: 192.168.1.21
- Secondary Pi-hole: 192.168.1.22
- Virtual IP (VIP): 192.168.1.20
Clients are configured to use only 192.168.1.20 as their DNS server. They never touch the real IPs directly.
Synchronizing Configuration with Gravity-Sync
Pi-hole 5 stores blocklists and configuration in a SQLite database at /etc/pihole/gravity.db. I needed a way to keep this database synchronized between both instances without manual intervention.
I used gravity-sync, a script specifically designed for this purpose. It handles syncing blocklists, whitelist entries, and local DNS records. It does not sync admin passwords, upstream DNS servers, or DHCP settings—those I configured manually to match on both instances.
Installing Gravity-Sync
On both containers, I first installed dependencies:
apt update && apt install sqlite3 sudo git rsync ssh -y
I created a service account with sudo privileges on both instances:
sudo useradd -G sudo -m pihole-sync
sudo passwd pihole-sync
On the primary Pi-hole, I installed gravity-sync as the primary:
export GS_INSTALL=primary && curl -sSL https://gravity.vmstan.com | bash
On the secondary, I installed it as the secondary:
export GS_INSTALL=secondary && curl -sSL https://gravity.vmstan.com | bash
The secondary installer prompted me for the primary's IP address and the service account username. It also configured passwordless SSH authentication between the two instances.
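Before moving on, it's worth confirming that the key-based login actually works. Something along these lines from the secondary, run as the service account and using the primary's IP from above, should connect without a password prompt:

sudo -u pihole-sync ssh pihole-sync@192.168.1.21 echo ok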
Testing and Automating Sync
From the secondary Pi-hole, I tested connectivity:
cd ~/gravity-sync
./gravity-sync.sh compare
This verified that the secondary could reach the primary and checked if any sync was needed. Since my primary already had configuration and the secondary was fresh, I forced an initial one-way sync:
./gravity-sync.sh pull
Finally, I enabled automatic synchronization:
./gravity-sync.sh automate
I set it to sync every hour. This is frequent enough that changes propagate quickly but not so aggressive that it creates unnecessary load.
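Depending on the gravity-sync version, the automate step schedules the job either as a cron entry for the service account or as a systemd timer. A quick sanity check (not something gravity-sync requires) shows which one is in place:

sudo crontab -u pihole-sync -l | grep -i gravity
systemctl list-timers | grep -i gravity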
Configuring IP Failover with Keepalived
With configuration synchronized, I needed automatic failover. If the primary Pi-hole becomes unavailable, the secondary should immediately take over the virtual IP address so clients experience no interruption.
I used keepalived for this. It implements VRRP (Virtual Router Redundancy Protocol) to manage the VIP between the two instances.
Installing Keepalived
On both containers:
sudo apt install keepalived libipset13 -y
I created a health check script to monitor the pihole-FTL service. If FTL stops running, keepalived needs to know so it can fail over:
sudo mkdir -p /etc/scripts
sudo nano /etc/scripts/chk_ftl
The script checks if pihole-FTL is running:
#!/bin/bash
# Report healthy (exit 0) only while the pihole-FTL service is active;
# keepalived treats any non-zero exit as a failed check.
if systemctl is-active --quiet pihole-FTL; then
    exit 0
else
    exit 1
fi
Then I made the script executable:
sudo chmod +x /etc/scripts/chk_ftl
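A quick manual test confirms the script behaves the way keepalived expects. Note that this briefly stops FTL, so DNS will blip for a moment:

sudo /etc/scripts/chk_ftl; echo $?    # prints 0 while FTL is running
sudo systemctl stop pihole-FTL
sudo /etc/scripts/chk_ftl; echo $?    # prints 1 once FTL is stopped
sudo systemctl start pihole-FTL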
Configuring Keepalived
I created /etc/keepalived/keepalived.conf on the primary:
vrrp_script chk_ftl {
    script "/etc/scripts/chk_ftl"
    interval 2
    weight -10
}
vrrp_instance PIHOLE {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass MySecret1
    }
    unicast_src_ip 192.168.1.21
    unicast_peer {
        192.168.1.22
    }
    virtual_ipaddress {
        192.168.1.20/24
    }
    track_script {
        chk_ftl
    }
}
On the secondary, the configuration is nearly identical except:
state BACKUP
priority 145
unicast_src_ip 192.168.1.22
unicast_peer {
    192.168.1.21
}
The primary has higher priority (150 vs 145), so it always claims the VIP when both instances are healthy. The health check's weight of -10 matters here: when chk_ftl fails, the primary's effective priority drops to 140, below the secondary's 145, which forces a failover. A small positive weight would never overcome the 5-point gap, so FTL could die without the VIP ever moving. The auth_pass must match on both instances.
I started and enabled keepalived on both containers:
sudo systemctl enable --now keepalived
sudo systemctl status keepalived
On the primary, I confirmed keepalived entered MASTER state; on the secondary, it showed BACKUP.
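Another way to see which node holds the VIP at any given moment is to check whether 192.168.1.20 is bound to the interface; it should appear only on the current MASTER:

ip addr show eth0 | grep 192.168.1.20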
What Worked
Failover happens almost instantly. When I shut down the primary container, the secondary takes over the VIP within about one second. DNS queries continue without interruption—I tested this by running continuous nslookup queries while stopping the primary.
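If you want to reproduce the test, a loop along these lines does the job. I've used dig here instead of nslookup because it scripts more cleanly, and example.com is just a placeholder domain; it queries the VIP once a second and flags any failure:

while true; do
    dig +short +time=1 +tries=1 example.com @192.168.1.20 > /dev/null || echo "DNS FAIL at $(date)"
    sleep 1
done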
Gravity-sync keeps blocklists and settings synchronized reliably. When I add a domain to the whitelist on the primary, it appears on the secondary within an hour (or immediately if I manually trigger a sync).
The VIP approach means I only configure one DNS server address on clients. No more dealing with primary/secondary DNS confusion or timeout delays.
What Didn't Work
My first attempt used multicast VRRP instead of unicast. This failed because my network switch doesn't properly forward multicast traffic between VLANs. Switching to unicast configuration (explicitly defining peer IP addresses) fixed it.
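If you're unsure whether VRRP advertisements are even reaching the peer, watching for protocol 112 traffic on each node makes the problem obvious. With multicast, a silent capture on the backup means the advertisements are being dropped somewhere in between:

sudo tcpdump -ni eth0 ip proto 112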
I initially set gravity-sync to run every 15 minutes. This caused noticeable CPU spikes on both containers during sync operations. Hourly sync is a better balance for my usage.
Admin passwords don't sync via gravity-sync. I had to manually set them to match on both instances. Same for upstream DNS servers—I configured both to use the same Cloudflare DNS addresses.
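For reference, on Pi-hole 5 the admin password can be set to the same value on each instance from the shell, and the upstream servers can be compared in setupVars.conf:

pihole -a -p
grep PIHOLE_DNS /etc/pihole/setupVars.conf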
Statistics and query logs are not synchronized. Each Pi-hole maintains its own query history. This doesn't affect functionality but means I can't get a unified view of DNS activity across both instances.
Key Takeaways
A Pi-hole failover cluster eliminates single points of failure for home DNS. The combination of gravity-sync and keepalived provides both configuration synchronization and automatic failover without manual intervention.
Use unicast VRRP if your network doesn't reliably handle multicast. Most home networks don't.
Don't over-sync. Hourly synchronization is sufficient for blocklist updates unless you're constantly tweaking configuration.
Test failover before you need it. Shut down the primary and verify that DNS continues working. Make sure the secondary actually takes over the VIP.
This setup has been running for several months now. I've performed maintenance on both Pi-hole instances without anyone in the house noticing. That's exactly what I wanted.