Why I Built a Pi-hole Failover Cluster
I run Pi-hole as my primary DNS resolver at home. It blocks ads and trackers across every device on my network without needing per-device configuration. The problem is simple: when Pi-hole goes down—whether for updates, maintenance, or an unexpected crash—DNS resolution stops. No DNS means no internet access for anyone in the house.
I learned this the hard way when I rebooted my Pi-hole VM during a weeknight. My partner was in the middle of a video call. It did not go well.
Running two independent Pi-hole instances with separate IPs seemed like an obvious fix, but that approach has real drawbacks. Clients don't reliably fail over between DNS servers—they often wait for timeouts, causing noticeable delays. And keeping blocklists and settings synchronized between two servers manually is tedious and error-prone.
I needed a proper failover cluster: one virtual IP address that would automatically switch between two Pi-hole instances if the primary went down, with synchronized configuration between them.
My Setup
I run both Pi-hole instances as LXC containers on Proxmox. Each runs Ubuntu 22.04 with Pi-hole 5.x installed. The containers sit on the same VLAN as my client devices.
I assigned them these addresses:
- Primary Pi-hole: 192.168.1.21
- Secondary Pi-hole: 192.168.1.22
- Virtual IP (VIP): 192.168.1.20
Clients are configured to use only 192.168.1.20 as their DNS server. They never touch the real IPs directly.
Synchronizing Configuration with Gravity-Sync
Pi-hole 5 stores blocklists and configuration in a SQLite database at /etc/pihole/gravity.db. I needed a way to keep this database synchronized between both instances without manual intervention.
I used gravity-sync, a script specifically designed for this purpose. It handles syncing blocklists, whitelist entries, and local DNS records. It does not sync admin passwords, upstream DNS servers, or DHCP settings—those I configured manually to match on both instances.
Installing Gravity-Sync
On both containers, I first installed dependencies:
apt update && apt install sqlite3 sudo git rsync ssh -y
I created a service account with sudo privileges on both instances:
sudo useradd -G sudo -m pihole-sync
sudo passwd pihole-sync
On the primary Pi-hole, I installed gravity-sync as the primary:
export GS_INSTALL=primary && curl -sSL https://gravity.vmstan.com | bash
On the secondary, I installed it as the secondary:
export GS_INSTALL=secondary && curl -sSL https://gravity.vmstan.com | bash
The secondary installer prompted me for the primary's IP address and the service account username. It also configured passwordless SSH authentication between the two instances.
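Before moving on, it's worth confirming that the key-based login actually works. Something along these lines from the secondary, run as the service account and using the primary's IP from above, should connect without a password prompt:

sudo -u pihole-sync ssh pihole-sync@192.168.1.21 echo ok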
Testing and Automating Sync
From the secondary Pi-hole, I tested connectivity:
cd ~/gravity-sync
./gravity-sync.sh compare
This verified that the secondary could reach the primary and checked if any sync was needed. Since my primary already had configuration and the secondary was fresh, I forced an initial one-way sync:
./gravity-sync.sh pull
Finally, I enabled automatic synchronization:
./gravity-sync.sh automate
I set it to sync every hour. This is frequent enough that changes propagate quickly but not so aggressive that it creates unnecessary load.
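Depending on the gravity-sync version, the automate step schedules the job either as a cron entry for the service account or as a systemd timer. A quick sanity check (not something gravity-sync requires) shows which one is in place:

sudo crontab -u pihole-sync -l | grep -i gravity
systemctl list-timers | grep -i gravity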
Configuring IP Failover with Keepalived
With configuration synchronized, I needed automatic failover. If the primary Pi-hole becomes unavailable, the secondary should immediately take over the virtual IP address so clients experience no interruption.
I used keepalived for this. It implements VRRP (Virtual Router Redundancy Protocol) to manage the VIP between the two instances.
Installing Keepalived
On both containers:
sudo apt install keepalived libipset13 -y
I created a health check script to monitor the pihole-FTL service. If FTL stops running, keepalived needs to know so it can fail over:
sudo mkdir -p /etc/scripts
sudo nano /etc/scripts/chk_ftl
The script checks if pihole-FTL is running:
#!/bin/bash
# Report healthy (exit 0) only while the pihole-FTL service is active;
# keepalived treats any non-zero exit as a failed check.
if systemctl is-active --quiet pihole-FTL; then
    exit 0
else
    exit 1
fi
Then I made the script executable:
sudo chmod +x /etc/scripts/chk_ftl
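A quick manual test confirms the script behaves the way keepalived expects. Note that this briefly stops FTL, so DNS will blip for a moment:

sudo /etc/scripts/chk_ftl; echo $?    # prints 0 while FTL is running
sudo systemctl stop pihole-FTL
sudo /etc/scripts/chk_ftl; echo $?    # prints 1 once FTL is stopped
sudo systemctl start pihole-FTL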
Configuring Keepalived
I created /etc/keepalived/keepalived.conf on the primary:
vrrp_script chk_ftl {
    script "/etc/scripts/chk_ftl"
    interval 2
    weight -10
}
vrrp_instance PIHOLE {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass MySecret1
    }
    unicast_src_ip 192.168.1.21
    unicast_peer {
        192.168.1.22
    }
    virtual_ipaddress {
        192.168.1.20/24
    }
    track_script {
        chk_ftl
    }
}
On the secondary, the configuration is nearly identical except:
state BACKUP
priority 145
unicast_src_ip 192.168.1.22
unicast_peer {
    192.168.1.21
}
The primary has higher priority (150 vs 145), so it always claims the VIP when both instances are healthy. The health check's weight of -10 matters here: when chk_ftl fails, the primary's effective priority drops to 140, below the secondary's 145, which forces a failover. A small positive weight would never overcome the 5-point gap, so FTL could die without the VIP ever moving. The auth_pass must match on both instances.
I started and enabled keepalived on both containers:
sudo systemctl enable --now keepalived
sudo systemctl status keepalived
On the primary, I confirmed keepalived entered MASTER state; on the secondary, it showed BACKUP.
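Another way to see which node holds the VIP at any given moment is to check whether 192.168.1.20 is bound to the interface; it should appear only on the current MASTER:

ip addr show eth0 | grep 192.168.1.20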
What Worked
Failover happens almost instantly. When I shut down the primary container, the secondary takes over the VIP within about one second. DNS queries continue without interruption—I tested this by running continuous nslookup queries while stopping the primary.
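If you want to reproduce the test, a loop along these lines does the job. I've used dig here instead of nslookup because it scripts more cleanly, and example.com is just a placeholder domain; it queries the VIP once a second and flags any failure:

while true; do
    dig +short +time=1 +tries=1 example.com @192.168.1.20 > /dev/null || echo "DNS FAIL at $(date)"
    sleep 1
done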
Gravity-sync keeps blocklists and settings synchronized reliably. When I add a domain to the whitelist on the primary, it appears on the secondary within an hour (or immediately if I manually trigger a sync).
The VIP approach means I only configure one DNS server address on clients. No more dealing with primary/secondary DNS confusion or timeout delays.
What Didn't Work
My first attempt used multicast VRRP instead of unicast. This failed because my network switch doesn't properly forward multicast traffic between VLANs. Switching to unicast configuration (explicitly defining peer IP addresses) fixed it.
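If you're unsure whether VRRP advertisements are even reaching the peer, watching for protocol 112 traffic on each node makes the problem obvious. With multicast, a silent capture on the backup means the advertisements are being dropped somewhere in between:

sudo tcpdump -ni eth0 ip proto 112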
I initially set gravity-sync to run every 15 minutes. This caused noticeable CPU spikes on both containers during sync operations. Hourly sync is a better balance for my usage.
Admin passwords don't sync via gravity-sync. I had to manually set them to match on both instances. Same for upstream DNS servers—I configured both to use the same Cloudflare DNS addresses.
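For reference, on Pi-hole 5 the admin password can be set to the same value on each instance from the shell, and the upstream servers can be compared in setupVars.conf:

pihole -a -p
grep PIHOLE_DNS /etc/pihole/setupVars.conf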
Statistics and query logs are not synchronized. Each Pi-hole maintains its own query history. This doesn't affect functionality but means I can't get a unified view of DNS activity across both instances.
Key Takeaways
A Pi-hole failover cluster eliminates single points of failure for home DNS. The combination of gravity-sync and keepalived provides both configuration synchronization and automatic failover without manual intervention.
Use unicast VRRP if your network doesn't reliably handle multicast. Most home networks don't.
Don't over-sync. Hourly synchronization is sufficient for blocklist updates unless you're constantly tweaking configuration.
Test failover before you need it. Shut down the primary and verify that DNS continues working. Make sure the secondary actually takes over the VIP.
This setup has been running for several months now. I've performed maintenance on both Pi-hole instances without anyone in the house noticing. That's exactly what I wanted.