Why I Built This
I run services across multiple locations—home lab, a VPS in Europe, and another in Asia. Each site has its own subnet and local services. For a long time, I used static WireGuard tunnels with hardcoded routes. It worked, but every time I added a new network or changed something, I had to manually update routes on every node.
Then my primary VPS went down for maintenance. Traffic didn't reroute. Services became unreachable. I realized I needed automatic failover and dynamic route learning. That's when I started experimenting with BGP over WireGuard using the BIRD daemon.
My Setup
I have three nodes:
- Node A (Home): Proxmox host running Ubuntu VM, subnet 10.10.0.0/24
- Node B (EU VPS): Hetzner Cloud instance, subnet 10.20.0.0/24
- Node C (Asia VPS): DigitalOcean droplet, subnet 10.30.0.0/24
Each node runs WireGuard and BIRD2. I configured a full mesh—every node connects to every other node. BGP runs over each WireGuard tunnel to exchange routes dynamically.
WireGuard Configuration
I created separate WireGuard interfaces for each peer connection. On Node A, I have wg-eu for the EU link and wg-asia for the Asia link. Each interface gets a /30 point-to-point subnet for the tunnel itself.
Example config on Node A for the EU tunnel (/etc/wireguard/wg-eu.conf):
[Interface]
PrivateKey = <node-a-private-key>
Address = 172.16.1.1/30
ListenPort = 51821

[Peer]
PublicKey = <node-b-public-key>
Endpoint = eu-vps-ip:51821
AllowedIPs = 0.0.0.0/0
PersistentKeepalive = 25
I set AllowedIPs = 0.0.0.0/0 because I want WireGuard to accept any traffic that BIRD decides to route through this tunnel. Without this, WireGuard would drop packets for networks not explicitly listed.
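A related caveat if you bring the interfaces up with wg-quick: by default it installs a kernel route for every AllowedIPs prefix, so 0.0.0.0/0 would add a default route through the tunnel. Since BIRD is supposed to manage the routing table here, you can tell wg-quick to leave routes alone:

[Interface]
...
Table = off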
I used similar configs for the other tunnels, each with its own /30 subnet (172.16.1.0/30, 172.16.2.0/30, etc.).
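As a sketch, the Asia tunnel on Node A looks along these lines. The keys and endpoint are placeholders, the listen port (51822) is just an assumption for illustration, and the tunnel address matches the BGP neighbor address (172.16.2.2) used later:

[Interface]
PrivateKey = <node-a-private-key>
Address = 172.16.2.1/30
ListenPort = 51822

[Peer]
PublicKey = <node-c-public-key>
Endpoint = asia-vps-ip:51822
AllowedIPs = 0.0.0.0/0
PersistentKeepalive = 25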
BIRD2 Configuration
BIRD handles the BGP sessions and routing decisions. I installed BIRD2 on each node and configured it to peer over the WireGuard interfaces.
On Node A (/etc/bird/bird.conf):
router id 10.10.0.1;

# Track interface state so BIRD notices when a tunnel disappears
protocol device {
    scan time 10;
}

# Pick up directly connected networks (LAN and the tunnel /30s)
protocol direct {
    ipv4;
    interface "eth0", "wg-*";
}

# Export routes from BIRD into the kernel routing table
protocol kernel {
    ipv4 {
        export all;
    };
}

# The local subnet I want to advertise to the other nodes
protocol static {
    ipv4;
    route 10.10.0.0/24 via "eth0";
}

# eBGP session to Node B over the wg-eu tunnel
protocol bgp eu_peer {
    local as 65000;
    neighbor 172.16.1.2 as 65001;
    ipv4 {
        import all;
        export all;
    };
}

# eBGP session to Node C over the wg-asia tunnel
protocol bgp asia_peer {
    local as 65000;
    neighbor 172.16.2.2 as 65002;
    ipv4 {
        import all;
        export all;
    };
}
Each node has its own AS number. I used private ASNs (65000, 65001, 65002). The protocol bgp blocks define the BGP sessions with each peer. BIRD learns routes from neighbors and installs them into the kernel routing table.
I exported my local subnet (10.10.0.0/24) via the protocol static block. BIRD then advertises this to all BGP peers.
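After editing bird.conf, the configuration can be reloaded in place rather than restarting the daemon. This assumes the standard bird2 package on Ubuntu, where the service is simply called bird:

sudo birdc configure
sudo systemctl status bird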
What Worked
After bringing up the tunnels and starting BIRD, routes appeared automatically. I could see routes to 10.20.0.0/24 and 10.30.0.0/24 in my routing table on Node A. I didn't have to add them manually.
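Routes installed by BIRD carry their own protocol tag in the kernel, so on a default install you can list just what BIRD added:

ip route show proto bird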
I tested failover by shutting down the WireGuard tunnel to Node B. BIRD detected the peer as down within seconds. Traffic to 10.20.0.0/24 stopped routing through the direct tunnel. If I had configured a backup path (which I later did), traffic would reroute automatically.
To test multi-path failover, I added a fourth node (Node D) as a backup hub. I configured it to peer with all other nodes. When Node B went down, Node A could still reach 10.20.0.0/24 via Node D. BIRD recalculated the best path and updated the kernel routing table without any manual intervention.
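The failover test is easy to repeat by hand. Something like this on Node A, using the interface and protocol names from above (10.20.0.1 stands in for any host in Node B's subnet):

# take the EU tunnel down and watch BIRD converge
sudo wg-quick down wg-eu
sudo birdc show protocols all eu_peer
sudo birdc show route for 10.20.0.1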
Route Preferences and Metrics
I used BGP local preference to control which path BIRD prefers. For example, I set a higher local preference on the direct EU tunnel compared to the backup path through Node D.
In BIRD config:
protocol bgp eu_peer {
    local as 65000;
    neighbor 172.16.1.2 as 65001;
    ipv4 {
        import filter {
            bgp_local_pref = 200;
            accept;
        };
        export all;
    };
}
This made BIRD prefer the direct tunnel as long as it was up. When it failed, the backup path (with default local pref of 100) took over.
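You can confirm which path wins, and see the local preference BIRD attached, with the all modifier on a route lookup:

birdc show route 10.20.0.0/24 all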
What Didn't Work
Initial MTU issues caused packet loss. WireGuard adds overhead, and I didn't account for that. I had to manually set the MTU on each WireGuard interface to 1420 to avoid fragmentation.
In /etc/wireguard/wg-eu.conf:
[Interface]
...
MTU = 1420
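A quick way to check that the chosen MTU actually fits the path is a don't-fragment ping across the tunnel: 1392 bytes of ICMP payload plus 28 bytes of IP/ICMP headers adds up to the 1420-byte MTU.

ping -M do -s 1392 -c 3 172.16.1.2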
BGP sessions flapped when I first started. I had forgotten to allow port 179 (BGP) through the firewall on each node. Once I added the rules, sessions stabilized.
On Ubuntu:
ufw allow from 172.16.1.0/30 to any port 179
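BGP runs over TCP, so the rule can be scoped a bit tighter, and each tunnel subnet needs its own entry, for example:

ufw allow from 172.16.1.0/30 to any port 179 proto tcp
ufw allow from 172.16.2.0/30 to any port 179 proto tcp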
I also had issues with asymmetric routing. Traffic from Node A to Node C sometimes took a different path than the return traffic, which caused connection issues with stateful protocols. I fixed it by keeping route preferences consistent across all nodes and relaxing rp_filter where needed.
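For reference, the rp_filter change amounts to something like this (2 means loose mode, which accepts a packet as long as any route back to the source exists; interface names follow the scheme above):

sysctl -w net.ipv4.conf.wg-eu.rp_filter=2
sysctl -w net.ipv4.conf.wg-asia.rp_filter=2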
Another mistake: I initially set AllowedIPs to only the remote subnet (e.g., 10.20.0.0/24). This broke dynamic routing because WireGuard wouldn't accept packets for other subnets learned via BGP. Changing it to 0.0.0.0/0 fixed it, but I had to be careful with firewall rules to avoid unintended traffic.
Monitoring and Debugging
I use birdc to check BGP session status and routes:
birdc show protocols
birdc show route
This shows which peers are up and what routes BIRD has learned. If a session is down, I check the WireGuard interface first with wg show, then look at BIRD logs in /var/log/syslog.
I also set up a simple monitoring script that pings each remote subnet every minute. If a subnet becomes unreachable, it logs an alert. This helped me catch issues before users noticed.
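The script is nothing fancy; a minimal sketch looks like this (the target addresses and log path are placeholders, pick a live host in each remote subnet):

#!/usr/bin/env bash
# Ping one host in each remote subnet; log an alert if any is unreachable.
# Intended to run from cron every minute.
TARGETS="10.20.0.1 10.30.0.1"
for ip in $TARGETS; do
    if ! ping -c 1 -W 2 "$ip" > /dev/null 2>&1; then
        echo "$(date -Is) ALERT: $ip unreachable" >> /var/log/mesh-monitor.log
    fi
done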
Key Takeaways
Dynamic routing with BGP removes the pain of managing static routes across multiple sites. When done right, it handles failover automatically and adapts to network changes without manual work.
WireGuard works well as the transport layer. It's fast, stable, and easy to configure. BIRD is powerful but requires careful tuning. The documentation is dense, and mistakes in the config can cause silent failures.
MTU matters. Always account for WireGuard overhead and test with different packet sizes.
Firewall rules are critical. BGP won't work if port 179 is blocked. WireGuard won't route learned networks if AllowedIPs is too restrictive.
This setup isn't plug-and-play. It took me several iterations to get right. But once it was stable, it saved me hours of manual route management and gave me real failover capability.