HAProxy DNS Load Balancing: High Availability for DNS Infrastructure

DNS is critical infrastructure - when DNS fails, nothing works. This guide shows how to use HAProxy to load balance DNS queries across multiple resolvers or authoritative servers, providing high availability and improved performance for your DNS infrastructure.

Why Load Balance DNS?

DNS load balancing offers several benefits:

  • High availability: If one DNS server fails, queries are automatically routed to healthy servers
  • Performance: Distribute load across multiple servers to handle higher query volumes
  • Geographic distribution: Route queries to the nearest server
  • Maintenance flexibility: Take servers offline for maintenance without service interruption
  • Protocol flexibility: Support both UDP and TCP DNS queries

DNS Protocol Considerations

DNS primarily uses UDP on port 53, but can fall back to TCP for:

  • Responses larger than 512 bytes (or EDNS0 buffer size)
  • Zone transfers (AXFR/IXFR)
  • DNS over TCP enforcement

HAProxy can handle both protocols, but requires different configurations.
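
The fallback decision is driven by a single header bit: a client that receives a UDP reply with the TC (truncated) flag set retries the same query over TCP. The following Python sketch shows the packet layout involved; build_query and is_truncated are illustrative helper names, not part of any DNS library or of HAProxy.

```python
import struct

def build_query(name, qtype=1, txid=0x1234):
    """Build a minimal DNS query packet (QCLASS IN, recursion desired)."""
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Names are encoded as length-prefixed labels ending with a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode() for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)

def is_truncated(response):
    """True if the TC bit (0x0200) is set in a DNS response header."""
    flags = struct.unpack(">H", response[2:4])[0]
    return bool(flags & 0x0200)
```

A real stub resolver that sees is_truncated() return True would reopen the query over TCP, which is why a DNS load balancer has to carry both protocols.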

Basic UDP DNS Load Balancing

Native UDP load balancing is not available in all HAProxy builds (historically it was an HAProxy Enterprise feature, and community releases were TCP/HTTP only), so verify that your version accepts mode udp before deploying the configurations below. Here's a basic configuration:

# /etc/haproxy/haproxy.cfg

global
    log /dev/log local0
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    user haproxy
    group haproxy
    daemon
    maxconn 50000

defaults
    log global
    timeout connect 5s
    timeout client  30s
    timeout server  30s

# UDP DNS Frontend
frontend dns_udp
    bind *:53 udp
    mode udp
    timeout client 5s
    default_backend dns_servers_udp

backend dns_servers_udp
    mode udp
    balance roundrobin
    timeout server 5s
    timeout connect 1s
    server dns1 192.168.1.10:53 check
    server dns2 192.168.1.11:53 check
    server dns3 192.168.1.12:53 check

TCP DNS Load Balancing

For TCP DNS traffic (large responses, zone transfers):

frontend dns_tcp
    bind *:53
    mode tcp
    option tcplog
    timeout client 30s
    default_backend dns_servers_tcp

backend dns_servers_tcp
    mode tcp
    balance roundrobin
    timeout server 30s
    timeout connect 5s
    option tcp-check
    server dns1 192.168.1.10:53 check
    server dns2 192.168.1.11:53 check
    server dns3 192.168.1.12:53 check

Combined UDP and TCP Configuration

A production DNS load balancer should handle both protocols:

# /etc/haproxy/haproxy.cfg

global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    
    maxconn 100000
    nbthread 4

defaults dns
    log global
    timeout connect 1s
    timeout client  5s
    timeout server  5s
    default-server inter 3s fall 3 rise 2

# UDP DNS
frontend dns_udp_frontend
    bind *:53 udp
    mode udp
    default_backend dns_udp_backend

backend dns_udp_backend
    mode udp
    balance roundrobin
    server dns1 192.168.1.10:53 check
    server dns2 192.168.1.11:53 check
    server dns3 192.168.1.12:53 check

# TCP DNS
frontend dns_tcp_frontend
    bind *:53
    mode tcp
    option tcplog
    default_backend dns_tcp_backend

backend dns_tcp_backend
    mode tcp
    balance roundrobin
    option tcp-check
    server dns1 192.168.1.10:53 check
    server dns2 192.168.1.11:53 check
    server dns3 192.168.1.12:53 check

# Stats interface
listen stats
    bind *:8404
    mode http
    stats enable
    stats uri /stats
    stats refresh 5s
    http-request use-service prometheus-exporter if { path /metrics }

DNS Health Checks

Basic TCP Health Check

The simplest check verifies the TCP port is responding:

backend dns_tcp_backend
    mode tcp
    option tcp-check
    server dns1 192.168.1.10:53 check inter 2s fall 3 rise 2

DNS-Specific Health Check

For more thorough checking, verify the server can resolve queries:

backend dns_tcp_backend
    mode tcp
    option tcp-check
    
    # DNS query for "health.local" A record, sent as a raw packet.
    # DNS over TCP prefixes each message with a 2-byte length field.
    tcp-check connect port 53
    tcp-check send-binary 001e  # Message length: 30 bytes
    tcp-check send-binary 0000  # Transaction ID
    tcp-check send-binary 0100  # Flags: standard query, recursion desired
    tcp-check send-binary 0001  # Questions: 1
    tcp-check send-binary 0000  # Answer RRs: 0
    tcp-check send-binary 0000  # Authority RRs: 0
    tcp-check send-binary 0000  # Additional RRs: 0
    tcp-check send-binary 066865616c7468  # "health" (length 6 + ASCII)
    tcp-check send-binary 056c6f63616c    # "local" (length 5 + ASCII)
    tcp-check send-binary 00              # End of name
    tcp-check send-binary 0001  # Type: A
    tcp-check send-binary 0001  # Class: IN
    tcp-check expect binary 8180  # Response flags (standard response, no error)
    
    server dns1 192.168.1.10:53 check
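
Hand-writing these hex strings is error-prone, so it helps to generate and sanity-check them. This Python sketch produces the length-prefixed query payload; qname_hex and tcp_query_hex are illustrative helper names.

```python
import struct

def qname_hex(name):
    """Encode a DNS name as length-prefixed labels, hex-encoded."""
    labels = b"".join(bytes([len(l)]) + l.encode() for l in name.split("."))
    return (labels + b"\x00").hex()

def tcp_query_hex(name, qtype=1, txid=0):
    """Hex of a full A/IN query for tcp-check send-binary, including
    the 2-byte length prefix that DNS over TCP requires."""
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    question = bytes.fromhex(qname_hex(name)) + struct.pack(">HH", qtype, 1)
    body = header + question
    return struct.pack(">H", len(body)).hex() + body.hex()
```

Running tcp_query_hex("health.local") reproduces the bytes used in the configuration above, which makes it easy to adapt the check to a different probe record.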

External Health Check Script

For complex health checks, use an external script:

backend dns_tcp_backend
    mode tcp
    option external-check
    external-check path "/usr/bin:/bin"
    external-check command /etc/haproxy/check-dns.sh
    server dns1 192.168.1.10:53 check inter 5s

Create the health check script:

#!/bin/bash
# /etc/haproxy/check-dns.sh

# Arguments from HAProxy:
# $1 = server address
# $2 = server port
# $3 = server name

SERVER=$1
PORT=$2

# Try to resolve a known domain
if dig @${SERVER} -p ${PORT} +time=2 +tries=1 google.com A > /dev/null 2>&1; then
    exit 0  # Success
else
    exit 1  # Failure
fi

Make it executable:

chmod +x /etc/haproxy/check-dns.sh

Load Balancing Algorithms for DNS

Round Robin

Default algorithm, distributes queries evenly:

backend dns_servers
    balance roundrobin
    server dns1 192.168.1.10:53 check
    server dns2 192.168.1.11:53 check

Least Connections (TCP only)

Best for DNS servers with varying query complexity:

backend dns_tcp_backend
    mode tcp
    balance leastconn
    server dns1 192.168.1.10:53 check
    server dns2 192.168.1.11:53 check

Source IP Hash

Ensures clients consistently reach the same server (useful for caching):

backend dns_servers
    balance source
    hash-type consistent
    server dns1 192.168.1.10:53 check
    server dns2 192.168.1.11:53 check
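
The benefit of hash-type consistent can be sketched in a few lines of Python: when a server leaves the ring, only the clients that were mapped to it move, and everyone else keeps their server (and its warm cache). This is an illustrative model, not HAProxy's actual implementation; ring and pick are hypothetical helpers.

```python
import hashlib

def ring(servers, vnodes=100):
    """Build a consistent-hash ring as a sorted list of (point, server)."""
    points = []
    for s in servers:
        for i in range(vnodes):  # virtual nodes smooth the distribution
            h = int(hashlib.md5(f"{s}-{i}".encode()).hexdigest(), 16)
            points.append((h, s))
    return sorted(points)

def pick(ring_points, client_ip):
    """Map a client IP to the first ring point at or after its hash."""
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    for point, server in ring_points:
        if h <= point:
            return server
    return ring_points[0][1]  # wrap around to the start of the ring
```

Removing one server from the ring leaves every client of the surviving servers untouched, which is exactly why consistent hashing suits caching resolvers better than a plain modulo hash.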

First Available

Send all traffic to the first available server (active-passive):

backend dns_servers
    balance first
    server dns-primary   192.168.1.10:53 check
    server dns-secondary 192.168.1.11:53 check backup

Recursive vs Authoritative DNS

Load Balancing Recursive Resolvers

For caching resolvers serving internal clients:

# UDP and TCP cannot share a frontend: each needs its own mode
frontend dns_resolver_udp
    bind 10.0.0.1:53 udp
    mode udp
    default_backend resolvers_udp

frontend dns_resolver_tcp
    bind 10.0.0.1:53
    mode tcp
    default_backend resolvers_tcp

# Define one backend per mode (TCP shown; mirror it with
# "mode udp" for resolvers_udp)
backend resolvers_tcp
    mode tcp
    balance roundrobin
    # Internal resolvers
    server resolver1 192.168.1.10:53 check weight 100
    server resolver2 192.168.1.11:53 check weight 100
    # Public fallback
    server cloudflare 1.1.1.1:53 check weight 50 backup
    server google     8.8.8.8:53 check weight 50 backup

Load Balancing Authoritative Servers

For authoritative servers serving external queries:

frontend dns_authoritative_udp
    bind 203.0.113.53:53 udp
    mode udp
    default_backend auth_servers_udp

frontend dns_authoritative_tcp
    bind 203.0.113.53:53
    mode tcp
    default_backend auth_servers_tcp

# TCP backend shown; mirror it with "mode udp" for auth_servers_udp
backend auth_servers_tcp
    mode tcp
    balance roundrobin
    option tcp-check
    server auth1 192.168.1.20:53 check
    server auth2 192.168.1.21:53 check
    server auth3 192.168.1.22:53 check

DNS over TLS (DoT) Load Balancing

For encrypted DNS traffic on port 853:

# TLS Termination at HAProxy
frontend dot_frontend
    bind *:853 ssl crt /etc/haproxy/certs/dns.example.com.pem
    mode tcp
    option tcplog
    default_backend dns_tcp_backend

backend dns_tcp_backend
    mode tcp
    balance roundrobin
    server dns1 192.168.1.10:53 check
    server dns2 192.168.1.11:53 check

# TLS Passthrough
frontend dot_passthrough
    bind *:853
    mode tcp
    option tcplog
    default_backend dot_servers

backend dot_servers
    mode tcp
    balance roundrobin
    server dot1 192.168.1.10:853 check ssl verify none
    server dot2 192.168.1.11:853 check ssl verify none

Rate Limiting DNS Queries

Protect against DNS amplification attacks:

frontend dns_udp
    bind *:53 udp
    mode udp
    
    # Track request rates per source IP
    stick-table type ip size 100k expire 30s store conn_rate(10s)
    
    # Track first, then reject excessive requesters
    tcp-request connection track-sc0 src
    tcp-request connection reject if { sc0_conn_rate gt 100 }
    
    default_backend dns_udp_backend

For TCP with more options:

frontend dns_tcp
    bind *:53
    mode tcp
    option tcplog
    
    # Rate limiting table
    stick-table type ip size 100k expire 60s store conn_rate(10s),bytes_out_rate(10s)
    
    # Limit connections per IP
    tcp-request connection track-sc0 src
    tcp-request connection reject if { sc0_conn_rate gt 50 }
    
    # Limit bandwidth (prevent amplification)
    tcp-request content reject if { sc0_bytes_out_rate gt 100000 }
    
    default_backend dns_tcp_backend

High Availability HAProxy Setup

For the load balancer itself, use keepalived for failover:

apt install keepalived

Configure keepalived on the primary:

# /etc/keepalived/keepalived.conf (primary)

global_defs {
    router_id HAPROXY_DNS_1
}

vrrp_script check_haproxy {
    script "/usr/bin/killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_DNS {
    state MASTER
    interface eth0
    virtual_router_id 53
    priority 101
    advert_int 1
    
    authentication {
        auth_type PASS
        auth_pass secret123
    }
    
    virtual_ipaddress {
        10.0.0.53/24
    }
    
    track_script {
        check_haproxy
    }
}

Configure the backup:

# /etc/keepalived/keepalived.conf (backup)

global_defs {
    router_id HAPROXY_DNS_2
}

vrrp_script check_haproxy {
    script "/usr/bin/killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance VI_DNS {
    state BACKUP
    interface eth0
    virtual_router_id 53
    priority 100
    advert_int 1
    
    authentication {
        auth_type PASS
        auth_pass secret123
    }
    
    virtual_ipaddress {
        10.0.0.53/24
    }
    
    track_script {
        check_haproxy
    }
}

Complete Production Configuration

Here's a complete configuration for a production DNS load balancer:

# /etc/haproxy/haproxy.cfg

global
    log /dev/log local0 info
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    
    # Performance
    maxconn 100000
    nbthread 4
    cpu-map auto:1/1-4 0-3
    
    # Security
    tune.ssl.default-dh-param 2048

defaults dns_defaults
    log global
    timeout connect 1s
    timeout client  5s
    timeout server  5s
    default-server inter 3s fall 3 rise 2
    retries 2

# Stats and monitoring
listen stats
    bind *:8404
    mode http
    stats enable
    stats uri /stats
    stats refresh 5s
    stats auth admin:changeme
    http-request use-service prometheus-exporter if { path /metrics }

# ====================
# Internal Resolvers
# ====================

# UDP DNS for internal clients
frontend internal_dns_udp
    bind 10.0.0.53:53 udp
    mode udp
    default_backend internal_resolvers_udp
    
    # Basic rate limiting
    stick-table type ip size 50k expire 30s store conn_rate(10s)
    tcp-request connection track-sc0 src
    tcp-request connection reject if { sc0_conn_rate gt 100 }

backend internal_resolvers_udp
    mode udp
    balance roundrobin
    server resolver1 192.168.1.10:53 check
    server resolver2 192.168.1.11:53 check
    server resolver3 192.168.1.12:53 check
    # Fallback to public DNS
    server cloudflare 1.1.1.1:53 check backup
    server quad9      9.9.9.9:53 check backup

# TCP DNS for internal clients
frontend internal_dns_tcp
    bind 10.0.0.53:53
    mode tcp
    option tcplog
    default_backend internal_resolvers_tcp
    
    # Connection limiting
    stick-table type ip size 50k expire 60s store conn_rate(10s)
    tcp-request connection track-sc0 src
    tcp-request connection reject if { sc0_conn_rate gt 30 }

backend internal_resolvers_tcp
    mode tcp
    balance leastconn
    option tcp-check
    server resolver1 192.168.1.10:53 check
    server resolver2 192.168.1.11:53 check
    server resolver3 192.168.1.12:53 check
    server cloudflare 1.1.1.1:53 check backup
    server quad9      9.9.9.9:53 check backup

# ====================
# Authoritative Servers
# ====================

# UDP DNS for external queries
frontend authoritative_dns_udp
    bind 203.0.113.53:53 udp
    mode udp
    default_backend auth_servers_udp
    
    # Strict rate limiting for public interface
    stick-table type ip size 100k expire 30s store conn_rate(10s)
    tcp-request connection track-sc0 src
    tcp-request connection reject if { sc0_conn_rate gt 50 }

backend auth_servers_udp
    mode udp
    balance roundrobin
    server auth1 192.168.1.20:53 check
    server auth2 192.168.1.21:53 check
    server auth3 192.168.1.22:53 check

# TCP DNS for zone transfers and large responses
frontend authoritative_dns_tcp
    bind 203.0.113.53:53
    mode tcp
    option tcplog
    default_backend auth_servers_tcp
    
    # Connection limiting
    stick-table type ip size 100k expire 60s store conn_rate(10s),bytes_out_rate(10s)
    tcp-request connection track-sc0 src
    tcp-request connection reject if { sc0_conn_rate gt 20 }

backend auth_servers_tcp
    mode tcp
    balance roundrobin
    option tcp-check
    timeout server 30s  # Longer timeout for zone transfers
    server auth1 192.168.1.20:53 check
    server auth2 192.168.1.21:53 check
    server auth3 192.168.1.22:53 check

# ====================
# DNS over TLS (DoT)
# ====================

frontend dot_frontend
    bind *:853 ssl crt /etc/haproxy/certs/dns.example.com.pem alpn dot
    mode tcp
    option tcplog
    timeout client 30s
    default_backend internal_resolvers_tcp
    
    # TLS client validation (optional)
    # bind *:853 ssl crt /etc/haproxy/certs/dns.example.com.pem ca-file /etc/haproxy/certs/ca.pem verify optional

Monitoring and Logging

DNS Query Logging

frontend dns_tcp
    bind *:53
    mode tcp
    option tcplog
    log-format "%ci:%cp [%t] %ft %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/%sc/%rc"
    default_backend dns_servers

Prometheus Metrics

Enable the built-in Prometheus exporter:

listen stats
    bind *:8404
    mode http
    http-request use-service prometheus-exporter if { path /metrics }

Key metrics to monitor:

  • haproxy_server_status: Backend server health
  • haproxy_frontend_current_sessions: Active connections
  • haproxy_backend_response_time_average_seconds: Query latency
  • haproxy_frontend_bytes_in_total: Incoming traffic
  • haproxy_frontend_bytes_out_total: Outgoing traffic (watch for amplification)
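
A quick way to pull one of these metrics out of the exporter's text output is a small parser sketch; parse_metrics is a hypothetical helper, and in production you would normally let Prometheus scrape the endpoint instead.

```python
def parse_metrics(text, name):
    """Extract {label-string: value} for one metric from Prometheus
    text exposition format. Simplified: prefix match only, so pass the
    exact metric name you want."""
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(name) or line.startswith("#"):
            continue  # skip HELP/TYPE comments and other metrics
        rest = line[len(name):]
        if rest.startswith("{"):
            labels, _, value = rest[1:].partition("} ")
            out[labels] = float(value)
        elif rest.startswith(" "):
            out[""] = float(rest)
    return out
```

Fed the /metrics output, parse_metrics(body, "haproxy_server_status") gives a per-server health map you can alert on from a cron job or smoke test.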

Testing Your Configuration

Validate Configuration

haproxy -c -f /etc/haproxy/haproxy.cfg

Test DNS Resolution

# UDP query
dig @10.0.0.53 google.com A

# TCP query
dig @10.0.0.53 +tcp google.com A

# Test all servers
for server in 192.168.1.10 192.168.1.11 192.168.1.12; do
    echo "Testing $server:"
    dig @$server +short google.com A
done

Load Testing

# Install dnsperf
apt install dnsperf

# Create query file
cat > queries.txt << EOF
google.com A
example.com A
github.com A
EOF

# Run load test
dnsperf -s 10.0.0.53 -d queries.txt -c 100 -Q 10000

Troubleshooting

Common Issues

  1. UDP not working: Verify that your HAProxy build actually supports UDP load balancing (not all builds do) and that mode udp is specified
  2. Health checks failing: Verify DNS servers are responding, check firewall rules
  3. High latency: Check timeout settings and server response times
  4. Connection refused: Verify HAProxy is binding correctly (netstat -tulpn | grep 53)
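
For issue 4, a programmatic listener check can complement netstat when scripting smoke tests; tcp_port_open is a hypothetical helper sketch, not an HAProxy tool.

```python
import socket

def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, i.e. a
    quick 'is something actually listening?' probe."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False
```

Run it against the VIP (e.g. tcp_port_open("10.0.0.53", 53)) after a deploy to confirm the TCP frontend came up before clients notice.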

Debug Commands

# Check HAProxy status
echo "show info" | socat stdio /run/haproxy/admin.sock

# Check backend servers
echo "show servers state" | socat stdio /run/haproxy/admin.sock

# Check statistics
echo "show stat" | socat stdio /run/haproxy/admin.sock

# Watch connections in real-time
watch -n 1 'echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d, -f1,2,5,18'

Next Steps

This post covered DNS load balancing with HAProxy. In the next post, we'll explore how to set up HAProxy with Let's Encrypt for automatic HTTPS certificate management.

Summary

HAProxy provides robust DNS load balancing for both UDP and TCP traffic. Key points:

  • Confirm your HAProxy build supports UDP before relying on mode udp
  • Configure both UDP and TCP frontends for complete DNS coverage
  • Implement proper health checks for DNS servers
  • Use rate limiting to protect against abuse
  • Monitor with Prometheus for visibility
  • Deploy keepalived for HAProxy high availability

With proper configuration, HAProxy can provide highly available, performant DNS infrastructure for both internal resolvers and authoritative servers.


By Patrick de Ruiter