HAProxy Monitoring with Prometheus: Complete Observability Guide

Monitoring HAProxy is essential for maintaining reliable load balancing infrastructure. Prometheus provides powerful metrics collection, alerting capabilities, and seamless Grafana integration for visualizing HAProxy performance and health.

Why Prometheus for HAProxy?

Prometheus offers:

  • Pull-based metrics - Prometheus scrapes HAProxy metrics endpoints
  • Time-series database - Store historical data for trend analysis
  • Powerful queries - PromQL for complex metric analysis
  • Alerting - Define rules for proactive incident detection
  • Grafana integration - Beautiful dashboards and visualizations

HAProxy Built-in Prometheus Support

HAProxy 2.0+ includes a built-in Prometheus exporter, so no separate exporter process is needed. Note that in 2.0-2.3 the exporter is an optional build component (compiled in via EXTRA_OBJS); from 2.4 onward it is included by default.

Enable Prometheus Metrics in HAProxy

Add to /etc/haproxy/haproxy.cfg:

global
    log /dev/log local0
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

# Prometheus metrics frontend
frontend prometheus
    bind *:8405
    mode http
    http-request use-service prometheus-exporter if { path /metrics }
    no log

# Stats page for human viewing
frontend stats
    bind *:8404
    mode http
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST

# Application frontends and backends
frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/haproxy/certs/
    default_backend web_servers

backend web_servers
    balance roundrobin
    option httpchk GET /health
    server web1 192.168.1.10:8080 check
    server web2 192.168.1.11:8080 check
    server web3 192.168.1.12:8080 check

Verify Metrics Endpoint

# Test Prometheus metrics endpoint
curl http://localhost:8405/metrics

# Sample output:
# HELP haproxy_process_nbthread Number of threads
# TYPE haproxy_process_nbthread gauge
haproxy_process_nbthread 4
# HELP haproxy_process_current_connections Current number of connections
# TYPE haproxy_process_current_connections gauge
haproxy_process_current_connections 42
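The output follows the Prometheus text exposition format: lines starting with # are HELP/TYPE metadata, and each remaining line is a metric name (optionally with labels) followed by its value. A minimal Python sketch of how such output can be parsed (handling only unlabeled gauges, for illustration):

```python
# Minimal sketch: parse Prometheus text exposition into a dict of
# metric name -> value (unlabeled series only, for illustration).
def parse_metrics(text: str) -> dict[str, float]:
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        name, _, value = line.partition(" ")
        if "{" in name:  # skip labeled series in this simple sketch
            continue
        metrics[name] = float(value)
    return metrics

sample = """\
# HELP haproxy_process_current_connections Current number of connections
# TYPE haproxy_process_current_connections gauge
haproxy_process_current_connections 42
"""

print(parse_metrics(sample)["haproxy_process_current_connections"])  # 42.0
```

In practice you rarely parse this yourself; Prometheus scrapes and stores it. But knowing the format helps when debugging with curl.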

Prometheus Configuration

Basic Prometheus Setup

Create /etc/prometheus/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

rule_files:
  - "haproxy_alerts.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'haproxy'
    static_configs:
      - targets:
        - 'haproxy1.example.com:8405'
        - 'haproxy2.example.com:8405'
    metrics_path: /metrics
    scrape_interval: 10s
    scrape_timeout: 5s

Service Discovery for Multiple HAProxy Instances

scrape_configs:
  - job_name: 'haproxy'
    file_sd_configs:
      - files:
          - '/etc/prometheus/haproxy_targets.json'
        refresh_interval: 30s

Create /etc/prometheus/haproxy_targets.json:

[
  {
    "targets": [
      "haproxy1.example.com:8405",
      "haproxy2.example.com:8405"
    ],
    "labels": {
      "env": "production",
      "dc": "dc1"
    }
  },
  {
    "targets": [
      "haproxy-staging.example.com:8405"
    ],
    "labels": {
      "env": "staging",
      "dc": "dc1"
    }
  }
]
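Since Prometheus re-reads file_sd targets automatically, it is common to generate this JSON from an inventory rather than edit it by hand. A sketch using Python's json module (the inventory contents here are example values) to keep the file syntactically valid:

```python
import json

# Hypothetical inventory: hostnames per environment (example values).
inventory = {
    "production": ["haproxy1.example.com", "haproxy2.example.com"],
    "staging": ["haproxy-staging.example.com"],
}

# Build file_sd target groups in the shape Prometheus expects;
# Prometheus picks up changes to the file without a restart.
target_groups = [
    {
        "targets": [f"{host}:8405" for host in hosts],
        "labels": {"env": env, "dc": "dc1"},
    }
    for env, hosts in inventory.items()
]

rendered = json.dumps(target_groups, indent=2)
print(rendered)
```

Writing the file atomically (write to a temp file, then rename) avoids Prometheus reading a half-written file.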

Consul Service Discovery

scrape_configs:
  - job_name: 'haproxy'
    consul_sd_configs:
      - server: 'consul.example.com:8500'
        services:
          - 'haproxy'
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*,prometheus,.*
        action: keep
      - source_labels: [__meta_consul_service]
        target_label: job
      - source_labels: [__meta_consul_node]
        target_label: instance

Key HAProxy Metrics

Process Metrics

# Current connections
haproxy_process_current_connections

# Connection rate
rate(haproxy_process_connections_total[5m])

# Memory usage
haproxy_process_pool_allocated_bytes
haproxy_process_pool_used_bytes

# CPU usage
haproxy_process_cpu_user_seconds_total
haproxy_process_cpu_system_seconds_total

# Uptime
haproxy_process_start_time_seconds
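The rate() function used above turns a monotonically increasing counter into a per-second rate. A simplified sketch of the underlying arithmetic (real rate() additionally handles counter resets and extrapolates to the window boundaries):

```python
# Sketch of what PromQL's rate() does for a counter: per-second
# increase between the oldest and newest sample in the window
# (ignoring counter resets and extrapolation, which real rate()
# also handles).
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """samples: (unix_timestamp, counter_value) pairs, oldest first."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# e.g. haproxy_process_connections_total scraped 60s apart
samples = [(1700000000.0, 10_000.0), (1700000060.0, 10_600.0)]
print(simple_rate(samples))  # 10.0 connections/sec
```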

Frontend Metrics

# Request rate per frontend
rate(haproxy_frontend_http_requests_total[5m])

# Current sessions
haproxy_frontend_current_sessions

# Bytes in/out
rate(haproxy_frontend_bytes_in_total[5m])
rate(haproxy_frontend_bytes_out_total[5m])

# HTTP response codes
rate(haproxy_frontend_http_responses_total{code="2xx"}[5m])
rate(haproxy_frontend_http_responses_total{code="4xx"}[5m])
rate(haproxy_frontend_http_responses_total{code="5xx"}[5m])

# Request errors
rate(haproxy_frontend_request_errors_total[5m])

# Denied requests
rate(haproxy_frontend_requests_denied_total[5m])

Backend Metrics

# Backend response time
haproxy_backend_response_time_average_seconds
haproxy_backend_total_time_average_seconds

# Queue length
haproxy_backend_current_queue
haproxy_backend_max_queue

# Active servers
haproxy_backend_active_servers

# Connection errors
rate(haproxy_backend_connection_errors_total[5m])

# Response errors
rate(haproxy_backend_response_errors_total[5m])

# Retry rate
rate(haproxy_backend_retry_warnings_total[5m])

# Sessions
haproxy_backend_current_sessions
haproxy_backend_max_sessions

Server Metrics

# Server status (1=UP, 0=DOWN)
haproxy_server_status

# Server weight
haproxy_server_weight

# Current sessions per server
haproxy_server_current_sessions

# Connection time
haproxy_server_connect_time_average_seconds

# Response time
haproxy_server_response_time_average_seconds

# Health check status
haproxy_server_check_status

# Failed health checks
rate(haproxy_server_check_failures_total[5m])

Alerting Rules

Create /etc/prometheus/haproxy_alerts.yml:

groups:
  - name: haproxy
    rules:
      # High error rate
      - alert: HAProxyHighErrorRate
        expr: |
          sum(rate(haproxy_frontend_http_responses_total{code="5xx"}[5m])) by (proxy)
          /
          sum(rate(haproxy_frontend_http_responses_total[5m])) by (proxy)
          > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High 5xx error rate on {{ $labels.proxy }}"
          description: "Error rate is {{ $value | humanizePercentage }} on frontend {{ $labels.proxy }}"

      # Backend down
      - alert: HAProxyBackendDown
        expr: haproxy_backend_active_servers == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "HAProxy backend {{ $labels.proxy }} has no active servers"
          description: "All servers in backend {{ $labels.proxy }} are down"

      # Server down
      - alert: HAProxyServerDown
        expr: haproxy_server_status == 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Server {{ $labels.server }} is down"
          description: "Server {{ $labels.server }} in backend {{ $labels.proxy }} is down"

      # High connection count
      - alert: HAProxyHighConnections
        expr: |
          haproxy_frontend_current_sessions
          /
          haproxy_frontend_limit_sessions
          > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High connection usage on {{ $labels.proxy }}"
          description: "Frontend {{ $labels.proxy }} is at {{ $value | humanizePercentage }} of max connections"

      # Queue building up
      - alert: HAProxyQueueNotEmpty
        expr: haproxy_backend_current_queue > 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Queue building up on {{ $labels.proxy }}"
          description: "Backend {{ $labels.proxy }} has {{ $value }} requests queued"

      # High response time
      - alert: HAProxyHighResponseTime
        expr: haproxy_backend_response_time_average_seconds > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time on {{ $labels.proxy }}"
          description: "Backend {{ $labels.proxy }} average response time is {{ $value }}s"

      # Health check failures
      - alert: HAProxyHealthCheckFailures
        expr: increase(haproxy_server_check_failures_total[10m]) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Health check failures on {{ $labels.server }}"
          description: "Server {{ $labels.server }} has {{ $value }} health check failures"

      # Connection errors
      - alert: HAProxyConnectionErrors
        expr: rate(haproxy_backend_connection_errors_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Connection errors to {{ $labels.proxy }}"
          description: "Backend {{ $labels.proxy }} has {{ $value }}/s connection errors"

      # HAProxy instance down
      - alert: HAProxyDown
        expr: up{job="haproxy"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "HAProxy instance {{ $labels.instance }} is down"
          description: "HAProxy at {{ $labels.instance }} is not responding"

      # SSL certificate expiry (verify your HAProxy version actually exposes
      # this metric; otherwise use blackbox_exporter's probe_ssl_earliest_cert_expiry)
      - alert: HAProxySSLCertExpiringSoon
        expr: haproxy_ssl_cert_not_after - time() < 7 * 24 * 3600
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "SSL certificate expiring soon"
          description: "Certificate expires in {{ $value | humanizeDuration }}"
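The HAProxyHighErrorRate expression above is a ratio of two rates compared against a threshold. A sketch of the same decision logic in plain Python, useful for reasoning about edge cases such as zero traffic:

```python
# Sketch of the HAProxyHighErrorRate logic: fire when 5xx responses
# exceed 5% of all responses over the window (inputs are rates in
# responses/sec).
def high_error_rate(rate_5xx: float, rate_total: float,
                    threshold: float = 0.05) -> bool:
    if rate_total == 0:
        # No traffic: PromQL division by zero yields no sample,
        # so the alert cannot fire.
        return False
    return rate_5xx / rate_total > threshold

print(high_error_rate(6.0, 100.0))  # True  (6% > 5%)
print(high_error_rate(2.0, 100.0))  # False (2%)
```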

Grafana Dashboards

HAProxy Overview Dashboard JSON

Create a comprehensive dashboard with these panels:

{
  "title": "HAProxy Overview",
  "panels": [
    {
      "title": "Request Rate",
      "type": "graph",
      "targets": [
        {
          "expr": "sum(rate(haproxy_frontend_http_requests_total[5m])) by (proxy)",
          "legendFormat": "{{ proxy }}"
        }
      ]
    },
    {
      "title": "HTTP Response Codes",
      "type": "graph",
      "targets": [
        {
          "expr": "sum(rate(haproxy_frontend_http_responses_total[5m])) by (code)",
          "legendFormat": "{{ code }}"
        }
      ]
    },
    {
      "title": "Backend Response Time",
      "type": "graph",
      "targets": [
        {
          "expr": "haproxy_backend_response_time_average_seconds",
          "legendFormat": "{{ proxy }}"
        }
      ]
    },
    {
      "title": "Active Connections",
      "type": "gauge",
      "targets": [
        {
          "expr": "sum(haproxy_frontend_current_sessions)"
        }
      ]
    },
    {
      "title": "Server Health",
      "type": "table",
      "targets": [
        {
          "expr": "haproxy_server_status",
          "format": "table",
          "instant": true
        }
      ]
    }
  ]
}

Key Dashboard Panels

Traffic Overview:

# Total requests per second
sum(rate(haproxy_frontend_http_requests_total[5m]))

# Bandwidth in/out
sum(rate(haproxy_frontend_bytes_in_total[5m])) * 8  # bits/sec
sum(rate(haproxy_frontend_bytes_out_total[5m])) * 8

Error Analysis:

# Error rate percentage
100 * sum(rate(haproxy_frontend_http_responses_total{code=~"5.."}[5m]))
/
sum(rate(haproxy_frontend_http_responses_total[5m]))

# 4xx rate
100 * sum(rate(haproxy_frontend_http_responses_total{code=~"4.."}[5m]))
/
sum(rate(haproxy_frontend_http_responses_total[5m]))

Backend Performance:

# 95th percentile response time (requires histogram buckets, which the
# native exporter does not expose on all versions; the averages below
# are the fallback)
histogram_quantile(0.95, sum(rate(haproxy_backend_http_response_time_seconds_bucket[5m])) by (le, proxy))

# Queue time
haproxy_backend_queue_time_average_seconds

Server Status:

# Server availability
sum(haproxy_server_status) by (proxy)
/
count(haproxy_server_status) by (proxy)

# Server weight distribution
haproxy_server_weight / ignoring(server) group_left sum(haproxy_server_weight) by (proxy)
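The server-availability query divides the number of UP servers by the total server count per backend. The same arithmetic as a sketch, assuming the simple 1=UP, 0=DOWN encoding:

```python
# Sketch of the server-availability query: fraction of servers in a
# backend reporting UP (assuming status 1=UP, 0=DOWN).
def backend_availability(statuses: dict[str, int]) -> float:
    up = sum(1 for s in statuses.values() if s == 1)
    return up / len(statuses)

# Hypothetical backend with web3 down
statuses = {"web1": 1, "web2": 1, "web3": 0}
print(backend_availability(statuses))  # 0.666...
```

Note that if your HAProxy version encodes additional states (e.g. MAINT) as other values, sum()-based availability queries need a filter such as `== 1` first.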

Recording Rules

Optimize query performance with recording rules in /etc/prometheus/haproxy_rules.yml:

groups:
  - name: haproxy_recording
    rules:
      # Request rate by frontend
      - record: haproxy:frontend:request_rate:5m
        expr: sum(rate(haproxy_frontend_http_requests_total[5m])) by (proxy, instance)

      # Error rate by frontend
      - record: haproxy:frontend:error_rate:5m
        expr: |
          sum(rate(haproxy_frontend_http_responses_total{code="5xx"}[5m])) by (proxy, instance)
          /
          sum(rate(haproxy_frontend_http_responses_total[5m])) by (proxy, instance)

      # Backend availability
      - record: haproxy:backend:availability
        expr: |
          sum(haproxy_server_status) by (proxy, instance)
          /
          count(haproxy_server_status) by (proxy, instance)

      # Session utilization
      - record: haproxy:frontend:session_utilization
        expr: |
          haproxy_frontend_current_sessions
          /
          haproxy_frontend_limit_sessions

      # Bytes total rate
      - record: haproxy:frontend:bytes_total_rate:5m
        expr: |
          sum(rate(haproxy_frontend_bytes_in_total[5m])) by (proxy, instance)
          +
          sum(rate(haproxy_frontend_bytes_out_total[5m])) by (proxy, instance)

HAProxy Exporter (Alternative)

For HAProxy versions before 2.0, which lack native Prometheus support, use the Prometheus haproxy_exporter (the project is now archived, but it still works against the stats socket or stats page):

Installation

# Download HAProxy Exporter
EXPORTER_VERSION="0.15.0"
wget https://github.com/prometheus/haproxy_exporter/releases/download/v${EXPORTER_VERSION}/haproxy_exporter-${EXPORTER_VERSION}.linux-amd64.tar.gz
tar xzf haproxy_exporter-${EXPORTER_VERSION}.linux-amd64.tar.gz
sudo mv haproxy_exporter-${EXPORTER_VERSION}.linux-amd64/haproxy_exporter /usr/local/bin/

HAProxy Stats Socket Configuration

global
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats socket ipv4@127.0.0.1:9999 level admin
    stats timeout 30s

Exporter Systemd Service

Create /etc/systemd/system/haproxy_exporter.service:

[Unit]
Description=HAProxy Exporter
After=network.target haproxy.service

[Service]
Type=simple
User=haproxy
Group=haproxy
ExecStart=/usr/local/bin/haproxy_exporter \
  --haproxy.scrape-uri="unix:/run/haproxy/admin.sock" \
  --web.listen-address=":9101"
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Prometheus Configuration for Exporter

scrape_configs:
  - job_name: 'haproxy'
    static_configs:
      - targets: ['haproxy1.example.com:9101']

Advanced Monitoring Patterns

Multi-Instance Aggregation

# Total request rate across all instances
sum(rate(haproxy_frontend_http_requests_total[5m]))

# Per-datacenter metrics
sum(rate(haproxy_frontend_http_requests_total[5m])) by (dc)

# Highest error rate instance
topk(3, haproxy:frontend:error_rate:5m)

SLO Monitoring

# Availability SLO (99.9%)
1 - (
  sum(rate(haproxy_frontend_http_responses_total{code="5xx"}[30d]))
  /
  sum(rate(haproxy_frontend_http_responses_total[30d]))
) > 0.999

# Error budget remaining
1 - (
  sum(increase(haproxy_frontend_http_responses_total{code="5xx"}[30d]))
  /
  (sum(increase(haproxy_frontend_http_responses_total[30d])) * 0.001)
)
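The error-budget query follows directly from the SLO arithmetic: with a 99.9% target, 0.1% of requests over the 30-day window may fail before the budget is exhausted. A sketch of that calculation:

```python
# Error-budget arithmetic behind the query above: with a 99.9% SLO,
# 0.1% of requests in the window are "allowed" to fail; the result is
# the fraction of that allowance still unspent.
def error_budget_remaining(errors: float, total: float,
                           slo: float = 0.999) -> float:
    allowed = total * (1 - slo)   # requests allowed to fail
    return 1 - errors / allowed   # fraction of budget left

# e.g. 300 errors out of 1,000,000 requests -> 1,000 allowed -> 70% left
print(error_budget_remaining(300, 1_000_000))  # 0.7
```

A negative result means the budget is overspent and the SLO has been violated for the window.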

Capacity Planning

# Predict session exhaustion
predict_linear(haproxy_frontend_current_sessions[1h], 3600)

# Connection growth rate
deriv(haproxy_frontend_current_sessions[30m])

# Peak hour comparison
max_over_time(haproxy_frontend_current_sessions[1h])
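predict_linear() fits a least-squares line through the samples in the range and extrapolates it forward. A simplified sketch of that extrapolation:

```python
# Sketch of PromQL's predict_linear(): fit a least-squares line to
# (timestamp, value) samples and extrapolate `seconds` past the
# last sample.
def predict_linear(samples: list[tuple[float, float]],
                   seconds: float) -> float:
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    # Extrapolate to (last timestamp + seconds)
    return mean_v + slope * (samples[-1][0] + seconds - mean_t)

# Sessions growing 1/sec over the last samples; predict 1h ahead
samples = [(0.0, 100.0), (60.0, 160.0), (120.0, 220.0)]
print(predict_linear(samples, 3600.0))  # 3820.0
```

Comparing the prediction against haproxy_frontend_limit_sessions tells you roughly when session capacity would be exhausted if the trend holds.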

Troubleshooting

Verify Metrics Collection

# Check if Prometheus can reach HAProxy
curl -s http://prometheus:9090/api/v1/targets | jq '.data.activeTargets[] | select(.job=="haproxy")'

# Query current metrics
curl -s "http://prometheus:9090/api/v1/query?query=haproxy_process_current_connections" | jq .

# Check for scrape errors
curl -s http://prometheus:9090/api/v1/targets | jq '.data.activeTargets[] | select(.health!="up")'

Common Issues

  1. Metrics not appearing
    • Verify HAProxy version supports Prometheus exporter
    • Check firewall rules for port 8405
    • Confirm prometheus-exporter service is configured
  2. High cardinality
    • Limit label values in relabeling
    • Use recording rules for aggregations
  3. Stale metrics
    • Check scrape interval settings
    • Verify HAProxy is responding to health checks

Summary

With Prometheus monitoring configured, you have:

  • Real-time visibility into HAProxy performance
  • Historical data for trend analysis and capacity planning
  • Proactive alerting for issues before they impact users
  • Beautiful Grafana dashboards for operations teams

This completes the HAProxy monitoring series. Combined with the previous posts on load balancing, mTLS, Data Plane API, and Consul integration, you now have a comprehensive toolkit for building and operating production-grade HAProxy infrastructure.