HAProxy Monitoring with Prometheus: Complete Observability Guide
Monitoring HAProxy is essential for maintaining reliable load balancing infrastructure. Prometheus provides powerful metrics collection, alerting capabilities, and seamless Grafana integration for visualizing HAProxy performance and health.
Why Prometheus for HAProxy?
Prometheus offers:
- Pull-based metrics - Prometheus scrapes HAProxy metrics endpoints
- Time-series database - Store historical data for trend analysis
- Powerful queries - PromQL for complex metric analysis
- Alerting - Define rules for proactive incident detection
- Grafana integration - Beautiful dashboards and visualizations
HAProxy Built-in Prometheus Support
HAProxy 2.0+ includes native Prometheus metrics export via the built-in Prometheus exporter.
Enable Prometheus Metrics in HAProxy
Add to /etc/haproxy/haproxy.cfg:
global
log /dev/log local0
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
stats timeout 30s
user haproxy
group haproxy
daemon
defaults
log global
mode http
option httplog
option dontlognull
timeout connect 5000
timeout client 50000
timeout server 50000
# Prometheus metrics frontend
frontend prometheus
bind *:8405
mode http
http-request use-service prometheus-exporter if { path /metrics }
no log
stats enable
stats uri /stats
stats refresh 10s
# Stats page for human viewing
frontend stats
bind *:8404
mode http
stats enable
stats uri /stats
stats refresh 10s
stats admin if LOCALHOST
# Application frontends and backends
frontend http_front
bind *:80
bind *:443 ssl crt /etc/haproxy/certs/
default_backend web_servers
backend web_servers
balance roundrobin
option httpchk GET /health
server web1 192.168.1.10:8080 check
server web2 192.168.1.11:8080 check
server web3 192.168.1.12:8080 check
Verify Metrics Endpoint
# Test Prometheus metrics endpoint
curl http://localhost:8405/metrics
# Sample output:
# HELP haproxy_process_nbthread Number of threads
# TYPE haproxy_process_nbthread gauge
haproxy_process_nbthread 4
# HELP haproxy_process_current_connections Current number of connections
# TYPE haproxy_process_current_connections gauge
haproxy_process_current_connections 42
Prometheus Configuration
Basic Prometheus Setup
Create /etc/prometheus/prometheus.yml:
global:
scrape_interval: 15s
evaluation_interval: 15s
alerting:
alertmanagers:
- static_configs:
- targets:
- alertmanager:9093
rule_files:
- "haproxy_alerts.yml"
scrape_configs:
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
- job_name: 'haproxy'
static_configs:
- targets:
- 'haproxy1.example.com:8405'
- 'haproxy2.example.com:8405'
metrics_path: /metrics
scrape_interval: 10s
scrape_timeout: 5s
Service Discovery for Multiple HAProxy Instances
scrape_configs:
- job_name: 'haproxy'
file_sd_configs:
- files:
- '/etc/prometheus/haproxy_targets.json'
refresh_interval: 30s
Create /etc/prometheus/haproxy_targets.json:
[
{
"targets": [
"haproxy1.example.com:8405",
"haproxy2.example.com:8405"
],
"labels": {
"env": "production",
"dc": "dc1"
}
},
{
"targets": [
"haproxy-staging.example.com:8405"
],
"labels": {
"env": "staging",
"dc": "dc1"
}
}
]
Consul Service Discovery
scrape_configs:
- job_name: 'haproxy'
consul_sd_configs:
- server: 'consul.example.com:8500'
services:
- 'haproxy'
relabel_configs:
- source_labels: [__meta_consul_tags]
regex: .*,prometheus,.*
action: keep
- source_labels: [__meta_consul_service]
target_label: job
- source_labels: [__meta_consul_node]
target_label: instance
Key HAProxy Metrics
Process Metrics
# Current connections
haproxy_process_current_connections
# Connection rate
rate(haproxy_process_connections_total[5m])
# Memory usage
haproxy_process_pool_allocated_bytes
haproxy_process_pool_used_bytes
# CPU usage
haproxy_process_cpu_user_seconds_total
haproxy_process_cpu_system_seconds_total
# Uptime
haproxy_process_start_time_seconds
Frontend Metrics
# Request rate per frontend
rate(haproxy_frontend_http_requests_total[5m])
# Current sessions
haproxy_frontend_current_sessions
# Bytes in/out
rate(haproxy_frontend_bytes_in_total[5m])
rate(haproxy_frontend_bytes_out_total[5m])
# HTTP response codes
rate(haproxy_frontend_http_responses_total{code="2xx"}[5m])
rate(haproxy_frontend_http_responses_total{code="4xx"}[5m])
rate(haproxy_frontend_http_responses_total{code="5xx"}[5m])
# Request errors
rate(haproxy_frontend_request_errors_total[5m])
# Denied requests
rate(haproxy_frontend_requests_denied_total[5m])
Backend Metrics
# Backend response time
haproxy_backend_response_time_average_seconds
haproxy_backend_total_time_average_seconds
# Queue length
haproxy_backend_current_queue
haproxy_backend_max_queue
# Active servers
haproxy_backend_active_servers
# Connection errors
rate(haproxy_backend_connection_errors_total[5m])
# Response errors
rate(haproxy_backend_response_errors_total[5m])
# Retry rate
rate(haproxy_backend_retry_warnings_total[5m])
# Sessions
haproxy_backend_current_sessions
haproxy_backend_max_sessions
Server Metrics
# Server status (1=UP, 0=DOWN)
haproxy_server_status
# Server weight
haproxy_server_weight
# Current sessions per server
haproxy_server_current_sessions
# Connection time
haproxy_server_connect_time_average_seconds
# Response time
haproxy_server_response_time_average_seconds
# Health check status
haproxy_server_check_status
# Failed health checks
rate(haproxy_server_check_failures_total[5m])
Alerting Rules
Create /etc/prometheus/haproxy_alerts.yml:
groups:
- name: haproxy
rules:
# High error rate
- alert: HAProxyHighErrorRate
expr: |
sum(rate(haproxy_frontend_http_responses_total{code="5xx"}[5m])) by (proxy)
/
sum(rate(haproxy_frontend_http_responses_total[5m])) by (proxy)
> 0.05
for: 5m
labels:
severity: critical
annotations:
summary: "High 5xx error rate on {{ $labels.proxy }}"
description: "Error rate is {{ $value | humanizePercentage }} on frontend {{ $labels.proxy }}"
# Backend down
- alert: HAProxyBackendDown
expr: haproxy_backend_active_servers == 0
for: 1m
labels:
severity: critical
annotations:
summary: "HAProxy backend {{ $labels.proxy }} has no active servers"
description: "All servers in backend {{ $labels.proxy }} are down"
# Server down
- alert: HAProxyServerDown
expr: haproxy_server_status == 0
for: 2m
labels:
severity: warning
annotations:
summary: "Server {{ $labels.server }} is down"
description: "Server {{ $labels.server }} in backend {{ $labels.proxy }} is down"
# High connection count
- alert: HAProxyHighConnections
expr: |
haproxy_frontend_current_sessions
/
haproxy_frontend_limit_sessions
> 0.8
for: 5m
labels:
severity: warning
annotations:
summary: "High connection usage on {{ $labels.proxy }}"
description: "Frontend {{ $labels.proxy }} is at {{ $value | humanizePercentage }} of max connections"
# Queue building up
- alert: HAProxyQueueNotEmpty
expr: haproxy_backend_current_queue > 10
for: 2m
labels:
severity: warning
annotations:
summary: "Queue building up on {{ $labels.proxy }}"
description: "Backend {{ $labels.proxy }} has {{ $value }} requests queued"
# High response time
- alert: HAProxyHighResponseTime
expr: haproxy_backend_response_time_average_seconds > 2
for: 5m
labels:
severity: warning
annotations:
summary: "High response time on {{ $labels.proxy }}"
description: "Backend {{ $labels.proxy }} average response time is {{ $value }}s"
# Health check failures
- alert: HAProxyHealthCheckFailures
expr: increase(haproxy_server_check_failures_total[10m]) > 5
for: 5m
labels:
severity: warning
annotations:
summary: "Health check failures on {{ $labels.server }}"
description: "Server {{ $labels.server }} has {{ $value }} health check failures"
# Connection errors
- alert: HAProxyConnectionErrors
expr: rate(haproxy_backend_connection_errors_total[5m]) > 10
for: 5m
labels:
severity: warning
annotations:
summary: "Connection errors to {{ $labels.proxy }}"
description: "Backend {{ $labels.proxy }} has {{ $value }}/s connection errors"
# HAProxy instance down
- alert: HAProxyDown
expr: up{job="haproxy"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "HAProxy instance {{ $labels.instance }} is down"
description: "HAProxy at {{ $labels.instance }} is not responding"
# SSL certificate expiry
- alert: HAProxySSLCertExpiringSoon
expr: haproxy_ssl_cert_not_after - time() < 7 * 24 * 3600
for: 1h
labels:
severity: warning
annotations:
summary: "SSL certificate expiring soon"
description: "Certificate expires in {{ $value | humanizeDuration }}"
Grafana Dashboards
HAProxy Overview Dashboard JSON
Create a comprehensive dashboard with these panels:
{
"title": "HAProxy Overview",
"panels": [
{
"title": "Request Rate",
"type": "graph",
"targets": [
{
"expr": "sum(rate(haproxy_frontend_http_requests_total[5m])) by (proxy)",
"legendFormat": "{{ proxy }}"
}
]
},
{
"title": "HTTP Response Codes",
"type": "graph",
"targets": [
{
"expr": "sum(rate(haproxy_frontend_http_responses_total[5m])) by (code)",
"legendFormat": "{{ code }}"
}
]
},
{
"title": "Backend Response Time",
"type": "graph",
"targets": [
{
"expr": "haproxy_backend_response_time_average_seconds",
"legendFormat": "{{ proxy }}"
}
]
},
{
"title": "Active Connections",
"type": "gauge",
"targets": [
{
"expr": "sum(haproxy_frontend_current_sessions)"
}
]
},
{
"title": "Server Health",
"type": "table",
"targets": [
{
"expr": "haproxy_server_status",
"format": "table",
"instant": true
}
]
}
]
}
Key Dashboard Panels
Traffic Overview:
# Total requests per second
sum(rate(haproxy_frontend_http_requests_total[5m]))
# Bandwidth in/out
sum(rate(haproxy_frontend_bytes_in_total[5m])) * 8 # bits/sec
sum(rate(haproxy_frontend_bytes_out_total[5m])) * 8
Error Analysis:
# Error rate percentage
100 * sum(rate(haproxy_frontend_http_responses_total{code=~"5.."}[5m]))
/
sum(rate(haproxy_frontend_http_responses_total[5m]))
# 4xx rate
100 * sum(rate(haproxy_frontend_http_responses_total{code=~"4.."}[5m]))
/
sum(rate(haproxy_frontend_http_responses_total[5m]))
Backend Performance:
# Response time histogram
histogram_quantile(0.95, sum(rate(haproxy_backend_http_response_time_seconds_bucket[5m])) by (le, proxy))
# Queue time
haproxy_backend_queue_time_average_seconds
Server Status:
# Server availability
sum(haproxy_server_status) by (proxy)
/
count(haproxy_server_status) by (proxy)
# Server weight distribution
haproxy_server_weight / ignoring(server) group_left sum(haproxy_server_weight) by (proxy)
Recording Rules
Optimize query performance with recording rules in /etc/prometheus/haproxy_rules.yml:
groups:
- name: haproxy_recording
rules:
# Request rate by frontend
- record: haproxy:frontend:request_rate:5m
expr: sum(rate(haproxy_frontend_http_requests_total[5m])) by (proxy, instance)
# Error rate by frontend
- record: haproxy:frontend:error_rate:5m
expr: |
sum(rate(haproxy_frontend_http_responses_total{code="5xx"}[5m])) by (proxy, instance)
/
sum(rate(haproxy_frontend_http_responses_total[5m])) by (proxy, instance)
# Backend availability
- record: haproxy:backend:availability
expr: |
sum(haproxy_server_status) by (proxy, instance)
/
count(haproxy_server_status) by (proxy, instance)
# Session utilization
- record: haproxy:frontend:session_utilization
expr: |
haproxy_frontend_current_sessions
/
haproxy_frontend_limit_sessions
# Bytes total rate
- record: haproxy:frontend:bytes_total_rate:5m
expr: |
sum(rate(haproxy_frontend_bytes_in_total[5m])) by (proxy, instance)
+
sum(rate(haproxy_frontend_bytes_out_total[5m])) by (proxy, instance)
HAProxy Exporter (Alternative)
For HAProxy versions without native Prometheus support, use the official HAProxy Exporter:
Installation
# Download HAProxy Exporter
EXPORTER_VERSION="0.15.0"
wget https://github.com/prometheus/haproxy_exporter/releases/download/v${EXPORTER_VERSION}/haproxy_exporter-${EXPORTER_VERSION}.linux-amd64.tar.gz
tar xzf haproxy_exporter-${EXPORTER_VERSION}.linux-amd64.tar.gz
sudo mv haproxy_exporter-${EXPORTER_VERSION}.linux-amd64/haproxy_exporter /usr/local/bin/
HAProxy Stats Socket Configuration
global
stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
stats socket ipv4@127.0.0.1:9999 level admin
stats timeout 30s
Exporter Systemd Service
Create /etc/systemd/system/haproxy_exporter.service:
[Unit]
Description=HAProxy Exporter
After=network.target haproxy.service
[Service]
Type=simple
User=haproxy
Group=haproxy
ExecStart=/usr/local/bin/haproxy_exporter \
--haproxy.scrape-uri="unix:/run/haproxy/admin.sock" \
--web.listen-address=":9101"
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Prometheus Configuration for Exporter
scrape_configs:
- job_name: 'haproxy'
static_configs:
- targets: ['haproxy1.example.com:9101']
Advanced Monitoring Patterns
Multi-Instance Aggregation
# Total request rate across all instances
sum(rate(haproxy_frontend_http_requests_total[5m]))
# Per-datacenter metrics
sum(rate(haproxy_frontend_http_requests_total[5m])) by (dc)
# Highest error rate instance
topk(3, haproxy:frontend:error_rate:5m)
SLO Monitoring
# Availability SLO (99.9%)
1 - (
sum(rate(haproxy_frontend_http_responses_total{code="5xx"}[30d]))
/
sum(rate(haproxy_frontend_http_responses_total[30d]))
) > 0.999
# Error budget remaining
1 - (
sum(increase(haproxy_frontend_http_responses_total{code="5xx"}[30d]))
/
(sum(increase(haproxy_frontend_http_responses_total[30d])) * 0.001)
)
Capacity Planning
# Predict session exhaustion
predict_linear(haproxy_frontend_current_sessions[1h], 3600)
# Connection growth rate
deriv(haproxy_frontend_current_sessions[30m])
# Peak hour comparison
max_over_time(haproxy_frontend_current_sessions[1h])
Troubleshooting
Verify Metrics Collection
# Check if Prometheus can reach HAProxy
curl -s http://prometheus:9090/api/v1/targets | jq '.data.activeTargets[] | select(.job=="haproxy")'
# Query current metrics
curl -s "http://prometheus:9090/api/v1/query?query=haproxy_process_current_connections" | jq .
# Check for scrape errors
curl -s http://prometheus:9090/api/v1/targets | jq '.data.activeTargets[] | select(.health!="up")'
Common Issues
- Metrics not appearing
- Verify HAProxy version supports Prometheus exporter
- Check firewall rules for port 8405
- Confirm prometheus-exporter service is configured
- High cardinality
- Limit label values in relabeling
- Use recording rules for aggregations
- Stale metrics
- Check scrape interval settings
- Verify HAProxy is responding to health checks
Summary
With Prometheus monitoring configured, you have:
- Real-time visibility into HAProxy performance
- Historical data for trend analysis and capacity planning
- Proactive alerting for issues before they impact users
- Beautiful Grafana dashboards for operations teams
This completes the HAProxy monitoring series. Combined with the previous posts on load balancing, mTLS, Data Plane API, and Consul integration, you now have a comprehensive toolkit for building and operating production-grade HAProxy infrastructure.