This page is two things in one.

The first half is a complete reference to HAProxy as it applies to log-shipping pipelines: what it is, how the configuration is structured, the knobs that matter, the pitfalls that bite first-timers.

The second half is a real use-case example, showing the HAProxy configuration I used for my Observability logs lab (ELK stack).

Theory

1. What is HAProxy?

HAProxy (High Availability Proxy) is the de-facto open-source load balancer.

First released in 2001 and written in C, it runs as a single event-driven process and comfortably handles tens of thousands of concurrent connections on commodity hardware.

For an ELK pipeline, HAProxy plays a focused role: accept incoming Beats or Kafka traffic from many producers, and distribute it across a pool of Logstash workers.

2. Configuration anatomy

An haproxy.cfg is built from a small set of section types; four of them matter here:

| Section | Purpose |
| --- | --- |
| global | Process-wide settings: user/group, log destination, max connections, stats socket, daemon mode |
| defaults | Default values inherited by all frontend/backend blocks unless overridden |
| frontend | How clients connect: which port to bind, which mode (TCP/HTTP), default backend |
| backend | A pool of upstream servers with a balance algorithm and health checks |
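
A minimal skeleton makes the four sections concrete. This is only a sketch: the names beats_in and logstash_pool, the maxconn value, and the timeout connect value are illustrative assumptions, and the addresses are the lab addresses used later on this page.

```haproxy
global
    log /dev/log local0        # send logs to the local syslog socket
    user haproxy
    group haproxy
    maxconn 4096               # illustrative process-wide connection cap
    daemon

defaults
    log     global             # reuse the log target declared in global
    mode    tcp
    timeout connect 5s         # illustrative value
    timeout client  24h
    timeout server  24h

frontend beats_in
    bind *:5044                # where clients connect
    default_backend logstash_pool

backend logstash_pool
    server logstash01 10.0.0.21:5044 check
    server logstash02 10.0.0.22:5044 check
```

Everything in defaults is inherited by beats_in and logstash_pool unless a block overrides it.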

3. TCP vs HTTP mode

HAProxy can operate in two modes per frontend/backend:

  • mode http: HAProxy parses every request and understands URL/method/headers/cookies. It can route based on URL, rewrite headers, and do cookie-based sticky sessions.
  • mode tcp: HAProxy treats traffic as opaque bytes: pure L4 forwarding. No parsing, lower CPU, works for any protocol.

For log shippers (Beats, Syslog, Kafka) the answer is always TCP mode: the wire protocols aren’t HTTP, and we don’t need any L7 features; we only need to distribute connections.
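
The difference between the two modes is easiest to see side by side. The fragment below is a sketch: web_in, api_pool, web_pool and port 8080 are hypothetical names used only for contrast, and the referenced backends would have to be defined elsewhere for the file to validate.

```haproxy
# L7: HAProxy parses each request, so it can route on the URL path.
frontend web_in
    mode http
    bind *:8080
    use_backend api_pool if { path_beg /api }
    default_backend web_pool

# L4: bytes are forwarded untouched; this is the shape used for Beats.
frontend beats_in
    mode tcp
    bind *:5044
    default_backend logstash_pool
```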

4. Load-balancing algorithms

The balance directive in a backend block decides which server gets the next connection:

| Algorithm | Behavior | When to use |
| --- | --- | --- |
| roundrobin | Each new connection goes to the next server in the list | Default. Works when workers are equally powerful. |
| leastconn | New connection goes to the server with the fewest active connections | When session duration varies a lot |
| source | Hash client IP → consistent server (stickiness without cookies) | When you need affinity but the protocol is plain TCP |
| random | Pure random pick | Rare: useful when you want to spread without state |

For Beats traffic, roundrobin is the textbook choice.

Beats connections are long-lived (a single TCP socket stays open for minutes), and Logstash workers are stateless: any worker can process any event.

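With equally sized workers there’s little upside to anything fancier here, although leastconn is a reasonable alternative if long-lived sessions end up unevenly spread, for example after a worker restart.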
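
In configuration terms the choice is a single line in the backend. A minimal sketch, using the lab's pool name and addresses, with the alternatives left commented out:

```haproxy
backend logstash_pool
    mode tcp
    balance roundrobin        # rotate new connections across the servers
    # balance leastconn       # alternative: server with the fewest active connections
    # balance source          # alternative: hash the client IP for affinity
    server logstash01 10.0.0.21:5044 check
    server logstash02 10.0.0.22:5044 check
```

Only one balance directive is active per backend, so switching algorithms means swapping which line is uncommented and reloading.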

5. Health checks

A backend server is only used if HAProxy considers it healthy.

Two probe styles:

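  • TCP: open a TCP connection to server:port, close it. If it accepts, the server is up. This is what the check keyword on a server line does by default; option tcp-check makes it explicit and allows scripted send/expect sequences.
  • HTTP: send an HTTP request (OPTIONS / by default, or any method and URL you choose, such as GET /health) and expect a 2xx/3xx response. Enabled with option httpchk, again together with check on the server line.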

For Beats backends, TCP is enough: Logstash either listens on :5044 or it doesn’t, and the Beats input offers no separate health endpoint to query.

The probe cadence is tuned with default-server:

default-server inter 5s fall 3 rise 2

Reads as: probe every 5s, mark a server DOWN after 3 consecutive failures (15 s), mark it UP again after 2 consecutive successes (10 s).
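
Put together, the two probe styles look roughly like this. The first backend uses the lab's names and addresses; app_pool, its address, and the /health URL are hypothetical and only illustrate the HTTP variant.

```haproxy
backend logstash_pool
    mode tcp
    option tcp-check                        # with no tcp-check rules this is a plain connect probe
    default-server inter 5s fall 3 rise 2   # shared check cadence for every server below
    server logstash01 10.0.0.21:5044 check  # "check" is what actually turns probing on
    server logstash02 10.0.0.22:5044 check

# HTTP variant for a backend that does expose a health URL (hypothetical).
backend app_pool
    mode http
    option httpchk GET /health
    http-check expect status 200
    server app01 192.0.2.10:8080 check
```

A server line without check is never probed and is always treated as up, whatever options the backend sets.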

6. The stats endpoint

HAProxy ships with a built-in HTML/CSV stats UI that shows every frontend/backend’s connection count, byte counters, server status, error rates, and average queue, connect, and response times.

Enable it on a dedicated frontend:

frontend stats
    bind *:8404
    mode http
    stats enable
    stats uri /
    stats refresh 10s

Then http://<haproxy-host>:8404/ (in a browser) or curl "http://<haproxy-host>:8404/;csv" (quoted, so the shell doesn’t treat the ; as a command separator) gives you live visibility into the LB layer.

This page should exist in every HAProxy deployment: it’s the first thing you’ll look at when debugging.

7. Timeouts

This is the most common source of bugs in HAProxy setups that forward long-lived TCP.

The relevant timeouts in the defaults section:

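| Timeout | Default | What it controls |
| --- | --- | --- |
| timeout connect | none | Maximum time to wait for a TCP connection to a backend server to be established |
| timeout client | none | Maximum inactivity on the client side of a connection |
| timeout server | none | Same, for the upstream side |
| timeout tunnel | none (client/server timeouts apply) | Maximum inactivity once a bidirectional tunnel is established; when set, it supersedes client/server |
| timeout check | none (inter is used) | Additional read timeout for a health check once its connection is established |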

IMPORTANT

For log shippers, set client/server/tunnel to 24h.

The stock Debian/Ubuntu haproxy.cfg ships timeout client 50000 (50 s; RHEL-family packages use 1m), which closes any Beats TCP session that stays idle that long and produces a steady stream of reconnects in the producer logs.

Hours can be spent debugging this exact symptom because the reconnects “work”; they just shouldn’t be happening.
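
As a sketch, a defaults section tuned for long-lived log-shipper sessions could look like the following. The 24h values come from the advice above; the timeout connect and timeout check values are illustrative assumptions.

```haproxy
defaults
    mode tcp
    timeout connect 5s     # illustrative: time allowed to open the TCP connection to a backend
    timeout client  24h    # maximum inactivity on the client (Filebeat) side
    timeout server  24h    # maximum inactivity on the server (Logstash) side
    timeout tunnel  24h    # inactivity limit once the connection is a plain two-way tunnel
    timeout check   5s     # illustrative: read timeout for health checks
```

All of these are inactivity timers: a session that keeps exchanging data is never cut by them, whatever its total age.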

8. Common pitfalls

WARNING

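mode tcp without option tcplog logs only a bare “Connect from … to …” line when the connection is accepted, with no timers, byte counts or state. Always add option tcplog in a TCP frontend: it logs each session with timers, byte counts and a termination state code, far more useful for diagnosing traffic. Keep in mind that the tcplog line is written when the session terminates, which for long-lived Beats sessions can be hours later; option logasap logs at connection time instead, at the cost of incomplete byte counts.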

WARNING

option dontlognull drops log lines for sessions that never sent/received bytes (probes, port scanners, broken clients). Without it, the log is full of noise.
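
The two logging options from the warnings above fit in the defaults section. A minimal sketch, assuming a log target is already declared in global:

```haproxy
defaults
    log global
    mode tcp
    option tcplog        # rich per-session line: timers, byte counts, termination state
    option dontlognull   # skip sessions that never transferred any data
    # option logasap     # optional: log at connection time instead of at session close
```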

INFO

server <name> <ip>:<port> is hardcoded: HAProxy resolves a hostname once, when the configuration is loaded (startup or reload), not at every connection. If your backends are DNS names with dynamic IPs (Kubernetes Service, Docker), you need a resolvers block, referenced from the server line, to keep the resolution live.

For static IPs (typical in this lab) this isn’t a concern.
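
For completeness, a resolvers setup could be sketched as follows. None of this is part of the lab: the section name lab_dns, the nameserver address, and the hostname are hypothetical placeholders.

```haproxy
resolvers lab_dns
    nameserver dns1 10.0.0.2:53     # hypothetical DNS server
    resolve_retries 3
    timeout resolve 1s
    timeout retry   1s
    hold valid      10s             # how long a valid answer is kept before re-resolving

backend logstash_pool
    mode tcp
    # "resolvers lab_dns" enables runtime re-resolution for this server;
    # init-addr lets HAProxy start even if the name does not resolve yet.
    server logstash01 logstash01.example.internal:5044 check resolvers lab_dns init-addr last,libc,none
```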

TIP

Validate the config before reloading. haproxy -c -f /etc/haproxy/haproxy.cfg parses without applying.

If a live reload hits a syntax error, the new configuration is rejected and HAProxy keeps running the OLD config, which is easy to miss.

Architecture

A single HAProxy is, of course, a single point of failure: the LB layer itself is not distributed, and HAProxy has no built-in queue like Kafka or RabbitMQ to buffer events while it is down.

To make at least the LB layer redundant, you need a separate component that:

  1. Constantly monitors which HAProxy instance is alive

  2. Owns a Virtual IP (VIP) that clients target, and migrates it to the surviving node on failure.

There are two mainstream schools for doing this on Linux.

1. HAProxy + Keepalived (this lab)

Two HAProxy instances on two hosts, both running the same config, plus Keepalived on each host implementing VRRP.

Clients always target the VIP: whichever HAProxy currently owns the VIP serves the traffic.

If that node dies, the VIP migrates to the standby within ~3 seconds.

  • HAProxy itself runs active-active at process level (both daemons are always up), but the VIP is active-passive, so only one node receives actual traffic at a time.

  • Minimal configuration, no quorum, no fencing: split-brain is theoretically possible under a network partition, but for a stateless LB in front of stateless workers, the blast radius is small.

This is what the lab uses: the standard open-source recipe for “highly-available HAProxy”.

And honestly?

In my experience, it’s even more reliable and stable than the next option, though of course it has fewer features and is less customizable.

2. HAProxy + Pacemaker + Corosync

In larger environments, HAProxy is fronted by a full cluster manager instead of Keepalived:

  • Corosync: the messaging / membership layer. Handles “who’s in the cluster, who’s reachable, do we have quorum”, but doesn’t manage any resource on its own.

  • Pacemaker: the resource manager that sits on top of Corosync. Treats both the VIP (ocf:heartbeat:IPaddr2) and HAProxy itself (systemd:haproxy) as cluster resources, with explicit dependencies, colocation, and ordering rules.

The trade-off vs. Keepalived:

| Aspect | Keepalived (lab) | Pacemaker + Corosync (enterprise) |
| --- | --- | --- |
| Protocol | VRRP (link-local IP multicast) | Corosync totem ring (UDP) |
| Resources managed | VIP only (+ check scripts) | Anything (VIP, services, mounts, …) |
| Quorum / fencing | None | Native quorum + STONITH |
| Configuration | A few dozen lines | pcs / crm tooling, more involved |
| Vendor support | Community only | Red Hat HA, SUSE HAE, … |

When you choose this path you get stronger guarantees against split-brain (fencing is mandatory), the ability to express things like “if haproxy dies on node A, migrate the VIP to node B and restart haproxy there”, and integration with an HA stack you may already have for databases or shared storage.

The cost is operational complexity: more daemons, more config.

INFO

About “Heartbeat”. You may hear this stack referred to generically as Heartbeat, especially in older docs or from veteran sysadmins. Historically Heartbeat was the original Linux-HA daemon (pre-2009) that did both messaging and resource management. Its resource manager was later spun out as Pacemaker, while Corosync (derived from OpenAIS) took over as the messaging layer.

The Heartbeat daemon still exists as a legacy messaging layer but is rarely used in new deployments: modern clusters are Corosync-based. Treat “Heartbeat” as the name of the concept / legacy stack, not of a current component (the ocf:heartbeat:* resource agent namespace is a naming leftover from those years).


Installation + Configuration: my ELK lab

Let’s now see how to install and configure it.

For my ELK lab, I have two HAProxy instances on loglb01 (10.0.0.11) and loglb02 (10.0.0.12), behind a shared VIP at 10.0.0.10 managed by Keepalived (we’ll see how to configure it too).

Each HAProxy listens on:

  • :5044: Beats input. Filebeat clients (on the producer side) send events here.
  • :8404: stats UI, private network only.

The backend pool is the two Logstash workers, logstash01 (10.0.0.21) and logstash02 (10.0.0.22), distributed round-robin with TCP health checks.

1. The configuration file

/etc/haproxy/haproxy.cfg, identical on both loglb01 and loglb02:

  • global block: standard daemon setup. Log forwarding to syslog via /dev/log (UNIX socket).
  • defaults → mode tcp: the entire instance defaults to TCP. The stats frontend later overrides this to mode http for the dashboard.
  • defaults → timeout client/server/tunnel 24h: the critical setting for long-lived Beats connections. With the package defaults (50 s on Debian/Ubuntu) you’d see Filebeat reconnect whenever a session goes idle, polluting the logs.
  • frontend stats on :8404: the HTML/CSV stats UI. In TCP mode it would be opaque, so this single frontend overrides to mode http.
  • frontend beats_in on :5044: the actual Beats receiver. option tcplog ensures each session is logged with byte counts and a state code on close.
  • backend logstash_pool:
    • balance roundrobin distributes connections one-by-one across logstash01 and logstash02.
    • option tcp-check says “probe by opening a TCP socket”: no Beats handshake, just a connect.
    • default-server inter 5s fall 3 rise 2 applies to both server lines: probe cadence is 5 s, mark DOWN after three failures (15 s), mark UP after two successes (10 s).
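
Assembled from the bullets above, the file could look roughly like this. It is a reconstruction from the description rather than a verbatim copy of the lab file: the syslog facility, the user/group lines, the timeout connect value, and the placement of option dontlognull are assumptions.

```haproxy
global
    log /dev/log local0              # syslog via the local UNIX socket (facility is an assumption)
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    tcp                      # whole instance defaults to L4
    option  dontlognull
    timeout connect 5s               # assumed value, not stated in the description
    timeout client  24h              # keep long-lived Beats sessions open
    timeout server  24h
    timeout tunnel  24h

frontend stats
    bind *:8404                      # reachable from the private network only
    mode http                        # the dashboard needs HTTP parsing
    stats enable
    stats uri /
    stats refresh 10s

frontend beats_in
    bind *:5044
    option tcplog                    # per-session line with byte counts and state code
    default_backend logstash_pool

backend logstash_pool
    balance roundrobin
    option tcp-check                 # plain TCP connect probe
    default-server inter 5s fall 3 rise 2
    server logstash01 10.0.0.21:5044 check
    server logstash02 10.0.0.22:5044 check
```

Running haproxy -c -f against the file is the quickest way to confirm that a variant of this sketch parses on your HAProxy version.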

2. Install and bring it up

# On both loglb01 and loglb02
sudo apt update
sudo apt install -y haproxy
 
# Drop the config above into /etc/haproxy/haproxy.cfg
sudo nano /etc/haproxy/haproxy.cfg
 
# Validate before applying; never skip this
sudo haproxy -c -f /etc/haproxy/haproxy.cfg
# Output: "Configuration file is valid"
 
# Enable + start
sudo systemctl enable --now haproxy
sudo systemctl status haproxy --no-pager | head -5

3. Verify the pool

From any host on the private network:

curl -s "http://10.0.0.11:8404/;csv" | awk -F',' '
  NR==1 { print "frontend/backend  svname        scur  bin       bout      status"; next }
  $1=="beats_in" || $1=="logstash_pool" {
    printf "%-17s %-13s %-5s %-9s %-9s %s\n", $1, $2, $5, $9, $10, $18
  }'

Expected output once Logstash workers are up:

frontend/backend  svname        scur  bin       bout      status
beats_in          FRONTEND      0     0         0         OPEN
logstash_pool     logstash01    0     0         0         UP
logstash_pool     logstash02    0     0         0         UP
logstash_pool     BACKEND       0     0         0         UP

As Filebeat clients connect, scur (current sessions) and bin/bout (byte counters) start moving. If a Logstash worker dies, its row flips to DOWN within ~15 s and scur redistributes to the survivors.

4. Reload after config changes

sudo haproxy -c -f /etc/haproxy/haproxy.cfg && sudo systemctl reload haproxy

A clean reload doesn’t drop existing connections: HAProxy spawns the new process and drains the old one gracefully!

Where to go next

  • Keepalived VRRP: let’s see the VRRP layer that owns the VIP and migrates it between loglb01 and loglb02 on failure.