The first half is a complete reference to HAProxy as it applies to log-shipping pipelines: what it is, how the configuration is structured, the knobs that matter, the pitfalls that bite first-timers.
The second half is a real use-case example, showing the HAProxy configuration I used for my Observability logs lab (ELK stack).
Theory
1. What is HAProxy?
HAProxy (High Availability Proxy) is the de facto standard open-source load balancer.
First released in 2001 and written in C, it runs as a single event-driven process and comfortably handles tens of thousands of concurrent connections on commodity hardware.
For an ELK pipeline, HAProxy plays a focused role: accept incoming Beats or Kafka traffic from many producers, and distribute it across a pool of Logstash workers.
2. Configuration anatomy
An haproxy.cfg has a fixed structure with four kinds of sections:
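| Section | Purpose |
| --- | --- |
| global | Process-wide settings: logging, user/group, chroot, admin socket |
| defaults | Default values inherited by all frontend/backend blocks unless overridden |
| frontend | How clients connect: which port to bind, which mode (TCP/HTTP), default backend |
| backend | A pool of upstream servers with a balance algorithm and health checks |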
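To make the four sections concrete, here is a bare skeleton. It is only a sketch: the section names example_in and example_pool, the worker names, and the 192.0.2.x addresses are placeholders, not part of the lab.

```haproxy
# Skeleton only: every name and address below is a placeholder
global
    log /dev/log local0          # process-wide settings live here
    user haproxy
    group haproxy

defaults
    log global                   # inherited by every frontend/backend below
    mode tcp
    timeout connect 5s
    timeout client  24h
    timeout server  24h

frontend example_in
    bind *:5044                  # where clients connect
    default_backend example_pool

backend example_pool
    balance roundrobin           # how connections are spread
    server worker1 192.0.2.21:5044 check
    server worker2 192.0.2.22:5044 check
```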
3. TCP vs HTTP mode
HAProxy can operate in two modes per frontend/backend:
mode http: HAProxy parses every request, understands URL/method/headers/cookies. Can route based on URL, rewrite headers, terminate TLS, do sticky sessions.
mode tcp: HAProxy treats traffic as opaque bytes: pure L4 forwarding. No parsing, lower CPU, works for any protocol.
For log shippers (Beats, Syslog, Kafka) the answer is always TCP mode: the wire protocols aren't HTTP, and we don't need any L7 features, just connection distribution.
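The difference between the two modes is easiest to see side by side. The fragment below is illustrative only: web_in, the /api path, and the api_pool and web_pool backends are hypothetical, and the backend definitions are omitted, so it will not validate on its own.

```haproxy
# Illustrative fragment: names, ports, and the /api path are hypothetical

# L7: HAProxy parses each request and can route on its content
frontend web_in
    mode http
    bind *:8080
    acl is_api path_beg /api
    use_backend api_pool if is_api
    default_backend web_pool

# L4: bytes are forwarded untouched, so any protocol works
frontend beats_in
    mode tcp
    bind *:5044
    default_backend logstash_pool
```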
4. Load-balancing algorithms
The balance directive in a backend block decides which server gets the next connection:
| Algorithm | Behavior | When to use |
| --- | --- | --- |
| roundrobin | Each new connection goes to the next server in the list | Default. Works when workers are equally powerful. |
| leastconn | New connection goes to the server with the fewest active connections | When session duration varies a lot |
| source | Hash client IP → consistent server (stickiness without cookies) | When you need affinity but the protocol is plain TCP |
| random | Pure random pick | Rare: useful when you want to spread without state |
For Beats traffic, roundrobin is the textbook choice.
Beats connections are long-lived (a single TCP socket stays open for minutes), and Logstash workers are stateless: any worker can process any event.
There's no upside to anything fancier here.
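In configuration terms the choice is a single line in the backend. A minimal sketch, using the Logstash addresses from the lab; the second backend (sticky_pool) is hypothetical and only shows how an alternative algorithm would be selected.

```haproxy
# Sketch: logstash_pool uses the lab's addresses, sticky_pool is a hypothetical variant
backend logstash_pool
    mode tcp
    balance roundrobin           # next connection goes to the next server in the list
    server logstash01 10.0.0.21:5044 check
    server logstash02 10.0.0.22:5044 check

backend sticky_pool
    mode tcp
    balance source               # same client IP keeps landing on the same server
    server logstash01 10.0.0.21:5044 check
    server logstash02 10.0.0.22:5044 check
```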
5. Health checks
A backend server is only used if HAProxy considers it healthy.
Two probe styles:
TCP: open a TCP connection to server:port, then close it. If the connection is accepted, the server is up. This is what the check keyword on a server line does by default; option tcp-check makes the TCP probe explicit and allows scripted send/expect steps.
HTTP: GET /health (or any URL you choose). Enabled with option httpchk.
For Beats backends, TCP is enough: Logstash either listens on :5044 or it doesn't, and the Beats port itself offers no health endpoint to query.
The probe cadence is tuned with default-server:
default-server inter 5s fall 3 rise 2
Reads as: probe every 5s, mark a server DOWN after 3 consecutive failures (15 s), mark it UP again after 2 consecutive successes (10 s).
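For contrast with the TCP probe used for Beats, this is roughly what the HTTP probe style looks like on a backend that does expose a health URL. The backend name, the servers, the port, and the /health path are hypothetical and not part of the lab.

```haproxy
# Hypothetical HTTP service, shown only to contrast with the TCP check used for Beats
backend web_pool
    mode http
    balance roundrobin
    option httpchk GET /health            # probe with an HTTP request instead of a bare connect
    http-check expect status 200          # any other status counts as a failed probe
    default-server inter 5s fall 3 rise 2 # same cadence as above: DOWN after 15 s, UP after 10 s
    server web01 192.0.2.11:8080 check
    server web02 192.0.2.12:8080 check
```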
6. The stats endpoint
HAProxy ships with a built-in HTML/CSV stats UI that shows every frontend/backend's connection count, byte counters, server status, error rates, and average session timings.
With a stats frontend bound to :8404, http://<haproxy-host>:8404/ (in a browser) or curl 'http://<haproxy-host>:8404/;csv' (quoted, so the shell does not treat the semicolon as a command separator) gives you live visibility into the LB layer.
This page should exist in every HAProxy deployment: it's the first thing you'll look at when debugging.
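A minimal stats frontend matching the :8404 URLs above. It mirrors the one used in the lab configuration later in this post; restricting access to a private network is assumed to be handled outside HAProxy (firewall or bind address).

```haproxy
frontend stats
    bind *:8404
    mode http            # the stats page is served over HTTP even if everything else is TCP
    stats enable
    stats uri /          # HTML at /, CSV at /;csv
    stats refresh 10s    # the HTML page auto-refreshes every 10 s
```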
7. Timeouts
This is the most common source of bugs in HAProxy setups that forward long-lived TCP.
The relevant timeouts in the defaults section:
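| Timeout | Default | What it controls |
| --- | --- | --- |
| timeout connect | none | TCP connect timeout to a backend server |
| timeout client | none | How long to keep an idle client connection |
| timeout server | none | Same, for the upstream side |
| timeout tunnel | unset (client/server timeouts apply) | Idle timeout once the connection is an established bidirectional tunnel; overrides client/server |
| timeout check | unset (inter applies) | Cap on the read phase of a single health check, after the connection is established |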
IMPORTANT
For log shippers, set client/server/tunnel to 24h.
The haproxy.cfg shipped with the Debian/Ubuntu package sets timeout client and timeout server to 50000 ms (50 s), which closes any Beats TCP session that stays idle for that long and produces a constant reconnect storm in the producer logs.
Hours can be spent debugging this exact symptom because the reconnects "work": they just shouldn't be happening.
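In practice this comes down to a handful of lines in the defaults section. The values below are the ones used in the lab configuration later in this post; treat 24h as a reasonable starting point rather than a universal answer.

```haproxy
defaults
    mode tcp
    timeout connect 5s     # give up quickly if a backend does not accept the connection
    timeout client  24h    # tolerate long idle periods on the producer side
    timeout server  24h    # same, toward Logstash
    timeout tunnel  24h    # applies once the connection is an established tunnel
    timeout check   5s     # a health check must not hang longer than this
```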
8. Common pitfalls
WARNING
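mode tcp without option tcplog falls back to the minimal default log format: one terse line per session, with source and destination but no timers, byte counts, or termination state. Always add option tcplog in a TCP frontend: it logs each session with byte counts and a state code, far more useful for diagnosing traffic. Either way the line is written when the session terminates, which for long-lived Beats sessions can be hours later.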
WARNING
option dontlognull drops log lines for sessions that never sent/received bytes (probes, port scanners, broken clients). Without it, the log is full of noise.
INFO
server <name> <ip>:<port> is hardcoded: HAProxy resolves the hostname once at config-reload time, not at every connection. If your backends are DNS names with dynamic IPs (Kubernetes Service, Docker), you need a resolvers block to keep the resolution live.
For static IPs (typical in this lab) this isn't a concern.
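For completeness, a sketch of what the resolvers variant could look like. Nothing here is used in the lab: the resolver name lab_dns, the nameserver address 10.0.0.2, and the example.internal hostnames are all hypothetical.

```haproxy
# Sketch: the nameserver address and the hostnames are hypothetical
resolvers lab_dns
    nameserver ns1 10.0.0.2:53
    resolve_retries 3
    timeout resolve 1s
    timeout retry   1s
    hold valid      10s          # keep a valid answer for 10 s before asking again

backend logstash_pool
    mode tcp
    balance roundrobin
    default-server inter 5s fall 3 rise 2
    # "resolvers lab_dns" keeps the address fresh at runtime;
    # "init-addr last,libc,none" lets HAProxy start even if DNS is not answering yet
    server logstash01 logstash01.example.internal:5044 check resolvers lab_dns init-addr last,libc,none
    server logstash02 logstash02.example.internal:5044 check resolvers lab_dns init-addr last,libc,none
```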
TIP
Validate the config before reloading: haproxy -c -f /etc/haproxy/haproxy.cfg parses it without applying it.
A syntax error during a live reload causes HAProxy to keep running the OLD config silently.
Architecture
A single HAProxy is, of course, a single point of failure: there is no second node to share or take over the traffic, and, unlike Kafka or RabbitMQ, it has no built-in queue.
To make at least the LB layer redundant, you need a separate component that:
Constantly monitors which HAProxy instance is alive
Owns a Virtual IP (VIP) that clients target, and migrates it to the surviving node on failure.
There are two mainstream schools for doing this on Linux.
1. HAProxy + Keepalived (this lab)
Two HAProxy instances on two hosts, both running the same config, plus Keepalived on each host implementing VRRP.
Clients always target the VIP: whichever HAProxy currently owns the VIP serves the traffic.
If that node dies, the VIP migrates to the standby within ~3 seconds.
HAProxy itself runs active-active at process level (both daemons are always up), but the VIP is active-passive, so only one node receives actual traffic at a time.
Minimal configuration, no quorum, no fencing: split-brain is theoretically possible under a network partition, but for a stateless LB in front of stateless workers, the blast radius is small.
This is what the lab uses: the standard open-source recipe for "highly-available HAProxy".
And honestly?
In my experience it is even more reliable and stable than the next option, although of course it has fewer features and is less customizable.
2. HAProxy + Pacemaker + Corosync
In larger environments, HAProxy is fronted by a full cluster manager instead of Keepalived:
Corosync: the messaging / membership layer. Handles "who's in the cluster, who's reachable, do we have quorum", but doesn't manage any resource on its own.
Pacemaker: the resource manager that sits on top of Corosync. Treats both the VIP (ocf:heartbeat:IPaddr2) and HAProxy itself (systemd:haproxy) as cluster resources, with explicit dependencies, colocation, and ordering rules.
The trade-off vs. Keepalived:
| Aspect | Keepalived (lab) | Pacemaker + Corosync (enterprise) |
| --- | --- | --- |
| Protocol | VRRP (L2 multicast) | Corosync totem ring (UDP) |
| Resources managed | VIP only (+ check scripts) | Anything (VIP, services, mounts, …) |
| Quorum / fencing | None | Native quorum + STONITH |
| Configuration | A few dozen lines | pcs / crm tooling, more involved |
| Vendor support | Community only | Red Hat HA, SUSE HAE, … |
When you choose this path you get stronger guarantees against split-brain (fencing is mandatory), the ability to express things like "if haproxy dies on node A, migrate the VIP to node B and restart haproxy there", and integration with an HA stack you may already have for databases or shared storage.
The cost is operational complexity: more daemons, more config.
INFO
About "Heartbeat". You may hear this stack referred to generically as Heartbeat, especially in older docs or from veteran sysadmins. Historically Heartbeat was the original Linux-HA daemon (pre-2009) that did both messaging and resource management; the project later split into Corosync (messaging) + Pacemaker (resource manager).
The Heartbeat daemon still exists as a legacy messaging layer but is rarely used in new deployments: modern clusters are Corosync-based. Treat "Heartbeat" as the name of the concept / legacy stack, not of a current component (the ocf:heartbeat:* resource agent namespace is a naming leftover from those years).
Installation + Configuration: my ELK lab
Let's now see how to install and configure it.
For my ELK lab, I have two HAProxy instances on loglb01 (10.0.0.11) and loglb02 (10.0.0.12), behind a shared VIP at 10.0.0.10 managed by Keepalived (we'll see how to configure it too).
Each HAProxy listens on:
:5044: Beats input. Filebeat clients (on the producer side) send events here.
:8404: stats UI, private network only.
The backend pool is the two Logstash workers, logstash01 (10.0.0.21) and logstash02 (10.0.0.22), distributed round-robin with TCP health checks.
1. The configuration file
/etc/haproxy/haproxy.cfg, identical on both loglb01 and loglb02:
Example: my haproxy.cfg
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode tcp
    option dontlognull
    timeout connect 5s
    timeout client 24h
    timeout server 24h
    timeout tunnel 24h
    timeout check 5s

# ---------------------------------------------------------
# Stats UI on :8404, private network only
# Browse http://<lb-ip>:8404/ to see live backend status
# ---------------------------------------------------------
frontend stats
    bind *:8404
    mode http
    stats enable
    stats uri /
    stats refresh 10s

# ---------------------------------------------------------
# Beats input on :5044, round-robin to Logstash workers
# ---------------------------------------------------------
frontend beats_in
    bind *:5044
    mode tcp
    option tcplog
    default_backend logstash_pool

backend logstash_pool
    mode tcp
    balance roundrobin
    option tcp-check
    default-server inter 5s fall 3 rise 2
    server logstash01 10.0.0.21:5044 check
    server logstash02 10.0.0.22:5044 check
global block: standard daemon setup. Log forwarding to syslog via /dev/log (UNIX socket).
defaults → mode tcp: the entire instance defaults to TCP. The stats frontend later overrides this to mode http for the dashboard.
defaults → timeout client/server/tunnel 24h: the critical setting for long-lived Beats connections. With the short timeouts of the packaged config (under a minute) you'd see Filebeat reconnect whenever a connection sat idle, polluting the logs.
frontend stats on :8404: the HTML/CSV stats UI. In TCP mode it would be opaque, so this single frontend overrides to mode http.
frontend beats_in on :5044: the actual Beats receiver. option tcplog ensures each session is logged with byte counts and a state code on close.
backend logstash_pool:
balance roundrobin distributes connections one-by-one across logstash01 and logstash02.
option tcp-check says "probe by opening a TCP socket": no Beats handshake, just a connect.
default-server inter 5s fall 3 rise 2 applies to both server lines: probe cadence is 5 s, mark DOWN after three failures (15 s), mark UP after two successes (10 s).
2. Install and bring it up
# On both loglb01 and loglb02
sudo apt update
sudo apt install -y haproxy

# Drop the config above into /etc/haproxy/haproxy.cfg
sudo nano /etc/haproxy/haproxy.cfg

# Validate before applying: never skip this
sudo haproxy -c -f /etc/haproxy/haproxy.cfg
# Output: "Configuration file is valid"

# Enable + start
sudo systemctl enable --now haproxy
sudo systemctl status haproxy --no-pager | head -5
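Once HAProxy is running, the stats page (trimmed here to the relevant columns) should show both workers UP:
frontend/backend  svname      scur  bin  bout  status
beats_in          FRONTEND    0     0    0     OPEN
logstash_pool     logstash01  0     0    0     UP
logstash_pool     logstash02  0     0    0     UP
logstash_pool     BACKEND     0     0    0     UP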
As Filebeat clients connect, scur (current sessions) and bin/bout (byte counters) start moving. If a Logstash worker dies, its row flips to DOWN within ~15 s and scur redistributes to the survivors.