The first half is a complete, general reference to Keepalived, while the second half is a real use-case example, showing the configuration I used for my Observability logs lab (ELK stack).
Theory
1. What is Keepalived?
Keepalived is an open-source daemon that does two related jobs:
VRRP (RFC 5798): election and ownership of a Virtual IP shared across a group of hosts. Exactly one host in the group “owns” the VIP at any moment, and if that host dies, ownership migrates to another within a few seconds.
Health-check + automatic failover: periodic probes of local services (HAProxy in our case). If the local service is unhealthy, Keepalived lowers its priority so that ownership of the VIP migrates to the standby.
The combination gives a transparent active/standby HA pair: clients hit the VIP, and the underlying instance answering them changes invisibly.
HAProxy gives us horizontal scaling of the worker pool behind it, but it does not make the load balancer itself redundant: a single HAProxy is still a single point of failure.
The standard fix is to run two HAProxy hosts sharing a Virtual IP (VIP) that migrates between them automatically.
2. VRRP in 60 seconds
VRRP is a protocol that runs between routers (or generic hosts) on a shared layer-2 segment.
Each VRRP “instance” defines:
A Virtual Router ID (VRID): a number 1-255 that identifies the group on the segment
A set of participating hosts (routers), each with a priority value
A Virtual IP that the group collectively owns
The participating hosts send periodic VRRP advertisements to each other (multicast group 224.0.0.18, IP protocol 112).
The host with the highest priority wins the election and becomes MASTER: it claims the VIP, responds to ARP for it, and serves traffic.
The others sit in BACKUP state, listening for the master’s heartbeat.
If the master stops advertising for about 3 × advert_int seconds (plus a small, priority-dependent skew), the highest-priority backup promotes itself to MASTER, attaches the VIP to its own interface, and sends a gratuitous ARP so that switches and neighbors update their MAC and ARP tables.
The whole failover takes roughly 3 to 4 seconds on a healthy LAN with the default 1 s interval.
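To make the timing concrete, here is a minimal sketch of the master-down timer as RFC 5798 defines it: three advertisement intervals plus a skew that shrinks as the waiting backup's priority grows. The function name master_down_interval is a hypothetical helper for illustration, not a Keepalived command.

```shell
# Master_Down_Interval per RFC 5798:
#   skew_time            = (256 - priority) * advert_int / 256
#   master_down_interval = 3 * advert_int + skew_time
# master_down_interval is a hypothetical helper name used only for this sketch.
#   $1: advert_int in seconds
#   $2: priority of the BACKUP that is waiting for the master (1-254)
master_down_interval() {
    awk -v a="$1" -v p="$2" \
        'BEGIN { printf "%.2f\n", 3 * a + (256 - p) * a / 256 }'
}

master_down_interval 1 100   # prints 3.61
master_down_interval 1 110   # prints 3.57
```

Because the skew is smaller for higher priorities, the best-placed backup times out first and takes over before lower-priority peers do.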
WARNING
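Preemption is on by default. When the original MASTER recovers, it takes the VIP back from the BACKUP, causing a second failover. If your service can’t tolerate the brief glitch, add nopreempt to the vrrp_instance so that a recovering higher-priority host does not reclaim the VIP and the current owner keeps it until it fails itself. Keepalived only honors nopreempt on an instance whose configured state is BACKUP.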
Core configuration concepts
1. vrrp_instance block
The heart of the config.
One block per VIP being managed.
vrrp_instance VI_NAME {
    state MASTER              # initial role on this host
    interface eth0            # interface where the VIP lives
    virtual_router_id 51      # VRID (1-255, must match on all peers)
    priority 110              # election weight
    advert_int 1              # heartbeat period (seconds)
    authentication { ... }
    virtual_ipaddress { ... }
    track_script { ... }
}
2. state and priority
state MASTER on the active host, state BACKUP on the standby. This is just the initial state: election happens at startup and may reassign roles based on priority.
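priority is what really decides who wins: the host with the highest priority becomes MASTER. A common convention is 110 on the active and 100 on the standby. The 10-point gap is small enough that a track_script penalty (for example weight -20) pushes the active below the standby, and large enough that the roles are unambiguous while both hosts are healthy.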
3. virtual_router_id
A number 1-255 that identifies this VRRP group on the segment.
All peers in the same group MUST use the same VRID, and the VRID MUST be unique among VRRP groups on the same L2 segment.
4. advert_int
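Heartbeat period in seconds (default 1). Failover detection time is roughly 3 × advert_int (~3 s by default).
Lowering it to a fraction (advert_int 0.5) tightens failover at the cost of more CPU/bandwidth. Sub-second intervals generally require VRRPv3 (vrrp_version 3), because VRRPv2 carries the interval in whole seconds.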
5. authentication
VRRP supports two auth modes:
PASS: a shared password (max 8 chars). Weak, but enough to prevent accidental cross-group collisions.
AH: IPsec-style auth header. Stronger but rarely used; many implementations don’t support it well.
Always set authentication, even on private networks: it prevents an accidental misconfigured neighbor from joining your VRRP group.
6. virtual_ipaddress
The VIP itself, with CIDR.
The MASTER attaches this to the configured interface when it claims ownership, so the BACKUP doesn’t have it.
Multiple VIPs per instance are allowed.
virtual_ipaddress {
    10.0.0.10/24
}
7. track_script
A reference to a vrrp_script block that periodically runs a command and adjusts priority based on its exit code. The standard pattern: track whether HAProxy is running on the local host.
If pgrep haproxy fails, drop priority by N and the standby takes over.
vrrp_script chk_haproxy {
    script "/usr/bin/pgrep -x haproxy"
    interval 2     # run every 2s
    weight -20     # subtract 20 from priority on failure
    fall 2         # need 2 consecutive failures to count
    rise 2         # need 2 consecutive successes to recover
}

vrrp_instance VI_LOGS {
    ...
    priority 110
    track_script {
        chk_haproxy
    }
}
With priority 110 on the active and priority 100 on the standby: if HAProxy on the active dies, weight -20 drops the effective priority to 90 → standby (100) wins → VIP migrates.
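The arithmetic above can be sketched in a few lines of shell. This is only an illustration of the negative-weight case described here, not Keepalived's actual implementation; effective_priority and vip_owner are hypothetical helper names.

```shell
# effective_priority BASE WEIGHT STATUS
#   BASE:   configured priority
#   WEIGHT: track_script weight (only the negative case is modeled here)
#   STATUS: "ok" or "fail", the current result of the health check
effective_priority() {
    if [ "$2" -lt 0 ] && [ "$3" = "fail" ]; then
        # A negative weight is applied only while the check is failing.
        echo $(( $1 + $2 ))
    else
        echo "$1"
    fi
}

# vip_owner NAME_A PRIO_A NAME_B PRIO_B: prints the host with the higher priority.
vip_owner() {
    if [ "$2" -gt "$4" ]; then
        echo "$1"
    elif [ "$4" -gt "$2" ]; then
        echo "$3"
    else
        echo "tie"   # real VRRP breaks ties by comparing primary IP addresses
    fi
}

healthy=$(effective_priority 110 -20 ok)     # 110
failed=$(effective_priority 110 -20 fail)    # 90
vip_owner active "$healthy" standby 100      # prints active
vip_owner active "$failed"  standby 100      # prints standby
```

The detection delay is separate from this arithmetic: with interval 2 and fall 2, the check has to fail twice, so the priority drops about 4 s after HAProxy stops.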
8. Multicast vs unicast
The default VRRP transport is multicast on 224.0.0.18.
This works on any L2 segment that allows multicast (i.e. almost any physical LAN, and almost no cloud VPC).
If multicast is unavailable (most public cloud networks filter it), Keepalived supports unicast mode: each peer sends advertisements as plain unicast to the others’ IPs.
Same protocol, same election logic: just 224.0.0.18 replaced by direct peer IPs.
Slightly higher per-peer config (you have to list each peer explicitly), but works in routed environments where multicast doesn’t.
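As a rough sketch of what the extra per-peer config looks like, the helper below prints the lines you would add inside a vrrp_instance block for unicast mode. unicast_src_ip and unicast_peer are Keepalived keywords; the helper name unicast_stanza and the example addresses are assumptions made for illustration.

```shell
# unicast_stanza SRC_IP PEER_IP...
# Prints the unicast lines for one host: its own address first,
# then every other member of the VRRP group.
# (Hypothetical helper; paste its output inside the vrrp_instance block.)
unicast_stanza() {
    src="$1"
    shift
    printf '    unicast_src_ip %s\n' "$src"
    printf '    unicast_peer {\n'
    for peer in "$@"; do
        printf '        %s\n' "$peer"
    done
    printf '    }\n'
}

# Example for a two-node group: this host is 10.0.0.11, its peer is 10.0.0.12.
unicast_stanza 10.0.0.11 10.0.0.12
```

Each host gets the mirror image of its peer's stanza: its own address as the source, and everyone else in the peer list.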
9. Verification commands
The classics:
# State of each VRRP instance on this host (Keepalived 2.x exposes /tmp/keepalived.data on SIGUSR1)
sudo killall -SIGUSR1 keepalived
sudo cat /tmp/keepalived.data | grep -A2 "VRRP Instance"

# Or via journal: every state transition is logged
journalctl -u keepalived --no-pager -n 20 | grep -E "STATE|MASTER|BACKUP"

# Does the host have the VIP attached right now?
ip addr show | grep 10.0.0.10

# tcpdump the heartbeat
sudo tcpdump -i eth0 -n vrrp
Installation + Configuration: my ELK lab
Now let’s install and configure it.
Two HAProxy hosts: loglb01 (10.0.0.11, the default MASTER) and loglb02 (10.0.0.12, the BACKUP).
Both run Keepalived. They share a VIP 10.0.0.10/24.
Clients (Filebeat producers) target the VIP, and whichever HAProxy currently owns it serves them.
A vrrp_script checks that the local HAProxy process is alive: if it dies, Keepalived lowers the local priority so the peer wins the election.
The VRID is 51, and the authentication password is 8 hex characters (generated with openssl rand -hex 4).
1. Master config: loglb01
/etc/keepalived/keepalived.conf on loglb01:
Example: my keepalived.conf (loglb01, MASTER)
global_defs {
    enable_script_security
    script_user root
    router_id loglb01
}

# Health check: kill the local priority if HAProxy stops running
vrrp_script chk_haproxy {
    script "/usr/bin/pgrep -x haproxy"
    interval 2
    weight -20
    fall 2
    rise 2
}

vrrp_instance VI_LOGS {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 110
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass <8-char-secret>
    }
    virtual_ipaddress {
        10.0.0.10/24
    }
    track_script {
        chk_haproxy
    }
}
2. Backup config: loglb02
/etc/keepalived/keepalived.conf on loglb02.
Identical to loglb01 with three changes: state BACKUP, priority 100, and router_id loglb02.
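Since the two files differ in only those three lines, one option is to derive the backup config from the master config instead of editing it by hand. The sketch below assumes the exact strings used in this lab; to_backup_conf is a hypothetical helper name, and the file names in the usage comment are placeholders.

```shell
# to_backup_conf: reads the loglb01 config on stdin and writes the
# loglb02 variant on stdout by rewriting the three host-specific lines.
to_backup_conf() {
    sed -e 's/state MASTER/state BACKUP/' \
        -e 's/priority 110/priority 100/' \
        -e 's/router_id loglb01/router_id loglb02/'
}

# Usage (placeholder file names):
#   to_backup_conf < keepalived.conf.loglb01 > keepalived.conf.loglb02
#   diff keepalived.conf.loglb01 keepalived.conf.loglb02   # expect 3 changed lines
```

Diffing the two files afterwards is a cheap way to confirm that nothing else (VRID, auth_pass, VIP) drifted between the peers.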
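global_defs: enable_script_security + script_user root tell Keepalived to run the vrrp_script as root and to refuse root-run scripts whose path is writable by a non-root user; modern Keepalived complains about script security when they are missing. router_id is a string that identifies the host (it defaults to the hostname): useful to distinguish which host is logging what.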
vrrp_script chk_haproxy: checks the local process every 2 s. weight -20 says “subtract 20 from priority while the check fails”. With master = 110 and backup = 100, a HAProxy crash on the master lowers it to 90 → backup (100) wins.
state MASTER / BACKUP: just the initial role, doesn’t actually decide who wins.
virtual_router_id 51: chosen arbitrarily, must match on both peers, must not collide with any other VRRP group on this L2.
priority 110 / 100: the real election weights. The 10-point gap is deliberately smaller than the 20-point track-script penalty, so a failed check drops the master (90) below the backup (100).
auth_pass: same 8-char value on both peers. Generate it with openssl rand -hex 4 to get a clean ASCII secret.
virtual_ipaddress 10.0.0.10/24: the VIP. Note the /24 matches the LAN’s CIDR: important for ARP resolution to work correctly.
track_script { chk_haproxy }: ties the health-check to the priority adjustment.
3. Install and bring it up
# Both loglb01 and loglb02
sudo apt update
sudo apt install -y keepalived

# Drop the config above into /etc/keepalived/keepalived.conf
sudo nano /etc/keepalived/keepalived.conf
sudo chmod 600 /etc/keepalived/keepalived.conf

# Validate before applying
sudo keepalived -t -f /etc/keepalived/keepalived.conf
# Silent output = valid

# Enable + start
sudo systemctl enable --now keepalived
sleep 3
sudo systemctl status keepalived --no-pager | head -6
4. Verify election and VIP ownership
# On loglb01: should be MASTER, owns the VIP
journalctl -u keepalived -n 10 --no-pager | grep -E "STATE|MASTER|BACKUP"
ip addr show eth0 | grep 10.0.0.10

# On loglb02: should be BACKUP, no VIP attached
journalctl -u keepalived -n 10 --no-pager | grep -E "STATE|MASTER|BACKUP"
ip addr show eth0 | grep 10.0.0.10 && echo "✗ VIP attached on BACKUP (split-brain)" || echo "✓ no VIP on BACKUP (correct)"
Expected journal lines:
loglb01 Keepalived_vrrp: (VI_LOGS) Entering BACKUP STATE (init)
loglb01 Keepalived_vrrp: (VI_LOGS) Entering MASTER STATE
loglb02 Keepalived_vrrp: (VI_LOGS) Entering BACKUP STATE (init)
(Both nodes start in BACKUP; loglb01 then promotes itself after roughly 3 to 4 s because it hears no MASTER with a higher priority.)
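If you want a single answer instead of scanning journal lines by eye, a small filter can pull out the most recent state of an instance. This is a sketch that assumes the log format shown above ("(VI_LOGS) Entering ... STATE"); last_vrrp_state is a hypothetical helper name.

```shell
# last_vrrp_state INSTANCE
# Reads journal text on stdin and prints the last state the given
# VRRP instance entered (for example MASTER or BACKUP).
last_vrrp_state() {
    grep -F "($1) Entering " \
        | tail -n 1 \
        | sed -E 's/.*Entering ([A-Z]+) STATE.*/\1/'
}

# Usage on a live host:
#   journalctl -u keepalived --no-pager | last_vrrp_state VI_LOGS
```

It prints nothing when the instance has no matching log lines, which is itself a hint that Keepalived never started the instance or that the journal has rotated.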
5. Failover test
Stop HAProxy on the master and confirm the VIP migrates:
# On loglb01
sudo systemctl stop haproxy
date -u +%T

# Check ~5 s later on loglb02
ip addr show eth0 | grep 10.0.0.10   # should now show the VIP
journalctl -u keepalived -n 5 --no-pager | grep "MASTER STATE"
Expected: loglb02 enters MASTER STATE within ~5 s of the HAProxy stop, attaches 10.0.0.10/24 to eth0, sends gratuitous ARP.
Clients on the L2 segment see no interruption beyond the brief gap.
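To measure that gap rather than guess at it, you can poll for the VIP on the standby right after stopping HAProxy on the master. The sketch below is one possible way to do it; has_vip and wait_until are hypothetical helper names, and the one-second polling step limits the precision.

```shell
# has_vip VIP IFACE: succeeds if the address (without prefix length)
# is currently attached to the interface.
has_vip() {
    ip -o -4 addr show dev "$2" | grep -qwF "$1"
}

# wait_until TIMEOUT CMD [ARGS...]
# Runs CMD once per second until it succeeds, then prints the number of
# seconds waited. Returns 1 if CMD still fails after TIMEOUT seconds.
wait_until() {
    timeout="$1"
    shift
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$elapsed"
}

# Usage on loglb02, started just after `systemctl stop haproxy` on loglb01:
#   wait_until 15 has_vip 10.0.0.10 eth0 || echo "VIP did not migrate in 15 s"
```

With interval 2 and fall 2 in the health check, a result of around 4 to 6 seconds is what the configuration above should produce.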
Recover:
# On loglb01
sudo systemctl start haproxy
# Within ~5 s, loglb01 wins the election back (priority 110 vs 100)
# and the VIP migrates back. To avoid that extra glitch, add `nopreempt`
# to the vrrp_instance on loglb01 (the node that would otherwise preempt)
# and change its `state` to BACKUP, which Keepalived requires for nopreempt.
Where to go next
Filebeat: the log shipper that will start pushing Beats traffic into the VIP you just provisioned.