farnetiandrea.it/logs

What is the ELK Stack?

The ELK Stack is a widely used open-source ecosystem for centralised logging, developed by Elastic together with the broader community.

The acronym covers three components:

  • E: Elasticsearch: a distributed search and analytics engine and JSON document store, well suited to logs and other time-series events.
  • L: Logstash: an event-processing pipeline (input → filter → output).
  • K: Kibana: the web UI that queries Elasticsearch and renders dashboards.

In modern deployments a fourth piece is almost always added: Filebeat, the lightweight log shipper that lives on every source machine and pushes events into the pipeline. The combination is sometimes called the Elastic Stack to underline that Beats are first-class citizens, not an add-on.


Architecture

flowchart LR
    subgraph VPS["☁️ VPS (farnetiandrea.it)"]
        Filebeat["📦 Filebeat<br/>(native)"]
        ES["🗄️ Elasticsearch<br/>(Docker)"]
        Kibana["📊 Kibana<br/>(Docker)"]
        Nginx["🌐 nginx<br/>/logs/ → Kibana"]
    end

    subgraph LBpair["⚖️ HAProxy + Keepalived (HA pair)"]
        LB1["loglb01<br/>VIP active"]
        LB2["loglb02<br/>VIP standby"]
    end

    subgraph LSpool["⚙️ Logstash workers"]
        LS1["logstash01<br/>(Docker)"]
        LS2["logstash02<br/>(Docker)"]
    end

    Visitor((🌍 visitor)) -- "HTTPS /logs" --> Nginx
    Nginx --> Kibana
    Kibana --> ES

    Filebeat -- "ships to VIP:5044" --> LB1
    LB1 -. "VRRP standby" .- LB2
    LB1 -- "balanced" --> LS1
    LB1 -- "balanced" --> LS2
    LS1 -- "indexes" --> ES
    LS2 -- "indexes" --> ES

All traffic between the VPS and the 4 VMs runs on a private LAN: the VMs have no public ports exposed.

The stack I use

| Role | Tool | Where it runs | What it does |
|---|---|---|---|
| Log shipper | Filebeat | on the VPS (native) | reads /var/log/*, journald, and Docker logs; ships to the HAProxy VIP on :5044 |
| Load balancing | HAProxy + Keepalived | loglb01 + loglb02 (HA pair) | active/standby VIP, TCP-mode load balancing to the Logstash pool |
| Parsing pipeline | Logstash | logstash01 + logstash02 (Docker) | parses, enriches, and ships parsed events to Elasticsearch |
| Storage + search | Elasticsearch | on the VPS (Docker) | indexes events, runs queries, retains data with an ILM policy |
| Visualisation | Kibana | on the VPS (Docker) | UI for Discover / Visualize / Dashboard; exposed publicly via nginx on /logs/ |

Why this topology?

Because it mirrors a real, scalable, enterprise pattern for production.

The three layers each solve one specific problem:

  • HAProxy + Keepalived HA pair: gives the shippers one stable endpoint, spreads their connections across the Logstash nodes (which stay hidden from the shippers), lets you do rolling upgrades / restarts on Logstash without losing events, and isolates failures (a crashed Logstash worker doesn’t affect Filebeat). Running the load balancer as a pair keeps it from becoming a single point of failure itself.
  • Multiple Logstash workers: parsing is CPU-heavy. Two identical workers roughly double the throughput, and if one dies the load balancer simply stops sending new connections to it.
  • Single Elasticsearch node: this is the only simplification. It works fine for a lab or a modest log volume, but it is a single point of failure.

Note

This series walks the components in deploy order, which for a push-based pipeline like ELK runs opposite to the data flow: the consumer side has to exist before the producer has anywhere to push to!


Deployment

Here’s the whole deployment (installation + configuration of each component) from start to finish.

1. Elasticsearch setup

We start from the storage layer: without Elasticsearch, the events shipped by the rest of the pipeline have nowhere to land.

A single-node container on the VPS, persistent storage on a bind-mounted host directory, and daily logs-* indices cleaned up by ILM.

First, the host-level prerequisites:

Prerequisites

  • Docker CE installed via the official method:

    sudo apt update
    sudo apt install -y ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
        -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
        https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
        | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt update
    sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
  • A user with sudo and the ability to run docker compose.

  • VPS reachable on a private network at a known IP (in this guide: 10.0.0.5).

Then the kernel tweak that Elasticsearch requires and the data directory layout:

Kernel settings

IMPORTANT

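Elasticsearch uses memory-mapped files heavily: in production mode its bootstrap checks refuse to start the node if vm.max_map_count < 262144, and even where that check is bypassed, a low limit can cause out-of-memory failures under load.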

This must be set on the host, not in the container.

# Apply now
sudo sysctl -w vm.max_map_count=262144
 
# Make it persist across reboots
echo 'vm.max_map_count=262144' | sudo tee /etc/sysctl.d/99-elasticsearch.conf

Verify:

sysctl vm.max_map_count
# vm.max_map_count = 262144
Directory layout

sudo mkdir -p /opt/observability-logs/es-data
sudo chown -R 1000:1000 /opt/observability-logs/es-data
cd /opt/observability-logs

UID 1000 matches the elasticsearch user inside the official image: without this, the container can’t write to the bind-mounted data directory.

We generate the elastic superuser password and write the compose file:

Generate the elastic superuser password

WARNING

Use hex-only passwords. Special characters like ! and $ are interpreted by bash!

echo "ELASTIC=$(openssl rand -hex 24)"

Copy the hex value into your password manager now (What? You don’t have one? Get one immediately!), then write it into .env with a quoted heredoc (the quotes around 'EOF' stop bash from interpreting anything inside):

sudo tee /opt/observability-logs/.env > /dev/null <<'EOF'
ELASTIC_PASSWORD=<paste hex value here>
EOF
sudo chmod 600 /opt/observability-logs/.env
docker-compose.yml

Create /opt/observability-logs/docker-compose.yml:
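
The compose file itself isn’t reproduced on this page, so here is a minimal sketch consistent with the choices listed below. It is wrapped in a shell function that only prints the YAML, so you can review it before writing it to disk. Assumptions, not necessarily the author’s exact file: a hypothetical STACK_VERSION variable for the image tag (add it to .env yourself, pinned to the exact 8.x release you picked), discovery.type=single-node, the container name, the restart policy, and the healthcheck, which relies on curl being present in the image.

```shell
# Sketch only: prints a docker-compose.yml for the single-node Elasticsearch service.
# STACK_VERSION is a hypothetical .env variable: pin it to an exact 8.x version.
es_compose() {
  cat <<'EOF'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - bootstrap.memory_lock=true
      - ES_JAVA_OPTS=-Xms2g -Xmx2g
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - /opt/observability-logs/es-data:/usr/share/elasticsearch/data
    ports:
      - "10.0.0.5:9200:9200"
      - "127.0.0.1:9200:9200"
    healthcheck:
      # An unauthenticated request is rejected with "missing authentication credentials":
      # getting that answer means the HTTP layer is up (assumes curl exists in the image).
      test: ["CMD-SHELL", "curl -s http://localhost:9200 | grep -q 'missing authentication credentials'"]
      interval: 10s
      timeout: 5s
      retries: 30
    restart: unless-stopped
EOF
}
```

To write it: es_compose | sudo tee /opt/observability-logs/docker-compose.yml > /dev/null. Compose reads ELASTIC_PASSWORD and STACK_VERSION from the .env next to the file. The healthcheck matters later: a depends_on with condition: service_healthy only works if the service defines one.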

A few choices worth calling out:

  • bootstrap.memory_lock=true + ulimits.memlock: -1 pins the Elasticsearch JVM memory into RAM so it can’t be swapped out. Elasticsearch strongly discourages swapping (it causes latency spikes), and locking memory is the cleanest fix.
  • ES_JAVA_OPTS=-Xms2g -Xmx2g sets both min and max heap to 2 GB. Min == Max is best practice for the JVM, since the heap never has to resize at runtime. In practice a log workload needs 2 GB or more to avoid tripping the parent circuit breaker; keep the heap at no more than half of the VPS’s RAM.
  • xpack.security.http.ssl.enabled=false keeps the HTTP API on plain HTTP. The private LAN is trusted in this lab, and 9200 is never publicly exposed. In a production cluster (or anywhere outside a trusted network), enable HTTP TLS.
  • Two ports lines bind the same container port to two distinct host addresses: the private LAN IP and 127.0.0.1. This makes 9200 reachable by the Logstash workers (over the private LAN) and by Kibana / local curl (over localhost), while nothing on the public internet sees it.
Bring it up and verify the cluster is online (on a single node, yellow or green are both fine):

Start it and verify

cd /opt/observability-logs
sudo docker compose up -d elasticsearch

The first start pulls the ~1 GB image and warms up the cluster: give it ~60 seconds.

sudo docker compose logs -f elasticsearch
# ... wait for the node's "started" message

Now let’s verify the installation is working:

ELASTIC=$(sudo grep '^ELASTIC_PASSWORD=' /opt/observability-logs/.env | cut -d= -f2-)
 
# Cluster health: expect "yellow" or "green" on a single node (see the note below)
curl -s -u "elastic:$ELASTIC" http://localhost:9200/_cluster/health | python3 -m json.tool

Expected:

{
    "cluster_name": "docker-cluster",
    "status": "yellow",
    "number_of_nodes": 1,
    "number_of_data_nodes": 1,
    ...
}

INFO

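Yellow is fine here. It means every primary shard is allocated but some replicas can’t be (a single node can’t host replicas of its own shards: that would defeat the point). A fresh node may report green instead, because system indices only request as many replicas as there are nodes to hold them.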

Auth sanity check:

curl -s -o /dev/null -w "%{http_code}\n" -u "elastic:wrong" http://localhost:9200/
# 401
 
curl -s -o /dev/null -w "%{http_code}\n" -u "elastic:$ELASTIC" http://localhost:9200/
# 200
To close, attach an ILM policy + index template so old logs-* indices auto-delete after 14 days:

ILM policy + index template

Logstash will write to daily indices logs-YYYY.MM.dd.

Without lifecycle management those indices accumulate forever and eventually fill the disk or hit the cluster shard limit.

Set up an ILM policy that deletes indices older than 14 days:

ELASTIC=$(sudo grep '^ELASTIC_PASSWORD=' /opt/observability-logs/.env | cut -d= -f2-)
 
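# ILM policy: delete each index 14 days after it was created.
# No rollover action: Logstash already starts a new index every day by name,
# and rollover on an index that has no rollover alias (and isn't a data stream)
# leaves ILM stuck in an error step, so the delete phase would never run.
curl -sX PUT -u "elastic:$ELASTIC" \
  -H "Content-Type: application/json" \
  http://localhost:9200/_ilm/policy/logs_policy \
  -d '{
    "policy": {
      "phases": {
        "delete": {
          "min_age": "14d",
          "actions": { "delete": {} }
        }
      }
    }
  }' && echo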

And an index template that applies it to every logs-*:

# Index template — applies the policy to all logs-* indices
curl -sX PUT -u "elastic:$ELASTIC" \
  -H "Content-Type: application/json" \
  http://localhost:9200/_index_template/logs_template \
  -d '{
    "index_patterns": ["logs-*"],
    "template": {
      "settings": {
        "index.lifecycle.name": "logs_policy",
        "number_of_shards": 1,
        "number_of_replicas": 0
      }
    },
    "priority": 500
  }' && echo


2. Kibana setup

Now the UI layer.

Kibana runs as a Docker container next to Elasticsearch on the VPS, served publicly under /logs/ via nginx.

First the prereqs and the credentials Kibana needs (the built-in kibana_system service user + the saved-object encryption key):

Prerequisites

Elasticsearch from the previous step, up and healthy on the VPS.

1. Generate Kibana credentials

Two values are needed:

  • kibana_system password: Kibana authenticates to Elasticsearch with this built-in service user. It exists in Elasticsearch from day one, but can’t be used until you set its password.
  • Encryption key: Kibana encrypts certain saved objects (alerts, reports, connectors). If you don’t set a persistent key, Kibana picks a random one at every restart and previously-encrypted saved objects become unreadable.
echo "KIBANA_SYSTEM=$(openssl rand -hex 24)"
echo "ENCRYPTION_KEY=$(openssl rand -hex 32)"

Copy both into your password manager, then append them to /opt/observability-logs/.env:

sudo tee -a /opt/observability-logs/.env > /dev/null <<'EOF'
KIBANA_SYSTEM_PASSWORD=<paste hex value here>
KIBANA_ENCRYPTION_KEY=<paste hex value here>
EOF
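
Before moving on, it’s worth checking that both values landed in .env intact (a leftover <paste hex value here> placeholder is an easy mistake). A minimal sketch; the helper name check_env_hex is hypothetical. It checks that a key exists, is hex-only, and has the expected length: 48 characters for openssl rand -hex 24, 64 for openssl rand -hex 32 (Kibana requires an encryption key of at least 32 characters).

```shell
# Hypothetical helper: verify that KEY in an env file is hex-only and exactly LEN chars.
# Usage: check_env_hex FILE KEY LEN
check_env_hex() {
  local file=$1 key=$2 len=$3 val
  val=$(grep "^${key}=" "$file" | head -n 1 | cut -d= -f2-)
  case $val in
    '' | *[!0-9a-f]*)
      echo "$key: missing or not hex" >&2
      return 1 ;;
  esac
  if [ "${#val}" -ne "$len" ]; then
    echo "$key: ${#val} chars, expected $len" >&2
    return 1
  fi
  echo "$key: OK (${#val} hex chars)"
}

# Example in bash (.env is root-only, so read it through sudo via process substitution):
#   check_env_hex <(sudo cat /opt/observability-logs/.env) KIBANA_SYSTEM_PASSWORD 48
#   check_env_hex <(sudo cat /opt/observability-logs/.env) KIBANA_ENCRYPTION_KEY 64
```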

2. Set the kibana_system password in Elasticsearch

cd /opt/observability-logs
ELASTIC=$(sudo grep '^ELASTIC_PASSWORD=' .env | cut -d= -f2-)
KS=$(sudo grep '^KIBANA_SYSTEM_PASSWORD=' .env | cut -d= -f2-)
 
curl -sX POST -u "elastic:$ELASTIC" \
  -H "Content-Type: application/json" \
  http://localhost:9200/_security/user/kibana_system/_password \
  -d "{\"password\":\"$KS\"}"
echo
 
# Verify
curl -s -o /dev/null -w "kibana_system auth: HTTP %{http_code}\n" \
  -u "kibana_system:$KS" http://localhost:9200/_cluster/health
# kibana_system auth: HTTP 200
Then the compose service, and start:

3. Add the Kibana service in docker-compose

Append the Kibana service to /opt/observability-logs/docker-compose.yml, next to the existing Elasticsearch service:
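
The service definition isn’t reproduced on this page either. A minimal sketch matching the notes below, again as a function that only prints YAML. Assumptions: a hypothetical STACK_VERSION variable in .env for the image tag, the Kibana image’s convention of mapping upper-case environment variables to kibana.yml settings (e.g. SERVER_BASEPATH to server.basePath), and the container name and restart policy. The output is indented by two spaces, ready to sit under the existing services: key.

```shell
# Sketch only: prints the kibana service, indented to sit under the existing `services:` key.
# STACK_VERSION is a hypothetical .env variable; keep it equal to the Elasticsearch version.
kibana_service() {
  cat <<'EOF'
  kibana:
    image: docker.elastic.co/kibana/kibana:${STACK_VERSION}
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=${KIBANA_SYSTEM_PASSWORD}
      - XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY=${KIBANA_ENCRYPTION_KEY}
      - SERVER_BASEPATH=/logs
      - SERVER_REWRITEBASEPATH=true
      - SERVER_PUBLICBASEURL=https://farnetiandrea.it/logs
    volumes:
      - /opt/observability-logs/kibana-data:/usr/share/kibana/data
    ports:
      - "127.0.0.1:5601:5601"
    depends_on:
      elasticsearch:
        condition: service_healthy
    restart: unless-stopped
EOF
}
```

If services: is the last top-level key in the file, appending works: kibana_service | sudo tee -a /opt/observability-logs/docker-compose.yml > /dev/null. Then sudo docker compose config --quiet validates the result without printing it.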

A few notes on the choices:

  • SERVER_BASEPATH=/logs + SERVER_REWRITEBASEPATH=true: Kibana itself handles the /logs prefix on every URL it emits and accepts. The reverse proxy does not strip the prefix; it passes the path through unchanged.
  • SERVER_PUBLICBASEURL=https://farnetiandrea.it/logs: used when generating absolute URLs (sharing links, email reports, etc.). Must match the public URL exactly.
  • 127.0.0.1:5601 binding: Kibana is reachable only from the VPS itself. The internet talks to nginx, and nginx talks to localhost.
  • depends_on: elasticsearch: condition: service_healthy: Kibana doesn’t start cleanly if Elasticsearch isn’t ready. Waiting for service_healthy (not just service_started) avoids the restart loop; the condition only works if the elasticsearch service defines a healthcheck.

Create the data dir:

sudo mkdir -p /opt/observability-logs/kibana-data
sudo chown -R 1000:1000 /opt/observability-logs/kibana-data
4. Start Kibana

cd /opt/observability-logs
sudo docker compose up -d kibana
sudo docker compose logs -f kibana

First start takes ~90 seconds (plugin setup + migrations).

Wait for:

[INFO ][status] Kibana is now available

Verify locally:

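ELASTIC=$(sudo grep '^ELASTIC_PASSWORD=' /opt/observability-logs/.env | cut -d= -f2-)
curl -s -o /dev/null -w "HTTP %{http_code}\n" -u "elastic:$ELASTIC" http://localhost:5601/logs/api/status
# HTTP 200 (without credentials the status API answers 401, which still shows Kibana is up)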

Expose it publicly via nginx and verify the login screen loads:

5. Nginx reverse-proxy at /logs/

Add a location /logs/ block to your existing virtual host in Nginx.

My config file lives at /etc/nginx/sites-available/farnetiandrea.it:
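
The location block itself isn’t shown on this page. A minimal sketch of what it could look like: only the proxy_pass line follows directly from the text, while the proxy_set_header lines are a common reverse-proxy baseline (an assumption, not necessarily the author’s set). The function just prints the block, to be pasted inside the existing server { } of the virtual host.

```shell
# Sketch only: prints a `location /logs/` block for the existing nginx server { } block.
# The proxy_set_header lines are a generic baseline (assumption).
nginx_logs_location() {
  cat <<'EOF'
location /logs/ {
    proxy_pass http://127.0.0.1:5601;   # no trailing slash: URI passed through unchanged
    proxy_http_version 1.1;
    proxy_set_header Host              $host;
    proxy_set_header X-Real-IP         $remote_addr;
    proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}
EOF
}
```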

INFO

No trailing slash on proxy_pass (http://127.0.0.1:5601;, not http://127.0.0.1:5601/;).

With a trailing slash, nginx strips the matched prefix /logs/ before forwarding, but Kibana expects to see /logs/ because we set SERVER_REWRITEBASEPATH=true.

Without the trailing slash, nginx passes the full URI through and Kibana handles the prefix itself.
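
To make the trailing-slash rule concrete, here is a toy model (plain shell string handling, not nginx code) of how a prefix location /logs/ maps the request URI onto the upstream URI, following the two proxy_pass cases in the nginx documentation. forwarded_uri is a hypothetical name.

```shell
# Toy model of nginx's URI mapping for a prefix `location /logs/`:
#   proxy_pass without a URI part (http://127.0.0.1:5601):  URI forwarded unchanged
#   proxy_pass with a URI part    (http://127.0.0.1:5601/): matched /logs/ replaced by it
# $1 = URI part of proxy_pass ("" or "/"), $2 = request URI
forwarded_uri() {
  local proxy_uri=$1 request_uri=$2
  if [ -z "$proxy_uri" ]; then
    printf '%s\n' "$request_uri"
  else
    printf '%s%s\n' "$proxy_uri" "${request_uri#/logs/}"
  fi
}

forwarded_uri ""  /logs/app/discover   # /logs/app/discover (what Kibana expects here)
forwarded_uri "/" /logs/app/discover   # /app/discover (prefix lost)
```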

Reload nginx:

sudo nginx -t
sudo systemctl reload nginx
6. Verify public access

Open https://farnetiandrea.it/logs/ (or the equivalent on your domain) in a browser. You should see the Welcome to Elastic login form.

Log in with elastic + the password from /opt/observability-logs/.env.

After login you land on the Kibana home.

If you see a 502, check sudo docker compose ps (Kibana healthy?) and sudo docker compose logs kibana (any error?).

And finally create a Data View that points Discover at the logs-* indices Logstash will populate later:

7. Create a Data View

A Data View tells Kibana which Elasticsearch indices to expose to “Discover”, “Dashboard”, and the rest of the UI.

Without it, Discover stays empty even when data exists.

Create one as soon as you have an index pattern to point at (typically once a log producer is shipping events).

  1. Log in as elastic.
  2. Stack Management → Data Views → Create data view.
  3. Name: logs. Index pattern: logs-*. Timestamp field: @timestamp.
  4. Save data view to Kibana.

Of course, if you’re following this guide step by step, Discover will stay empty for now: nothing is writing to logs-* yet.

Optional: public anonymous viewer

If you want visitors to land directly on Discover without a login (the same flow as farnetiandrea.it/logs), layer the viewer-mode on top of the base Kibana deploy.

The full walkthrough lives on its own page.

Three moving parts:

  • A least-privilege Elasticsearch role
  • An anonymous Elasticsearch user
  • An anonymous auth provider in kibana.yml

It can be done now or after the rest of the series; the order doesn’t matter.


3. Logstash setup

Now we add the parsing layer.

Two identical Docker workers (logstash01 at 10.0.0.21, logstash02 at 10.0.0.22) on dedicated VMs on the private LAN, each writing to the central Elasticsearch with a least-privilege user (logstash_writer).

They’re peers, not primary/secondary: running two gives us horizontal capacity and fault isolation.

First the prereqs on each VM:

1. Prerequisites on each VM

  • Ubuntu 24.04 VM on the private network.
  • Docker CE installed (same recipe as in the Elasticsearch setup).
  • Network reachability from the VM to the VPS at 10.0.0.5:9200.
Then, from the VPS, create the dedicated ES user, so that a compromised worker can only touch the log indices (filebeat-*, logstash-*, logs-*) and nothing else:

2. Generate the logstash_writer password on the VPS

The workers don’t need ES superuser privileges.

We’ll create a dedicated logstash_writer user, with a role scoped to writing into our log indices and nothing else.

First, generate the password on the VPS, and append it to the central .env:

# On the VPS, /opt/observability-logs/
echo "LOGSTASH_WRITER=$(openssl rand -hex 24)"
 
sudo tee -a /opt/observability-logs/.env > /dev/null <<'EOF'
LOGSTASH_WRITER_PASSWORD=<paste hex value here>
EOF
3. Create the ES role and user (on the VPS)

cd /opt/observability-logs
ELASTIC=$(sudo grep '^ELASTIC_PASSWORD=' .env | cut -d= -f2-)
LSW=$(sudo grep '^LOGSTASH_WRITER_PASSWORD=' .env | cut -d= -f2-)
 
# Role: write to logs indices + minimal cluster operations needed for templates/ILM
curl -sX PUT -u "elastic:$ELASTIC" \
  -H "Content-Type: application/json" \
  http://localhost:9200/_security/role/logstash_writer \
  -d '{
    "cluster": ["monitor", "manage_index_templates", "manage_ilm"],
    "indices": [{
      "names": ["filebeat-*", "logstash-*", "logs-*"],
      "privileges": ["write", "create", "create_index", "manage",
                     "view_index_metadata", "auto_configure"]
    }]
  }' && echo
 
# User: the workers will authenticate with this
curl -sX PUT -u "elastic:$ELASTIC" \
  -H "Content-Type: application/json" \
  http://localhost:9200/_security/user/logstash_writer \
  -d "{
    \"password\": \"$LSW\",
    \"roles\": [\"logstash_writer\"],
    \"full_name\": \"Logstash worker writer\"
  }" && echo

Verify:

curl -s -o /dev/null -w "HTTP %{http_code}\n" \
  -u "logstash_writer:$LSW" http://localhost:9200/_cluster/health
# HTTP 200
On each worker VM, drop the .env, the YAML config, the pipeline file, and the compose:

4. Create a .env file on each worker

On each worker VM, create the working directories, then paste the logstash_writer password (taken from the VPS .env) into a local .env with a quoted heredoc, so bash doesn’t expand anything:

sudo mkdir -p /opt/observability-logs/config /opt/observability-logs/pipeline
sudo tee /opt/observability-logs/.env > /dev/null <<'EOF'
LOGSTASH_WRITER_PASSWORD=<paste hex value from VPS .env here>
EOF
sudo chmod 600 /opt/observability-logs/.env

Verify the length (must be 48 hex chars):

PW=$(sudo grep '^LOGSTASH_WRITER_PASSWORD=' /opt/observability-logs/.env | cut -d= -f2-)
echo "Length: ${#PW}"
# Length: 48

5. config/logstash.yml

# /opt/observability-logs/config/logstash.yml
api.http.host: "0.0.0.0"
pipeline.workers: 2
pipeline.batch.size: 125
pipeline.batch.delay: 50

api.http.host: 0.0.0.0 (the current name of the deprecated http.host setting) is required so the monitoring API on :9600 is reachable from outside the container (for the health checks and for future Prometheus scraping).

6. pipeline/main.conf

The Logstash pipeline itself.

Super simple: Beats in, Elasticsearch out.

# /opt/observability-logs/pipeline/main.conf
input {
  beats {
    port => 5044
    client_inactivity_timeout => 3600
  }
}
 
filter {
  # Per-event parsing / enrichment goes here. Keep empty for the first
  # end-to-end test; add grok / mutate / date filters once the pipeline
  # is verified working with raw events.
}
 
output {
  elasticsearch {
    hosts    => [ "http://10.0.0.5:9200" ]
    user     => "logstash_writer"
    password => "${LOGSTASH_WRITER_PASSWORD}"
    index    => "logs-%{+YYYY.MM.dd}"
  }
}

A couple of details:

  • ${LOGSTASH_WRITER_PASSWORD} is interpolated by Logstash at startup from its environment. The variable will be injected into the container by docker-compose (next section).
  • client_inactivity_timeout => 3600: the Beats input closes idle TCP connections after this many seconds. The default (60s) is too aggressive for long-lived Filebeat connections that may sit idle between batches: 1h is a safer upper bound.
  • Daily indices (logs-%{+YYYY.MM.dd}): easy to roll, easy to delete with ILM. One day per index means a mapping conflict is contained to a single day.
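
One practical consequence of %{+YYYY.MM.dd}: Logstash formats the event’s @timestamp in UTC, so the daily index flips at UTC midnight, not at local midnight. A quick sketch to predict which index an event lands in; index_for is a hypothetical helper and relies on GNU date (standard on Ubuntu).

```shell
# Hypothetical helper: which logs-YYYY.MM.dd index would an event with this timestamp hit?
# Mirrors Logstash's %{+YYYY.MM.dd}, which formats @timestamp in UTC. Requires GNU date.
index_for() {
  date -u -d "$1" +'logs-%Y.%m.%d'
}

index_for '2024-05-22 10:27:19 UTC'    # logs-2024.05.22
index_for '2024-05-22 23:30:00 -0200'  # logs-2024.05.23 (01:30 UTC the next day)
```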

7. docker-compose.yml
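
The compose file isn’t reproduced on this page. A minimal sketch for one worker, printed by a shell function so you can review it first. Assumptions: a hypothetical STACK_VERSION variable (add it to the worker’s .env, set to the same version as Elasticsearch), the 1 GB heap in LS_JAVA_OPTS, the container name, and the healthcheck, which assumes curl exists in the image. The pipeline/ bind mount is why the warning that follows matters: the image loads every file in that directory.

```shell
# Sketch only: prints /opt/observability-logs/docker-compose.yml for a Logstash worker.
# STACK_VERSION is a hypothetical .env variable; the 1 GB heap is an assumption.
logstash_compose() {
  cat <<'EOF'
services:
  logstash:
    image: docker.elastic.co/logstash/logstash:${STACK_VERSION}
    container_name: logstash
    environment:
      - LOGSTASH_WRITER_PASSWORD=${LOGSTASH_WRITER_PASSWORD}
      - LS_JAVA_OPTS=-Xms1g -Xmx1g
    volumes:
      - ./config/logstash.yml:/usr/share/logstash/config/logstash.yml:ro
      - ./pipeline:/usr/share/logstash/pipeline:ro
    ports:
      - "5044:5044"   # Beats input, reached through the HAProxy VIP
      - "9600:9600"   # monitoring API
    healthcheck:
      # Assumes curl is available inside the Logstash image.
      test: ["CMD-SHELL", "curl -sf http://localhost:9600/ > /dev/null"]
      interval: 15s
      timeout: 5s
      retries: 10
    restart: unless-stopped
EOF
}
```

To write it: logstash_compose | sudo tee /opt/observability-logs/docker-compose.yml > /dev/null.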

IMPORTANT

Keep only the pipeline file you intend to load in the pipeline/ directory.

Logstash concatenates every file in that directory into the same pipeline, so a backup copy (main.conf.bak) left there adds a second beats input on port 5044, and startup fails with Address already in use. Store backups outside this directory.

Bring it up and watch for the “pipeline started” line:

8. Start it

cd /opt/observability-logs
sudo docker compose up -d
 
# Wait ~60s for Logstash to come up, then tail the logs
sudo docker compose logs -f logstash

Look for:

[INFO ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://logstash_writer:xxxxxx@10.0.0.5:9200/"}
[INFO ][logstash.javapipeline ][main] Pipeline started {"pipeline.id"=>"main"}
[INFO ][logstash.agent          ] Successfully started Logstash API endpoint {:port=>9600, :ssl_enabled=>false}

If you see Got response code '401' contacting Elasticsearch, the password in .env doesn’t match the one in ES: re-check it from the VPS .env.

And repeat the exact same recipe on the second VM: same image, same .env, same pipeline:

9. Repeat on logstash02

Steps are identical on the second VM.

Same image, same .env (same password), same pipeline, same ports.



4. HAProxy + Keepalived setup

The Logstash workers are up, but Filebeat shouldn’t talk to them directly: we want a single, highly-available endpoint in front.

Enter the LB pair: two HAProxy instances on loglb01 and loglb02, sharing a Virtual IP managed by Keepalived.

HAProxy on both LBs

Drop the same config on both loglb01 and loglb02, install and bring up the service:

1. The configuration file

/etc/haproxy/haproxy.cfg, identical on both loglb01 and loglb02:

  • global block: standard daemon setup. Log forwarding to syslog via /dev/log (UNIX socket).
  • defaults, mode tcp: the entire instance defaults to TCP. The stats frontend later overrides this to mode http for the dashboard.
  • defaults, timeout client/server/tunnel 24h: the critical setting for long-lived Beats connections. With the package’s default timeouts (under a minute) you’d see Filebeat reconnect about every minute, polluting the logs.
  • frontend stats on :8404: the HTML/CSV stats UI. In TCP mode it would be opaque, so this single frontend overrides to mode http.
  • frontend beats_in on :5044: the actual Beats receiver. option tcplog ensures each session is logged with byte counts and a state code on close.
  • backend logstash_pool:
    • balance roundrobin distributes connections one-by-one across logstash01 and logstash02.
    • option tcp-check says “probe by opening a TCP socket”: no Beats handshake, just a connect.
    • default-server inter 5s fall 3 rise 2 applies to both server lines: probe cadence is 5 s, mark DOWN after three failures (15 s), mark UP after two successes (10 s).
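
The file itself isn’t reproduced on this page; below is a sketch assembled from the bullets above. The backend addresses come from the Logstash section; the global block mirrors Ubuntu’s packaged defaults, and timeout connect 5s is an assumption. Note bind *:5044: listening on all addresses means HAProxy accepts connections on the VIP whenever Keepalived attaches it, with no extra sysctl needed.

```shell
# Sketch only: prints an /etc/haproxy/haproxy.cfg matching the bullets above.
# The global block and `timeout connect 5s` are assumptions.
haproxy_cfg() {
  cat <<'EOF'
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    tcp
    timeout connect 5s
    timeout client  24h
    timeout server  24h
    timeout tunnel  24h

frontend stats
    mode http
    bind *:8404
    stats enable
    stats uri /
    stats refresh 10s

frontend beats_in
    bind *:5044
    option tcplog
    default_backend logstash_pool

backend logstash_pool
    balance roundrobin
    option tcp-check
    default-server inter 5s fall 3 rise 2
    server logstash01 10.0.0.21:5044 check
    server logstash02 10.0.0.22:5044 check
EOF
}
```

To write it: haproxy_cfg | sudo tee /etc/haproxy/haproxy.cfg > /dev/null, then validate it with haproxy -c before enabling the service.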

2. Install and bring it up

# On both loglb01 and loglb02
sudo apt update
sudo apt install -y haproxy
 
# Drop the config above into /etc/haproxy/haproxy.cfg
sudo nano /etc/haproxy/haproxy.cfg
 
# Validate before applying — never skip this
sudo haproxy -c -f /etc/haproxy/haproxy.cfg
# Output: "Configuration file is valid"
 
# Enable + start
sudo systemctl enable --now haproxy
sudo systemctl status haproxy --no-pager | head -5
Then verify the pool from any host on the private LAN (both backends should show UP):

3. Verify the pool

From any host on the private network:

curl -s "http://10.0.0.11:8404/;csv" | awk -F',' '
  NR==1 { print "frontend/backend  svname        scur  bin       bout      status"; next }
  $1=="beats_in" || $1=="logstash_pool" {
    printf "%-17s %-13s %-5s %-9s %-9s %s\n", $1, $2, $5, $9, $10, $18
  }'

Expected output once Logstash workers are up:

frontend/backend  svname        scur  bin       bout      status
beats_in          FRONTEND      0     0         0         OPEN
logstash_pool     logstash01    0     0         0         UP
logstash_pool     logstash02    0     0         0         UP
logstash_pool     BACKEND       0     0         0         UP

As Filebeat clients connect, scur (current sessions) and bin/bout (byte counters) start moving. If a Logstash worker dies, its row flips to DOWN within ~15 s and scur redistributes to the survivors.
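
For a scripted version of that check (cron, a monitoring probe), here is a small sketch that reads the same CSV and fails when any logstash_pool server isn’t UP. pool_down is a hypothetical name; it relies on the CSV columns used by the awk above (1 = proxy name, 2 = server name, 18 = status).

```shell
# Hypothetical check: print every logstash_pool server that is not UP; exit 1 if any.
# Reads HAProxy's CSV stats on stdin (field 1 = pxname, 2 = svname, 18 = status).
pool_down() {
  awk -F',' '
    BEGIN { bad = 0 }
    $1 == "logstash_pool" && $2 != "BACKEND" && $18 !~ /^UP/ { print $2 " " $18; bad = 1 }
    END { exit bad }'
}

# Example: curl -s "http://10.0.0.11:8404/;csv" | pool_down && echo "pool healthy"
```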

Keepalived for the VIP

Now we give the pair a shared 10.0.0.10 VIP.

The two configs are nearly identical, with state MASTER + priority 110 on loglb01 and state BACKUP + priority 100 on loglb02:

1. Master config: loglb01

/etc/keepalived/keepalived.conf on loglb01:
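
The configuration itself isn’t reproduced on this page. Since the two files differ only in state, priority, and router_id, here is a sketch of a generator for both, assembled from the commentary below. Assumptions: interface eth0 (as in the verification commands), instance name VI_LOGS (as in the journal lines), advert_int 1, /usr/bin/pidof haproxy as the process check, and fall/rise values of 2. keepalived_conf is a hypothetical helper.

```shell
# Sketch only: prints keepalived.conf for either peer.
# Usage: keepalived_conf STATE PRIORITY ROUTER_ID AUTH_PASS
keepalived_conf() {
  local state=$1 priority=$2 router_id=$3 auth_pass=$4
  cat <<EOF
global_defs {
    router_id ${router_id}
    enable_script_security
    script_user root
}

vrrp_script chk_haproxy {
    script "/usr/bin/pidof haproxy"
    interval 2
    weight -20
    fall 2
    rise 2
}

vrrp_instance VI_LOGS {
    state ${state}
    interface eth0
    virtual_router_id 51
    priority ${priority}
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass ${auth_pass}
    }
    virtual_ipaddress {
        10.0.0.10/24
    }
    track_script {
        chk_haproxy
    }
}
EOF
}

# AUTH=$(openssl rand -hex 4)   # generate once, reuse on both peers
# loglb01: keepalived_conf MASTER 110 loglb01 "$AUTH" | sudo tee /etc/keepalived/keepalived.conf > /dev/null
# loglb02: keepalived_conf BACKUP 100 loglb02 "$AUTH" | sudo tee /etc/keepalived/keepalived.conf > /dev/null
```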

2. Backup config: loglb02

/etc/keepalived/keepalived.conf on loglb02.

Identical to loglb01 with three changes: state BACKUP, priority 100, and router_id loglb02.

Line-by-line commentary:

  • global_defs: enable_script_security + script_user root are required for vrrp_script to execute under modern Keepalived. router_id is a string used only for logging: useful to distinguish which host is logging what.
  • vrrp_script chk_haproxy: checks the local process every 2 s. weight -20 says “subtract 20 from priority while the check fails”. With master = 110 and backup = 100, a HAProxy crash on the master lowers it to 90 → backup (100) wins.
  • state MASTER / BACKUP: just the initial role, doesn’t actually decide who wins.
  • virtual_router_id 51: chosen arbitrarily; it must match on both peers and must not collide with any other VRRP group on this L2 segment.
  • priority 110 / 100: the real election weights. The 10-point gap is smaller than the 20-point track-script penalty, so a failed check on the master is enough to hand the VIP to the backup.
  • auth_pass: same 8-char value on both peers. Set with openssl rand -hex 4 to get a clean ASCII secret.
  • virtual_ipaddress 10.0.0.10/24: the VIP. Note the /24 matches the LAN’s CIDR: important for ARP resolution to work correctly.
  • track_script { chk_haproxy }: ties the health-check to the priority adjustment.
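
The election arithmetic above can be checked with a toy model (plain shell, not keepalived code), assuming default preemption and the 110 / 100 / -20 values. It also surfaces a less obvious case: if HAProxy dies on both nodes, loglb01 (90) still beats loglb02 (80), so the VIP stays on loglb01 and no failover happens. The pair protects against one load balancer failing, not both. effective_prio and vrrp_winner are hypothetical names.

```shell
# Toy model of the VRRP election described above.
# Assumes default preemption: the highest effective priority always holds the VIP.

# Effective priority: base priority, minus 20 while chk_haproxy is failing.
effective_prio() { # $1 = base priority, $2 = 1 if local HAProxy is up, 0 if down
  if [ "$2" -eq 1 ]; then echo "$1"; else echo $(( $1 - 20 )); fi
}

# Prints which node owns 10.0.0.10 for a given HAProxy state on each node.
vrrp_winner() { # $1 = HAProxy up on loglb01 (1/0), $2 = HAProxy up on loglb02 (1/0)
  local p1 p2
  p1=$(effective_prio 110 "$1")
  p2=$(effective_prio 100 "$2")
  # No ties are possible with 110/100/-20, so a strict comparison is enough.
  if [ "$p1" -gt "$p2" ]; then echo "loglb01 ($p1 vs $p2)"; else echo "loglb02 ($p2 vs $p1)"; fi
}

vrrp_winner 1 1   # loglb01 (110 vs 100)
vrrp_winner 0 1   # loglb02 (100 vs 90)
vrrp_winner 0 0   # loglb01 (90 vs 80): both down, VIP stays put
```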

Install + bring it up on both nodes:

3. Install and bring it up

# Both loglb01 and loglb02
sudo apt update
sudo apt install -y keepalived
 
# Drop the config above into /etc/keepalived/keepalived.conf
sudo nano /etc/keepalived/keepalived.conf
sudo chmod 600 /etc/keepalived/keepalived.conf
 
# Validate before applying
sudo keepalived -t -f /etc/keepalived/keepalived.conf
# Silent output = valid
 
# Enable + start
sudo systemctl enable --now keepalived
sleep 3
sudo systemctl status keepalived --no-pager | head -6
Verify the election (loglb01 wins and owns the VIP), then test failover by stopping HAProxy on the master; the VIP should migrate to loglb02 within ~5 seconds:

4. Verify election and VIP ownership

# On loglb01 — should be MASTER, owns the VIP
journalctl -u keepalived -n 10 --no-pager | grep -E "STATE|MASTER|BACKUP"
ip addr show eth0 | grep 10.0.0.10
 
# On loglb02 — should be BACKUP, no VIP attached
journalctl -u keepalived -n 10 --no-pager | grep -E "STATE|MASTER|BACKUP"
ip addr show eth0 | grep 10.0.0.10 && echo "✗ VIP attached on BACKUP (split-brain)" || echo "✓ no VIP on BACKUP (correct)"

Expected journal lines:

loglb01 Keepalived_vrrp: (VI_LOGS) Entering BACKUP STATE (init)
loglb01 Keepalived_vrrp: (VI_LOGS) Entering MASTER STATE
loglb02 Keepalived_vrrp: (VI_LOGS) Entering BACKUP STATE (init)

(Both nodes start in BACKUP, then loglb01 promotes itself within 3 s because it sees no MASTER with higher priority)

5. Failover test

Stop HAProxy on the master and confirm the VIP migrates:

# On loglb01
sudo systemctl stop haproxy
date -u +%T
 
# Check ~5 s later on loglb02
ip addr show eth0 | grep 10.0.0.10   # should now show the VIP
journalctl -u keepalived -n 5 --no-pager | grep "MASTER STATE"

Expected: loglb02 enters MASTER STATE within ~5 s of the HAProxy stop, attaches 10.0.0.10/24 to eth0, sends gratuitous ARP.

Clients on the L2 segment see no interruption beyond the brief gap.

Recover:

# On loglb01
sudo systemctl start haproxy
 
# Within ~5 s, loglb01 wins the election back (priority 110 vs 100)
# and the VIP migrates back. If you'd rather loglb02 keep the VIP,
# add `nopreempt` to loglb01's config (keepalived only honours it with
# state BACKUP, so change that too): it saves one extra glitch.


5. Filebeat setup

Last step!

Now that the entire consumer side is up and reachable through the VIP, we install Filebeat on the source host (in this lab: the VPS).

The agent runs natively via apt, with no container: running on the host gives Filebeat direct access to the log files, journald, and the Docker log directory without extra bind mounts.

First the prereqs and install from the official Elastic 8.x APT repo:

Prerequisites

Filebeat pushes logs somewhere, so the consumer side has to exist first.

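If you’ve been following the series in the correct order, everything Filebeat depends on is already deployed: Elasticsearch on the VPS, the two Logstash workers, and the HAProxy + Keepalived pair serving the VIP at 10.0.0.10:5044.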

1. Install Filebeat from the official repo

# Elastic GPG key
sudo install -m 0755 -d /etc/apt/keyrings
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch \
  | sudo gpg --dearmor -o /etc/apt/keyrings/elastic.gpg
sudo chmod a+r /etc/apt/keyrings/elastic.gpg
 
# Elastic 8.x APT repo
echo "deb [signed-by=/etc/apt/keyrings/elastic.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" \
  | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
 
sudo apt update
sudo apt install -y filebeat
 
# Verify
filebeat version
# filebeat version 8.x.x (amd64), libbeat 8.x.x [...]

IMPORTANT

Filebeat and Elasticsearch don’t have to match versions exactly.

Within the same major version, Elastic supports an older Filebeat against a newer Elasticsearch… BUT not the reverse (a Filebeat newer than Elasticsearch). Since the 8.x APT repo always installs the latest 8.x Filebeat, keep Elasticsearch on the same or a later version, or pin the Filebeat package.

Then the config, pointing output.logstash at the HAProxy VIP at 10.0.0.10:5044:

2. /etc/filebeat/filebeat.yml

The shipped default config is heavy with disabled modules.

Replace it cleanly:
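
The replacement file isn’t reproduced on this page. A sketch consistent with the notes that follow, printed by a shell function. Assumptions: the syslog file paths, the input ids, and the use of the filestream input (the recommended file input in Filebeat 8.x) rather than the older log input.

```shell
# Sketch only: prints an /etc/filebeat/filebeat.yml consistent with the notes below.
# Assumptions: syslog paths, input ids, filestream for plain files.
filebeat_yml() {
  cat <<'EOF'
filebeat.inputs:
  - type: filestream
    id: syslog
    paths:
      - /var/log/syslog
      - /var/log/auth.log
    fields:
      log_source: syslog
    fields_under_root: true

  - type: journald
    id: journald
    fields:
      log_source: journald
    fields_under_root: true

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true

processors:
  - add_host_metadata: ~
  - add_docker_metadata: ~

output.logstash:
  hosts: ["10.0.0.10:5044"]

setup.template.enabled: false
setup.ilm.enabled: false
setup.dashboards.enabled: false
EOF
}
```

filebeat_yml | sudo tee /etc/filebeat/filebeat.yml > /dev/null rewrites the file in place, keeping its root ownership and permissions: Filebeat refuses to load a config file that other users can write to.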

A few notes on each section:

  • fields_under_root: true promotes the custom log_source to a top-level field instead of nesting it under fields.log_source. Makes Kibana queries cleaner (log_source:syslog vs fields.log_source:syslog).
  • Autodiscover with docker provider + hints.enabled: true auto-attaches to every running container and reads its stdout/stderr from /var/lib/docker/containers/<id>/*.log.
  • add_host_metadata enriches every event with the hostname, OS, architecture. Crucial later when you scale beyond one shipper.
  • add_docker_metadata does the same for Docker events: container name, image, labels.

WARNING

Always disable setup.template, setup.ilm, and setup.dashboards when shipping to Logstash. Otherwise Filebeat also tries to reach Elasticsearch directly, fails because it has no ES credentials, and prints a stream of warnings at every startup.

They’re harmless, but they confuse a real diagnosis.

Sanity-check the config, then enable and start the service:

3. Sanity-check the config

# YAML / structure validation
sudo filebeat test config
 
# Connectivity to the configured output (TCP reach + protocol handshake)
sudo filebeat test output

Expected output for the second one:

logstash: 10.0.0.10:5044...
  connection...
    parse host... OK
    dns lookup... OK
    addresses: 10.0.0.10
    dial up... OK
  TLS... WARN secure connection disabled
  talk to server... OK

(TLS warning is expected: we’re on a private network, and Logstash’s Beats input is not TLS-enabled in this lab)

If the dial-up step fails, the path between Filebeat and the VIP is broken.

Quick reachability checks:

ping -c 2 10.0.0.10
nc -zv 10.0.0.10 5044
4. Enable and start

sudo systemctl enable filebeat
sudo systemctl start filebeat
sudo systemctl status filebeat --no-pager

Then watch the agent’s own log:

sudo journalctl -u filebeat -n 30 --no-pager

What you want to see:

Connecting to backoff(async(tcp://10.0.0.10:5044))
Connection to backoff(...) established
And finally verify end-to-end from Elasticsearch.

The moment we see logs-* indices growing, the entire pipeline is alive:

5. Verify end-to-end from Elasticsearch

The pipeline is Filebeat → HAProxy → Logstash → Elasticsearch.

We can confirm the whole chain from the Elasticsearch end, without touching Kibana:

ELASTIC=$(sudo grep '^ELASTIC_PASSWORD=' /opt/observability-logs/.env | cut -d= -f2-)
 
# Index showing up?
curl -s -u "elastic:$ELASTIC" "http://localhost:9200/_cat/indices?v" | grep "logs-"
 
# Event count
curl -s -u "elastic:$ELASTIC" "http://localhost:9200/logs-*/_count" | python3 -m json.tool
 
# Peek at a recent document to see fields
curl -s -u "elastic:$ELASTIC" "http://localhost:9200/logs-*/_search?size=1&sort=@timestamp:desc" \
  | python3 -m json.tool | head -40

What to look for in a sample document:

  • agent.type: "filebeat": confirms the producer
  • agent.version: "8.x.x": Filebeat version
  • host.hostname: "<your-host>": host metadata processor
  • log_source: "syslog" (or "journald"): the custom field we added per input
  • event.original: "May 22 10:27:19 ... systemd[1]: ...": the raw log line

If the count is positive and grows each time you re-run it, Filebeat → Logstash → ES is working end to end!
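
Re-running the count by hand works; a small sketch automates it. es_count and check_growth are hypothetical helpers: es_count pulls the count field out of a compact (non-pretty) _count response, with no python needed, and check_growth samples twice a few seconds apart. It expects ELASTIC to hold the elastic password, as set by the first command above.

```shell
# Hypothetical helper: extract "count" from a compact _count response on stdin.
es_count() {
  grep -o '"count":[0-9]*' | head -n 1 | cut -d: -f2
}

# Samples logs-*/_count twice, INTERVAL seconds apart (default 30), and reports growth.
check_growth() {
  local url="http://localhost:9200/logs-*/_count" interval=${1:-30} before after
  before=$(curl -s -u "elastic:$ELASTIC" "$url" | es_count)
  sleep "$interval"
  after=$(curl -s -u "elastic:$ELASTIC" "$url" | es_count)
  if [ -n "$before" ] && [ -n "$after" ] && [ "$after" -gt "$before" ]; then
    echo "OK: $before -> $after events"
  else
    echo "NOT GROWING: ${before:-?} -> ${after:-?}" >&2
    return 1
  fi
}
```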

INFO

To see this data in Kibana Discover, you need a Data View pointed at logs-*.

Create it from Stack Management as described in the Kibana setup.

And that’s it!

We have a complete, scalable ELK stack mirroring the shape used in real production environments: five hosts, six components, end-to-end log pipeline from journalctl to Discover.

Congratulations!

To keep the stack healthy long-term, I leave you with the closing thoughts:

Final considerations

If you’ve followed the series from the start, you now have an end-to-end log pipeline running on five hosts:

  • Elasticsearch indexing on the VPS
  • Kibana serving the dashboard (optionally with anonymous read-only viewing)
  • Two Logstash workers parsing in parallel
  • Two HAProxy + Keepalived nodes fronting the workers, with a single highly-available VIP
  • Filebeat pushing syslog / journald / Docker logs into that VIP.

Smaller than what you’d run in production, but nearly identical in shape.

General pipeline monitoring

A log pipeline that doesn’t tell you when it is itself broken is only half-built.

The five-minute health round:

| Component | Quick check | Looking for |
|---|---|---|
| Elasticsearch | curl :9200/_cluster/health | status: green or yellow (single-node lab) / green (cluster) |
| Kibana | curl -u elastic:$ELASTIC :5601/logs/api/status | HTTP 200, overall.level: available |
| Logstash worker | curl :9600 | status: green, events.in rising |
| HAProxy | curl :8404/;csv | both Logstash backends UP, scur > 0 |
| Keepalived | ip addr show \| grep 10.0.0.10 | VIP on MASTER, absent on BACKUP |
| Filebeat | journalctl -u filebeat -n 20 | Connection ... established, no dial errors |

From a lab to real production

For real production scenarios, consider:

What changes when you have 2000+ machines

  • Backing up the ES data periodically: the ILM policy from the Elasticsearch setup only deletes old indices; it doesn’t back anything up, and if the VPS disk dies, the logs go with it. The native answer is a snapshot repository (Elasticsearch’s Snapshot and Restore), pointed at S3 or a separate volume, with a snapshot lifecycle management (SLM) policy taking snapshots on a schedule.
  • More Logstash workers behind the same LB pair: HAProxy’s balance roundrobin scales horizontally for free until you saturate the LB itself.
  • Multiple LB pairs geographically distributed, often with DNS round-robin in front, when one VIP can’t handle the throughput anymore.
  • A Kafka cluster between Beats and Logstash as a buffer: absorbs traffic spikes that a load balancer alone can’t smooth out, and decouples producers from consumers (Logstash can be down for maintenance and no events are lost).
  • An Elasticsearch cluster with separate node types: 3+ master, 5-20+ data, 2-4 ingest, 2-4 coordinator. This is where the real bottleneck lives (indexing throughput, shard count, JVM heap pressure).
  • Multi-tenant Kibana spaces so different teams can have their own dashboards, saved searches, and role-based access on the same ES backend.
What to do next

Great, you’ve successfully implemented a working log system… but do you have a metrics system as well?

If the answer is no, well, you’ve got a new project to work on!