My ELK Stack: how to centralise logs from your VMs!

What is the ELK Stack?

The ELK Stack is the open-source ecosystem for centralised logging, maintained by Elastic and the broader community.

The acronym covers three components:

E: Elasticsearch: a distributed search engine + document database optimised for logs and time-series text events.
L: Logstash: an event-processing pipeline (input → filter → output).
K: Kibana: the web UI that queries Elasticsearch and renders dashboards.

In modern deployments a fourth piece is almost always added: Filebeat, the lightweight log shipper that lives on every source machine and pushes events into the pipeline. The combination is sometimes called the Elastic Stack to underline that Beats are first-class citizens, not an add-on.

Architecture

flowchart LR
    subgraph VPS["☁️ VPS (farnetiandrea.it)"]
        Filebeat["📦 Filebeat<br/>(native)"]
        ES["🗄️ Elasticsearch<br/>(Docker)"]
        Kibana["📊 Kibana<br/>(Docker)"]
        Nginx["🌐 nginx<br/>/logs/ → Kibana"]
    end

    subgraph LBpair["⚖️ HAProxy + Keepalived (HA pair)"]
        LB1["loglb01<br/>VIP active"]
        LB2["loglb02<br/>VIP standby"]
    end

    subgraph LSpool["⚙️ Logstash workers"]
        LS1["logstash01<br/>(Docker)"]
        LS2["logstash02<br/>(Docker)"]
    end

    Visitor((🌍 visitor)) -- "HTTPS /logs" --> Nginx
    Nginx --> Kibana
    Kibana --> ES

    Filebeat -- "ships to VIP:5044" --> LB1
    LB1 -. "VRRP standby" .- LB2
    LB1 -- "balanced" --> LS1
    LB1 -- "balanced" --> LS2
    LS1 -- "indexes" --> ES
    LS2 -- "indexes" --> ES

All traffic between the VPS and the 4 VMs runs on a private LAN: the VMs have no public ports exposed.

The stack I use

Role	Tool	Where it runs	What it does
Log shipper	Filebeat	on the VPS (native)	reads `/var/log/*`, `journalctl`, Docker logs; ships to the HAProxy VIP on `:5044`
Load-balancing	HAProxy + Keepalived	loglb01 + loglb02 (HA pair)	active/standby VIP, TCP-mode load balance to the Logstash pool
Parsing pipeline	Logstash	logstash01 + logstash02 (Docker)	parses, enriches, and ships parsed events to Elasticsearch
Storage + search	Elasticsearch	on the VPS (Docker)	indexes events, runs queries, retains data with ILM policy
Visualisation	Kibana	on the VPS (Docker)	UI for Discover / Visualize / Dashboard; exposed publicly via nginx on `/logs/`

Why this topology?

Because it mirrors a real, scalable, enterprise pattern for production.

The three layers each solve one specific problem:

HAProxy + Keepalived HA pair: smooths traffic spikes, hides individual Logstash nodes from the shippers, lets you do rolling upgrades / restarts on Logstash without losing events, and isolates failures (a crashed LS doesn’t affect Filebeat).
Multiple Logstash workers: parsing is CPU-heavy. Two identical workers roughly double the throughput, and if one dies the load balancer just stops sending events to it.
Single Elasticsearch: this is the one deliberate simplification. A single node has no replicas and no failover for storage, which is fine for a lab and for many small deployments.

Note

This series walks the components in deploy order, which for a push-based pipeline like ELK runs opposite to the data flow: the consumer side has to exist before the producer has anywhere to push to!

Deployment

Here’s the whole deployment (installation + configuration of each component) from start to finish.

1. Elasticsearch setup

We start from the storage layer: without Elasticsearch nothing downstream has a place to land.

A single-node container on the VPS, persistent storage on a bind-mounted volume, daily logs-* indices rotated by ILM.

First, the host-level prerequisites:

Prerequisites

Docker CE installed via the official method:

sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
    -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
    https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
    | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

A user with sudo and the ability to run docker compose.
VPS reachable on a private network at a known IP (in this guide: 10.0.0.5).


Then the kernel tweak that Elasticsearch requires and the data directory layout:

Kernel settings

IMPORTANT

Elasticsearch uses memory-mapped files heavily and refuses to start if vm.max_map_count < 262144.

This must be set on the host, not in the container.
# Apply now
sudo sysctl -w vm.max_map_count=262144
 
# Make it persist across reboots
echo 'vm.max_map_count=262144' | sudo tee /etc/sysctl.d/99-elasticsearch.conf
Verify:
sysctl vm.max_map_count
# vm.max_map_count = 262144

Directory layout
sudo mkdir -p /opt/observability-logs/es-data
sudo chown -R 1000:1000 /opt/observability-logs/es-data
cd /opt/observability-logs
UID 1000 matches the elasticsearch user inside the official image: without this, the container can’t write to the bind-mounted data directory.

We generate the elastic superuser password and write the compose file:

Generate the elastic superuser password

WARNING

Use hex-only passwords. Special characters like ! and $ are interpreted by bash!
echo "ELASTIC=$(openssl rand -hex 24)"
Copy the hex value into your password manager now (What? You don’t have one? Check this out immediately!), then write it into .env with a heredoc (the quotes around 'EOF' stop bash from expanding anything inside):
sudo tee /opt/observability-logs/.env > /dev/null <<'EOF'
ELASTIC_PASSWORD=<paste hex value here>
EOF
sudo chmod 600 /opt/observability-logs/.env
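To see why the quoted heredoc and the hex-only rule matter, here is a small sketch that runs in any POSIX shell. The sample value pa$$word and the helper is_hex are illustrative assumptions, not part of the deployment:

```shell
# Unquoted delimiter: the shell expands $$ (its own PID) inside the heredoc.
unquoted=$(cat <<EOF
ELASTIC_PASSWORD=pa$$word
EOF
)

# Quoted delimiter: the body is written out byte for byte.
quoted=$(cat <<'EOF'
ELASTIC_PASSWORD=pa$$word
EOF
)

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"

# is_hex VALUE: succeed only for a non-empty string of lowercase hex digits,
# which is the shape produced by openssl rand -hex.
is_hex() {
  case $1 in
    ''|*[!0-9a-f]*) return 1 ;;
    *)              return 0 ;;
  esac
}

is_hex 0a1b2c && echo "0a1b2c is safe to paste anywhere"
is_hex 'pa$$word' || echo 'pa$$word needs careful quoting'
```

A hex-only value survives even an accidental unquoted context, which is why the guide sticks to openssl rand -hex.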

docker-compose.yml

Create /opt/observability-logs/docker-compose.yml:
Example: my docker-compose (Elasticsearch)
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.15.0
    container_name: elasticsearch
    user: "1000:1000"
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms2g -Xmx2g
      - bootstrap.memory_lock=true
      # Security ON, but TLS OFF on the HTTP layer.
      # Lab simplification: traffic stays on the private LAN.
      # In production this is non-negotiable — enable HTTPS.
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=false
      - xpack.security.enrollment.enabled=false
      # Pre-set the password of the built-in `elastic` superuser so we don't
      # have to fish it out of the first-boot logs.
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
    ulimits:
      memlock: { soft: -1, hard: -1 }
      nofile: { soft: 65536, hard: 65536 }
    volumes:
      - /opt/observability-logs/es-data:/usr/share/elasticsearch/data
    ports:
      # Private LAN IP — for Logstash workers on dedicated VMs
      - "10.0.0.5:9200:9200"
      # Localhost: for quick curl checks on the host (Kibana reaches ES over elk-net)
      - "127.0.0.1:9200:9200"
    networks:
      - elk-net
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "curl -fsS -u elastic:${ELASTIC_PASSWORD} http://localhost:9200/_cluster/health || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 5
 
networks:
  elk-net:
    driver: bridge
A few choices worth calling out:

bootstrap.memory_lock=true + ulimits.memlock: -1 pins the Elasticsearch JVM memory into RAM so it can’t be swapped out. Elastic strongly discourages swapping (latency spikes); locking memory is the cleanest fix.

ES_JAVA_OPTS=-Xms2g -Xmx2g sets both min and max heap to 2 GB. Setting min equal to max is the recommended practice, because the JVM never has to resize the heap at runtime. A real log workload typically needs 2 GB or more to stay clear of the parent circuit breaker.

xpack.security.http.ssl.enabled=false keeps the HTTP API on plain HTTP. The private LAN is trusted in this lab, and 9200 is never publicly exposed. In a production cluster (or anywhere outside a trusted network), enable HTTP TLS.

Two ports lines bind the same container port to two distinct host addresses: the private LAN IP and 127.0.0.1. This makes 9200 reachable to the Logstash workers (over the private LAN) and to local curl checks (over localhost), while nothing on the public internet sees it. Kibana needs neither binding: it reaches Elasticsearch over the elk-net bridge at http://elasticsearch:9200.


Bring it up and verify the cluster is online (yellow is expected on single-node: replicas can’t be allocated):

Start it and verify

cd /opt/observability-logs
sudo docker compose up -d elasticsearch

The first start pulls the ~1 GB image and warms up the cluster: give it ~60 seconds.

sudo docker compose logs -f elasticsearch
# ... [INFO ][o.e.n.Node] [elasticsearch] started

Now let’s verify the installation is working:

ELASTIC=$(sudo grep '^ELASTIC_PASSWORD=' /opt/observability-logs/.env | cut -d= -f2-)
 
# Cluster health — expect "yellow" (single-node, replicas can't be allocated)
curl -s -u "elastic:$ELASTIC" http://localhost:9200/_cluster/health | python3 -m json.tool

Expected:

{
    "cluster_name": "docker-cluster",
    "status": "yellow",
    "number_of_nodes": 1,
    "number_of_data_nodes": 1,
    ...
}

INFO

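Yellow is fine here. It means primary shards are allocated but replicas can’t be (a single node can’t host replicas of its own data: that would defeat the point). A brand-new cluster may even report green until the first index that asks for a replica is created.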
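If you script this check (for example before starting Kibana), the status field is the value to gate on. A minimal sketch, assuming the JSON shape shown above; health_status and health_ok are hypothetical helper names, and the sample document is hard-coded rather than fetched from the cluster:

```shell
# Extract the "status" value from a _cluster/health JSON document.
health_status() {
  printf '%s\n' "$1" |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Succeed for green or yellow, fail for red or anything unparseable.
health_ok() {
  case $(health_status "$1") in
    green|yellow) return 0 ;;
    *)            return 1 ;;
  esac
}

sample='{"cluster_name":"docker-cluster","status":"yellow","number_of_nodes":1}'
if health_ok "$sample"; then
  echo "cluster usable (status: $(health_status "$sample"))"
fi
```

In real use, replace the sample with the output of the curl command from this section, for example health_ok "$(curl -s -u "elastic:$ELASTIC" http://localhost:9200/_cluster/health)".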

Auth sanity check:

curl -s -o /dev/null -w "%{http_code}\n" -u "elastic:wrong" http://localhost:9200/
# 401
 
curl -s -o /dev/null -w "%{http_code}\n" -u "elastic:$ELASTIC" http://localhost:9200/
# 200


To close, attach an ILM policy + index template so old logs-* indices auto-delete after 14 days:

ILM policy + index template

Logstash will write to daily indices logs-YYYY.MM.dd.

Without lifecycle management those indices accumulate forever and eventually fill the disk or hit the cluster shard limit.

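Set up an ILM policy that deletes indices 14 days after creation. The hot phase has no rollover action: Logstash already starts a new index every day, and a rollover action on a plain index without a rollover alias leaves ILM stuck in an error step, so the delete phase would never run:

ELASTIC=$(sudo grep '^ELASTIC_PASSWORD=' /opt/observability-logs/.env | cut -d= -f2-)
 
# ILM policy: keep 14 days, then delete (no rollover, the indices are already daily)
curl -sX PUT -u "elastic:$ELASTIC" \
  -H "Content-Type: application/json" \
  http://localhost:9200/_ilm/policy/logs_policy \
  -d '{
    "policy": {
      "phases": {
        "hot": {
          "actions": {}
        },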
        "delete": {
          "min_age": "14d",
          "actions": { "delete": {} }
        }
      }
    }
  }' && echo

And an index template that applies it to every logs-*:

# Index template — applies the policy to all logs-* indices
curl -sX PUT -u "elastic:$ELASTIC" \
  -H "Content-Type: application/json" \
  http://localhost:9200/_index_template/logs_template \
  -d '{
    "index_patterns": ["logs-*"],
    "template": {
      "settings": {
        "index.lifecycle.name": "logs_policy",
        "number_of_shards": 1,
        "number_of_replicas": 0
      }
    },
    "priority": 500
  }' && echo


2. Kibana setup

Now the UI layer.

Kibana runs as a Docker container next to Elasticsearch on the VPS, served publicly under /logs/ via nginx.

First the prereqs and the credentials Kibana needs (the built-in kibana_system service user + the saved-object encryption key):

Prerequisites

Elasticsearch already up (see its dedicated page).

Nginx already running on your VPS.


1. Generate Kibana credentials

Two values needed:

kibana_system password: Kibana authenticates to Elasticsearch with this built-in service user. The user exists from day one, but it has no usable password until you set one.

Encryption key: Kibana encrypts certain saved objects (alerts, reports, connectors). If you don’t set a persistent key, Kibana picks a random one at every restart and previously-encrypted saved objects become unreadable.
echo "KIBANA_SYSTEM=$(openssl rand -hex 24)"
echo "ENCRYPTION_KEY=$(openssl rand -hex 32)"
Copy both into your password manager, then append them to /opt/observability-logs/.env:
sudo tee -a /opt/observability-logs/.env > /dev/null <<'EOF'
KIBANA_SYSTEM_PASSWORD=<paste hex value here>
KIBANA_ENCRYPTION_KEY=<paste hex value here>
EOF

2. Set the kibana_system password in Elasticsearch

cd /opt/observability-logs
# .env is root-owned with mode 600, so read it through sudo
ELASTIC=$(sudo grep '^ELASTIC_PASSWORD=' .env | cut -d= -f2-)
KS=$(sudo grep '^KIBANA_SYSTEM_PASSWORD=' .env | cut -d= -f2-)
 
curl -sX POST -u "elastic:$ELASTIC" \
  -H "Content-Type: application/json" \
  http://localhost:9200/_security/user/kibana_system/_password \
  -d "{\"password\":\"$KS\"}"
echo
 
# Verify
curl -s -o /dev/null -w "kibana_system auth: HTTP %{http_code}\n" \
  -u "kibana_system:$KS" http://localhost:9200/_cluster/health
# kibana_system auth: HTTP 200


Then the compose service, and start:

3. Add the Kibana service in docker-compose

Append the Kibana service to /opt/observability-logs/docker-compose.yml, next to the existing Elasticsearch service:
Example: my docker-compose (kibana service)
  kibana:
    image: docker.elastic.co/kibana/kibana:8.15.0
    container_name: kibana
    user: "1000:1000"
    depends_on:
      elasticsearch:
        condition: service_healthy
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      # Kibana authenticates to ES with the built-in kibana_system user
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=${KIBANA_SYSTEM_PASSWORD}
      # Sub-path config — Kibana is served under farnetiandrea.it/logs
      - SERVER_BASEPATH=/logs
      - SERVER_REWRITEBASEPATH=true
      - SERVER_PUBLICBASEURL=https://farnetiandrea.it/logs
      - TELEMETRY_OPTIN=false
      # Persistent key for saved-object encryption (no random key per restart)
      - XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY=${KIBANA_ENCRYPTION_KEY}
    volumes:
      - /opt/observability-logs/kibana-data:/usr/share/kibana/data
    ports:
      # Localhost only — nginx proxies to this port
      - "127.0.0.1:5601:5601"
    networks:
      - elk-net
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "curl -fsS http://localhost:5601/logs/api/status || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 5
A few notes on the choices:

SERVER_BASEPATH=/logs + SERVER_REWRITEBASEPATH=true: Kibana itself handles the /logs prefix on every URL it emits and accepts. The reverse-proxy does not strip the prefix, it passes the path through unchanged.

SERVER_PUBLICBASEURL=https://farnetiandrea.it/logs: used when generating absolute URLs (sharing links, email reports, etc.). Must match the public URL exactly.

127.0.0.1:5601 binding: Kibana is reachable only from the host itself. The internet talks to nginx, and nginx talks to Kibana on localhost.

depends_on: elasticsearch: condition: service_healthy: Kibana refuses to start cleanly if ES isn’t ready. Waiting for service_healthy (not just service_started) avoids the restart loop.

Create the data dir:
sudo mkdir -p /opt/observability-logs/kibana-data
sudo chown -R 1000:1000 /opt/observability-logs/kibana-data

4. Start Kibana
cd /opt/observability-logs
sudo docker compose up -d kibana
sudo docker compose logs -f kibana
First start takes ~90 seconds (plugin setup + migrations).

Wait for:
[INFO ][status] Kibana is now available
Verify locally:
curl -s -o /dev/null -w "HTTP %{http_code}\n" http://localhost:5601/logs/api/status
# HTTP 200

Expose it publicly via nginx and verify the login screen loads:

5. Nginx reverse-proxy at /logs/

Add a location /logs/ block to your existing virtual host in Nginx.

My config file lives at /etc/nginx/sites-available/farnetiandrea.it:
Example: my nginx config (Kibana location block)
# Kibana (ELK Stack) at /logs/
location /logs/ {
    # No trailing slash on proxy_pass: keep /logs in the request.
    # Kibana strips it internally because SERVER_REWRITEBASEPATH=true.
    proxy_pass http://127.0.0.1:5601;
 
    proxy_set_header Host              $host;
    proxy_set_header X-Real-IP         $remote_addr;
    proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
 
    # Kibana uses WebSockets for live updates and Discover tail
    proxy_http_version 1.1;
    proxy_set_header Upgrade    $http_upgrade;
    proxy_set_header Connection "upgrade";
 
    # Discover queries and dashboard panels can be slow on big indices
    proxy_read_timeout 90s;
    proxy_send_timeout 90s;
}
INFO

No trailing slash on proxy_pass (http://127.0.0.1:5601;, not http://127.0.0.1:5601/;).

With a trailing slash, nginx strips the matched prefix /logs/ before forwarding, but Kibana expects to see /logs/ because we set SERVER_REWRITEBASEPATH=true.

Without the trailing slash, nginx passes the full URI through and Kibana handles the prefix itself.
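The difference can be sketched in a few lines of shell. This only emulates nginx's prefix-replacement rule for illustration; it is not nginx code, and upstream_path is a hypothetical helper:

```shell
# upstream_path LOCATION PROXY_PASS_URI REQUEST_PATH
# PROXY_PASS_URI is the path part after host:port in proxy_pass:
#   ""  for http://127.0.0.1:5601
#   "/" for http://127.0.0.1:5601/
upstream_path() {
  loc=$1; uri=$2; req=$3
  if [ -z "$uri" ]; then
    # No URI part: the request path is forwarded unchanged.
    printf '%s\n' "$req"
  else
    # URI part present: the matched location prefix is replaced by it.
    printf '%s\n' "${uri}${req#"$loc"}"
  fi
}

upstream_path /logs/ ""  /logs/app/home   # prints /logs/app/home
upstream_path /logs/ "/" /logs/app/home   # prints /app/home
```

Only the first form hands Kibana the /logs prefix it expects when SERVER_REWRITEBASEPATH=true.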

Reload nginx:
sudo nginx -t
sudo systemctl reload nginx

6. Verify public access

Open https://farnetiandrea.it/logs/ (or your own public URL) in a browser. You should see the Welcome to Elastic login form.

Log in with elastic + the password from /opt/observability-logs/.env.

After login you land on the Kibana home.

If you see a 502, check sudo docker compose ps (Kibana healthy?) and sudo docker compose logs kibana (any error?).

And finally create a Data View that points Discover at the logs-* indices Logstash will populate later:

7. Create a Data View

A Data View tells Kibana which Elasticsearch indices to expose to “Discover”, “Dashboard”, and the rest of the UI.

Without it, Discover stays empty even when data exists.

Create one as soon as you have an index pattern to point at (typically once a log producer is shipping events).

Log in as elastic.

Stack Management → Data Views → Create data view.

Name: logs. Index pattern: logs-*. Timestamp field: @timestamp.

Save data view to Kibana.

Of course, if you are following this guide step by step, nothing matches logs-* yet: Discover stays empty until the first events are indexed.

Optional: public anonymous viewer

If you want visitors to land directly on Discover without a login (the same flow as farnetiandrea.it/logs), layer the viewer-mode on top of the base Kibana deploy.

The full walkthrough lives on its own page.

Three moving parts:

A least-privilege Elasticsearch role
An anonymous Elasticsearch user
An anonymous auth provider in kibana.yml

It can be done now or after the rest of the series; the order doesn’t matter.

3. Logstash setup

Now we add the parsing layer.

Two identical Docker workers (logstash01 at 10.0.0.21, logstash02 at 10.0.0.22) on dedicated VMs on the private LAN, each writing to the central Elasticsearch with a least-privilege user (logstash_writer).

They’re peers, not primary/secondary: running two gives us horizontal capacity and fault isolation.

First the prereqs on each VM:

1. Prerequisites on each VM

Ubuntu 24.04 VM on the private network.

Docker CE installed (same recipe as in elasticsearch-setup).

Network reachability from the VM to the VPS at 10.0.0.5:9200.


Then, from the VPS, create the dedicated ES user so a compromised worker is confined to the log indices instead of holding superuser rights:

2. Generate the logstash_writer password on the VPS

The workers don’t need ES superuser.

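We’ll create a dedicated logstash_writer user, with a role scoped to our log indices plus the few cluster privileges Logstash uses for templates and ILM. The manage privilege in the role below is broader than strict append-only access; tighten it if you need a write-only role.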

First, generate the password on the VPS, and append it to the central .env:
# On the VPS, /opt/observability-logs/
echo "LOGSTASH_WRITER=$(openssl rand -hex 24)"
 
sudo tee -a /opt/observability-logs/.env > /dev/null <<'EOF'
LOGSTASH_WRITER_PASSWORD=<paste hex value here>
EOF

3. Create the ES role and user (on the VPS)

cd /opt/observability-logs
# .env is root-owned with mode 600, so read it through sudo
ELASTIC=$(sudo grep '^ELASTIC_PASSWORD=' .env | cut -d= -f2-)
LSW=$(sudo grep '^LOGSTASH_WRITER_PASSWORD=' .env | cut -d= -f2-)
 
# Role: write to logs indices + minimal cluster operations needed for templates/ILM
curl -sX PUT -u "elastic:$ELASTIC" \
  -H "Content-Type: application/json" \
  http://localhost:9200/_security/role/logstash_writer \
  -d '{
    "cluster": ["monitor", "manage_index_templates", "manage_ilm"],
    "indices": [{
      "names": ["filebeat-*", "logstash-*", "logs-*"],
      "privileges": ["write", "create", "create_index", "manage",
                     "view_index_metadata", "auto_configure"]
    }]
  }' && echo
 
# User: the workers will authenticate with this
curl -sX PUT -u "elastic:$ELASTIC" \
  -H "Content-Type: application/json" \
  http://localhost:9200/_security/user/logstash_writer \
  -d "{
    \"password\": \"$LSW\",
    \"roles\": [\"logstash_writer\"],
    \"full_name\": \"Logstash worker writer\"
  }" && echo

Verify:

curl -s -o /dev/null -w "HTTP %{http_code}\n" \
  -u "logstash_writer:$LSW" http://localhost:9200/_cluster/health
# HTTP 200


On each worker VM, drop the .env, the YAML config, the pipeline file, and the compose:

4. Create a .env file on each worker

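On each worker VM, create the working directories, then paste the logstash_writer password (taken from the VPS .env) into a local .env with a quoted heredoc, so bash doesn’t expand anything:
sudo mkdir -p /opt/observability-logs/config /opt/observability-logs/pipeline /opt/observability-logs/data
# Logstash runs as UID 1000 in the container and must be able to write to data/
sudo chown -R 1000:1000 /opt/observability-logs/data
sudo tee /opt/observability-logs/.env > /dev/null <<'EOF'
LOGSTASH_WRITER_PASSWORD=<paste hex value from VPS .env here>
EOF
sudo chmod 600 /opt/observability-logs/.env
Verify the length (must be 48 hex chars):
PW=$(sudo grep '^LOGSTASH_WRITER_PASSWORD=' /opt/observability-logs/.env | cut -d= -f2-)
echo "Length: ${#PW}"
# Length: 48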

5. config/logstash.yml
# /opt/observability-logs/config/logstash.yml
http.host: "0.0.0.0"
pipeline.workers: 2
pipeline.batch.size: 125
pipeline.batch.delay: 50
http.host: 0.0.0.0 is required so the monitoring API on :9600 is reachable from outside the container (used by the healthcheck and by future Prometheus scraping).

6. pipeline/main.conf

The Logstash pipeline itself.

Super simple: Beats in, Elasticsearch out.
# /opt/observability-logs/pipeline/main.conf
input {
  beats {
    port => 5044
    client_inactivity_timeout => 3600
  }
}
 
filter {
  # Per-event parsing / enrichment goes here. Keep empty for the first
  # end-to-end test; add grok / mutate / date filters once the pipeline
  # is verified working with raw events.
}
 
output {
  elasticsearch {
    hosts    => [ "http://10.0.0.5:9200" ]
    user     => "logstash_writer"
    password => "${LOGSTASH_WRITER_PASSWORD}"
    index    => "logs-%{+YYYY.MM.dd}"
  }
}
A couple of details:

${LOGSTASH_WRITER_PASSWORD} is interpolated by Logstash at startup from its environment. The variable will be injected into the container by docker-compose (next section).

client_inactivity_timeout => 3600: the Beats input closes idle TCP connections after this many seconds. The default (60s) is too aggressive for long-lived Filebeat connections that may sit idle between batches: 1h is a safer upper bound.

Daily indices (logs-%{+YYYY.MM.dd}): easy to roll, easy to delete with ILM. One day per index means a mapping conflict is contained to a single day.
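To make the daily naming concrete: Logstash fills %{+YYYY.MM.dd} from the event's @timestamp, which it keeps in UTC, so the index boundary falls at midnight UTC and not at local midnight. The sketch below mimics that mapping in plain shell for ISO 8601 UTC timestamps; index_for is a hypothetical helper, not a Logstash feature:

```shell
# index_for TIMESTAMP: map an ISO 8601 UTC timestamp to its daily index name.
index_for() {
  day=${1%%T*}                                   # keep only YYYY-MM-DD
  printf 'logs-%s\n' "$(printf '%s' "$day" | tr '-' '.')"
}

index_for 2024-05-01T23:59:59Z   # prints logs-2024.05.01
index_for 2024-05-02T00:00:01Z   # prints logs-2024.05.02
```

Two events a couple of seconds apart can therefore land in different indices, which is exactly what keeps a bad mapping confined to a single day.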


7. docker-compose.yml

Example: my docker-compose (logstash service)

services:
  logstash:
    image: docker.elastic.co/logstash/logstash:8.15.0
    container_name: logstash
    user: "1000:1000"
    environment:
      LS_JAVA_OPTS: "-Xms1g -Xmx1g"
      # Passed through from the local .env so pipeline/main.conf can use ${LOGSTASH_WRITER_PASSWORD}
      LOGSTASH_WRITER_PASSWORD: "${LOGSTASH_WRITER_PASSWORD}"
    volumes:
      - /opt/observability-logs/pipeline:/usr/share/logstash/pipeline:ro
      - /opt/observability-logs/config/logstash.yml:/usr/share/logstash/config/logstash.yml:ro
      - /opt/observability-logs/data:/usr/share/logstash/data
    ports:
      - "5044:5044"        # Beats input — reached by HAProxy from the VIP
      - "9600:9600"        # Monitoring API — private network only
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "curl -fsS http://localhost:9600 || exit 1"]
      interval: 30s
      retries: 5

IMPORTANT

Keep the pipeline/ directory containing only the pipeline file you intend to load.

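Backup copies (main.conf.bak) placed in the same folder are loaded too: Logstash reads every file in the directory and merges them into the same pipeline, so the Beats input is declared twice and startup fails with Address already in use on port 5044. Store backups outside this directory.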
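A quick guard against this is to count the files in the pipeline directory before starting the container. A minimal sketch, using a throwaway temp directory in place of /opt/observability-logs/pipeline; pipeline_file_count is a hypothetical helper:

```shell
# pipeline_file_count DIR: number of regular files directly inside DIR.
pipeline_file_count() {
  find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Demo: a directory with the real pipeline plus a stray backup copy.
demo=$(mktemp -d)
touch "$demo/main.conf" "$demo/main.conf.bak"

count=$(pipeline_file_count "$demo")
if [ "$count" -ne 1 ]; then
  echo "warning: $count files in pipeline dir, expected exactly 1"
fi

rm -rf "$demo"
```

On a worker you would call pipeline_file_count /opt/observability-logs/pipeline right before docker compose up.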


Bring it up and watch for the “pipeline started” line:

8. Start it

cd /opt/observability-logs
sudo docker compose up -d
 
# Wait ~60s for Logstash to come up, then tail the logs
sudo docker compose logs -f logstash

Look for:

[INFO ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://logstash_writer:xxxxxx@10.0.0.5:9200/"}
[INFO ][logstash.javapipeline ][main] Pipeline started {"pipeline.id"=>"main"}
[INFO ][logstash.agent          ] Successfully started Logstash API endpoint {:port=>9600, :ssl_enabled=>false}

If you see Got response code '401' contacting Elasticsearch, the password in .env doesn’t match the one in ES: re-check it from the VPS .env.


And repeat the exact same recipe on the second VM: same image, same .env, same pipeline:

9. Repeat on logstash02

Steps are identical on the second VM.

Same image, same .env (same password), same pipeline, same ports.

4. HAProxy + Keepalived setup

The Logstash workers are up, but Filebeat shouldn’t talk to them directly: we want a single, highly-available endpoint in front.

Enter the LB pair: two HAProxy instances on loglb01 and loglb02, sharing a Virtual IP managed by Keepalived.

HAProxy on both LBs

Drop the same config on both loglb01 and loglb02, install and bring up the service:

1. The configuration file

/etc/haproxy/haproxy.cfg — identical on both loglb01 and loglb02:
Example: my haproxy.cfg
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
 
defaults
    log     global
    mode    tcp
    option  dontlognull
    timeout connect 5s
    timeout client  24h
    timeout server  24h
    timeout tunnel  24h
    timeout check   5s
 
# ─────────────────────────────────────────────────────────
# Stats UI on :8404 — private network only
# Browse http://<lb-ip>:8404/ to see live backend status
# ─────────────────────────────────────────────────────────
frontend stats
    bind *:8404
    mode http
    stats enable
    stats uri /
    stats refresh 10s
 
# ─────────────────────────────────────────────────────────
# Beats input on :5044 — round-robin to Logstash workers
# ─────────────────────────────────────────────────────────
frontend beats_in
    bind *:5044
    mode tcp
    option tcplog
    default_backend logstash_pool
 
backend logstash_pool
    mode tcp
    balance roundrobin
    option tcp-check
    default-server inter 5s fall 3 rise 2
    server logstash01 10.0.0.21:5044 check
    server logstash02 10.0.0.22:5044 check
global block: standard daemon setup. Log forwarding to syslog via /dev/log (UNIX socket).

defaults → mode tcp: the entire instance defaults to TCP. The stats frontend later overrides this to mode http for the dashboard.

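defaults → timeout client/server/tunnel 24h: the critical setting for long-lived Beats connections. With the 50 s client/server timeouts of the stock Debian/Ubuntu haproxy.cfg, HAProxy would cut every idle Filebeat connection in under a minute and Filebeat would keep reconnecting, polluting the logs.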

frontend stats on :8404: the HTML/CSV stats UI. In TCP mode it would be opaque, so this single frontend overrides to mode http.

frontend beats_in on :5044: the actual Beats receiver. option tcplog ensures each session is logged with byte counts and a state code on close.

backend logstash_pool:

balance roundrobin distributes connections one-by-one across logstash01 and logstash02.

option tcp-check says “probe by opening a TCP socket”: no Beats handshake, just a connect.

default-server inter 5s fall 3 rise 2 applies to both server lines: probe cadence is 5 s, mark DOWN after three failures (15 s), mark UP after two successes (10 s).
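The detection times quoted above are simple products of the check parameters. A small sketch of the arithmetic (variable names are illustrative), handy when you want to try other values before touching haproxy.cfg:

```shell
inter=5   # seconds between health checks        (inter 5s)
fall=3    # consecutive failures before DOWN     (fall 3)
rise=2    # consecutive successes before UP      (rise 2)

down_after=$((inter * fall))
up_after=$((inter * rise))

echo "time to mark a dead worker DOWN: ~${down_after}s"
echo "time to mark it UP again:        ~${up_after}s"
```

Lowering inter shortens both delays at the cost of more probe connections against the Logstash workers.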


2. Install and bring it up

# On both loglb01 and loglb02
sudo apt update
sudo apt install -y haproxy
 
# Drop the config above into /etc/haproxy/haproxy.cfg
sudo nano /etc/haproxy/haproxy.cfg
 
# Validate before applying — never skip this
sudo haproxy -c -f /etc/haproxy/haproxy.cfg
# Output: "Configuration file is valid"
 
# Enable + start
sudo systemctl enable --now haproxy
sudo systemctl status haproxy --no-pager | head -5


Then verify the pool from any host on the private LAN (both backends should show UP):

3. Verify the pool

From any host on the private network:
curl -s "http://10.0.0.11:8404/;csv" | awk -F',' '
  NR==1 { print "frontend/backend  svname        scur  bin       bout      status"; next }
  $1=="beats_in" || $1=="logstash_pool" {
    printf "%-17s %-13s %-5s %-9s %-9s %s\n", $1, $2, $5, $9, $10, $18
  }'
Expected output once Logstash workers are up:
frontend/backend  svname        scur  bin       bout      status
beats_in          FRONTEND      0     0         0         OPEN
logstash_pool     logstash01    0     0         0         UP
logstash_pool     logstash02    0     0         0         UP
logstash_pool     BACKEND       0     0         0         UP
As Filebeat clients connect, scur (current sessions) and bin/bout (byte counters) start moving. If a Logstash worker dies, its row flips to DOWN within ~15 s and scur redistributes to the survivors.

Keepalived for the VIP

Now we give the pair a shared 10.0.0.10 VIP.

The two configs are nearly identical, with state MASTER + priority 110 on loglb01 and state BACKUP + priority 100 on loglb02:

1. Master config: loglb01

/etc/keepalived/keepalived.conf on loglb01:

Example: my keepalived.conf (loglb01, MASTER)

global_defs {
    enable_script_security
    script_user root
    router_id loglb01
}
 
# Health check: kill the local priority if HAProxy stops running
vrrp_script chk_haproxy {
    script "/usr/bin/pgrep -x haproxy"
    interval 2
    weight  -20
    fall     2
    rise     2
}
 
vrrp_instance VI_LOGS {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 110
    advert_int 1
 
    authentication {
        auth_type PASS
        auth_pass <8-char-secret>
    }
 
    virtual_ipaddress {
        10.0.0.10/24
    }
 
    track_script {
        chk_haproxy
    }
}


2. Backup config: loglb02

/etc/keepalived/keepalived.conf on loglb02.

Identical to loglb01 with three changes: state BACKUP, priority 100, and router_id loglb02.
Example: my keepalived.conf (loglb02, BACKUP)
global_defs {
    enable_script_security
    script_user root
    router_id loglb02
}
 
vrrp_script chk_haproxy {
    script "/usr/bin/pgrep -x haproxy"
    interval 2
    weight  -20
    fall     2
    rise     2
}
 
vrrp_instance VI_LOGS {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
 
    authentication {
        auth_type PASS
        auth_pass <same-8-char-secret>
    }
 
    virtual_ipaddress {
        10.0.0.10/24
    }
 
    track_script {
        chk_haproxy
    }
}
Line-by-line commentary:

global_defs: enable_script_security + script_user root are required for vrrp_script to execute under modern Keepalived. router_id is a string used only for logging: useful to distinguish which host is logging what.

vrrp_script chk_haproxy: checks the local process every 2 s. weight -20 says “subtract 20 from priority while the check fails”. With master = 110 and backup = 100, a HAProxy crash on the master lowers it to 90 → backup (100) wins.

state MASTER / BACKUP: only the initial role at startup; the election itself is decided by priority.

virtual_router_id 51: chosen arbitrarily, must match on both peers, must not collide with any other VRRP group on this L2.

priority 110 / 100: the real election weights. The 10-point gap is smaller than the 20-point track-script penalty, so a failed check always drops the master (110 → 90) below the backup (100).

auth_pass: same 8-char value on both peers. Generate it with openssl rand -hex 4 to get a clean ASCII secret.

virtual_ipaddress 10.0.0.10/24: the VIP. Note the /24 matches the LAN’s CIDR: important for ARP resolution to work correctly.

track_script { chk_haproxy }: ties the health-check to the priority adjustment.
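
To make the priority arithmetic concrete, here is a minimal shell sketch using the numbers from the configs above. The function names `effective_priority` and `vip_owner` are hypothetical helpers for illustration, not Keepalived commands; Keepalived performs this calculation internally.

```shell
#!/bin/sh
# Sketch of the priority arithmetic only; Keepalived does this internally.

# effective_priority BASE WEIGHT CHECK_OK
#   CHECK_OK=1 -> check passing, priority unchanged
#   CHECK_OK=0 -> check failing, the (negative) weight is added to the base
effective_priority() {
    if [ "$3" -eq 1 ]; then
        echo "$1"
    else
        echo $(( $1 + $2 ))
    fi
}

# vip_owner PRIO_LB01 PRIO_LB02 -> prints the node with the higher priority.
# Simplification: a tie goes to loglb01 here; real VRRP breaks ties by IP address.
vip_owner() {
    if [ "$1" -ge "$2" ]; then echo loglb01; else echo loglb02; fi
}

# Both healthy: 110 vs 100
vip_owner "$(effective_priority 110 -20 1)" "$(effective_priority 100 -20 1)"
# HAProxy down on loglb01: 90 vs 100
vip_owner "$(effective_priority 110 -20 0)" "$(effective_priority 100 -20 1)"
```

Run as-is, it prints `loglb01` and then `loglb02`: the failed check costs the master 20 points, which is more than its 10-point lead.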


Install + bring it up on both nodes:

3. Install and bring it up

# Both loglb01 and loglb02
sudo apt update
sudo apt install -y keepalived
 
# Drop the config above into /etc/keepalived/keepalived.conf
sudo nano /etc/keepalived/keepalived.conf
sudo chmod 600 /etc/keepalived/keepalived.conf
 
# Validate before applying
sudo keepalived -t -f /etc/keepalived/keepalived.conf
# Silent output = valid
 
# Enable + start
sudo systemctl enable --now keepalived
sleep 3
sudo systemctl status keepalived --no-pager | head -6


Verify the election (loglb01 wins and owns the VIP), then test failover by stopping HAProxy on the master; the VIP should migrate to loglb02 within ~5 seconds:

4. Verify election and VIP ownership

# On loglb01 — should be MASTER, owns the VIP
journalctl -u keepalived -n 10 --no-pager | grep -E "STATE|MASTER|BACKUP"
ip addr show eth0 | grep 10.0.0.10
 
# On loglb02 — should be BACKUP, no VIP attached
journalctl -u keepalived -n 10 --no-pager | grep -E "STATE|MASTER|BACKUP"
ip addr show eth0 | grep 10.0.0.10 && echo "✗ VIP attached on BACKUP (split-brain)" || echo "✓ no VIP on BACKUP (correct)"

Expected journal lines:

loglb01 Keepalived_vrrp: (VI_LOGS) Entering BACKUP STATE (init)
loglb01 Keepalived_vrrp: (VI_LOGS) Entering MASTER STATE
loglb02 Keepalived_vrrp: (VI_LOGS) Entering BACKUP STATE (init)

(Both nodes start in BACKUP; loglb01 then promotes itself after roughly 3-4 s, once about three advertisement intervals pass without hearing a MASTER of higher priority.)
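
The promotion delay comes from VRRP's master-down timer. As a rough sketch, assuming Keepalived is running VRRP version 2 (its default) and that the standard timer applies at start-up, the wait is three advertisement intervals plus a priority-dependent skew. The helper name `master_down_interval` is hypothetical.

```shell
#!/bin/sh
# master_down_interval ADVERT_INT PRIORITY
# VRRPv2 (RFC 3768): 3 * advert_int + skew, where skew = (256 - priority) / 256 seconds.
master_down_interval() {
    awk -v a="$1" -v p="$2" 'BEGIN { printf "%.2f\n", 3 * a + (256 - p) / 256 }'
}

master_down_interval 1 110   # loglb01 -> 3.57
master_down_interval 1 100   # loglb02 -> 3.61
```

The higher-priority node has the shorter skew, so with `advert_int 1` loglb01's timer expires slightly before loglb02's and it claims the VIP first.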


5. Failover test

Stop HAProxy on the master and confirm the VIP migrates:
# On loglb01
sudo systemctl stop haproxy
date -u +%T
 
# Check ~5 s later on loglb02
ip addr show eth0 | grep 10.0.0.10   # should now show the VIP
journalctl -u keepalived -n 5 --no-pager | grep "MASTER STATE"
Expected: loglb02 enters MASTER STATE within ~5 s of the HAProxy stop, attaches 10.0.0.10/24 to eth0, sends gratuitous ARP.

Clients on the L2 segment see no interruption beyond the brief gap.
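
To put a number on that gap instead of eyeballing it, a small polling helper can time the migration. This is a sketch, not part of the original setup: `wait_for` and `has_vip` are hypothetical names, and the interface and address are the ones assumed throughout this guide.

```shell
#!/bin/sh
# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND once per second until it succeeds; prints the elapsed
# seconds on success, returns 1 if TIMEOUT_SECONDS pass first.
wait_for() {
    wf_timeout=$1; shift
    wf_start=$(date +%s)
    until "$@" >/dev/null 2>&1; do
        if [ $(( $(date +%s) - wf_start )) -ge "$wf_timeout" ]; then
            return 1
        fi
        sleep 1
    done
    echo $(( $(date +%s) - wf_start ))
}

# True when the VIP from this guide is attached to eth0 on the local host.
has_vip() {
    ip -o addr show dev eth0 | grep -qF ' 10.0.0.10/'
}

# Usage on loglb02, started right after stopping HAProxy on loglb01:
#   wait_for 15 has_vip && echo "VIP arrived" || echo "no VIP after 15 s"
```

The resolution is one second, which is enough to tell a ~5 s failover from a stuck one.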

Recover:
# On loglb01
sudo systemctl start haproxy
 
# Within ~5 s, loglb01 wins the election back (priority 110 vs 100)
# and the VIP migrates back. To let loglb02 keep the VIP instead, set
# `nopreempt` on loglb01 (Keepalived requires `state BACKUP` alongside it);
# that saves one extra glitch.

5. Filebeat setup

Last step!

Now that the entire consumer side is up and reachable through the VIP, we install Filebeat on the source host (in this lab: the VPS).

The agent runs natively, installed via apt rather than in a container: shipping logs from a single host only needs direct access to its filesystem and journald, which a native install has out of the box.

First the prereqs and install from the official Elastic 8.x APT repo:

Prerequisites

Filebeat pushes logs somewhere, so the consumer side has to exist first.

If you’ve been following the series in the correct order, then everything below is already deployed:

A working Logstash pool (check out the Logstash setup).

A working HAProxy + Keepalived HA pair with a VIP (check out HAProxy + Keepalived VRRP). In this guide my VIP is 10.0.0.10:5044.


1. Install Filebeat from the official repo
# Elastic GPG key
sudo install -m 0755 -d /etc/apt/keyrings
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch \
  | sudo gpg --dearmor -o /etc/apt/keyrings/elastic.gpg
sudo chmod a+r /etc/apt/keyrings/elastic.gpg
 
# Elastic 8.x APT repo
echo "deb [signed-by=/etc/apt/keyrings/elastic.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" \
  | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
 
sudo apt update
sudo apt install -y filebeat
 
# Verify
filebeat version
# filebeat version 8.x.x (amd64), libbeat 8.x.x [...]
IMPORTANT

Filebeat and Elasticsearch don’t have to match versions exactly.

Within a major version, Elastic supports an older Filebeat shipping to a same-or-newer Elasticsearch (which is why the recommended upgrade order starts with Elasticsearch)… BUT the reverse (newer Filebeat → older ES) is not supported. Check Elastic’s support matrix for your exact versions.


Then the config, pointing output.logstash at the HAProxy VIP at 10.0.0.10:5044:

2. /etc/filebeat/filebeat.yml

The shipped default config is heavy with disabled modules.

Replace it cleanly:
Example: my filebeat.yml setup
sudo cp /etc/filebeat/filebeat.yml /etc/filebeat/filebeat.yml.orig
 
sudo tee /etc/filebeat/filebeat.yml > /dev/null <<'EOF'
# ============================== Inputs ==================================
filebeat.inputs:
  - type: filestream
    id: syslog-files
    enabled: true
    paths:
      - /var/log/syslog
      - /var/log/auth.log
    fields:
      log_source: "syslog"
    fields_under_root: true
 
  - type: journald
    id: systemd
    enabled: true
    fields:
      log_source: "journald"
    fields_under_root: true
 
# ============================== Autodiscover ============================
# Pick up logs from any Docker container on this host automatically.
filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
 
# ============================== Processors ==============================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_docker_metadata: ~
 
# ============================== Output ==================================
# Ship to the HAProxy VIP. The two LB nodes share this VIP via Keepalived;
# whichever HAProxy currently owns it serves the traffic, with automatic
# failover.
output.logstash:
  hosts: ["10.0.0.10:5044"]
 
# ============================== Setup ===================================
# Output is Logstash, which writes to logs-* with its own index pattern.
# Disable Filebeat's direct-to-ES setup steps — those target "filebeat-*"
# indices we don't actually use here.
setup.template.enabled: false
setup.ilm.enabled: false
setup.dashboards.enabled: false
 
# ============================== Logging =================================
logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0644
EOF
 
sudo chmod 600 /etc/filebeat/filebeat.yml
A few notes on each section:

fields_under_root: true promotes the custom log_source to a top-level field instead of nesting it under fields.log_source. Makes Kibana queries cleaner (log_source:syslog vs fields.log_source:syslog).

Autodiscover with the docker provider + hints.enabled: true auto-attaches to every running container and reads its stdout/stderr from /var/lib/docker/containers/<id>/*.log.

add_host_metadata enriches every event with the hostname, OS, and architecture. Crucial later when you scale beyond one shipper.

add_docker_metadata does the same for Docker events: container name, image, labels.

WARNING

Always disable setup.template, setup.ilm, and setup.dashboards when shipping to Logstash. Otherwise Filebeat also tries to reach Elasticsearch directly, fails because it has no ES credentials, and prints a stream of warnings at every startup.

They’re harmless, but they confuse a real diagnosis.


Sanity-check the config, then enable and start the service:

3. Sanity-check the config
# YAML / structure validation
sudo filebeat test config
 
# Connectivity to the configured output (TCP reach + protocol handshake)
sudo filebeat test output
Expected output for the second one:
logstash: 10.0.0.10:5044...
  connection...
    parse host... OK
    dns lookup... OK
    addresses: 10.0.0.10
    dial up... OK
  TLS... WARN secure connection disabled
  talk to server... OK
(TLS warning is expected: we’re on a private network, and Logstash’s Beats input is not TLS-enabled in this lab)

If the dial-up step fails, the path between Filebeat and the VIP is broken.

Quick reachability checks:
ping -c 2 10.0.0.10
nc -zv 10.0.0.10 5044
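
On a minimal host where `nc` is not installed, a similar port check can be sketched with bash's built-in `/dev/tcp` pseudo-device. The helper name `tcp_open` is hypothetical, and the sketch assumes both `bash` and the coreutils `timeout` command are available.

```shell
#!/bin/sh
# tcp_open HOST PORT -> exit 0 if a TCP connection succeeds within 3 seconds.
# Relies on bash's /dev/tcp redirection, so it shells out to bash explicitly.
tcp_open() {
    timeout 3 bash -c 'exec 3<>/dev/tcp/$1/$2' _ "$1" "$2" 2>/dev/null
}

# Usage against the VIP from this guide:
#   tcp_open 10.0.0.10 5044 && echo "port open" || echo "port closed or filtered"
```

Like `nc -z`, this only proves the TCP handshake works; the Beats protocol check is still `filebeat test output`.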

4. Enable and start

sudo systemctl enable filebeat
sudo systemctl start filebeat
sudo systemctl status filebeat --no-pager

Then watch the agent’s own log:

sudo journalctl -u filebeat -n 30 --no-pager

What you want to see:

Connecting to backoff(async(tcp://10.0.0.10:5044))
Connection to backoff(...) established
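
If you'd rather not read the journal by eye, the check can be scripted. The sketch below classifies the most recent connection-related line; `filebeat_link_state` is a hypothetical helper, and matching on `dial tcp` for failures is an assumption about the error wording, so adjust the pattern to whatever your Filebeat version actually logs.

```shell
#!/bin/sh
# filebeat_link_state: reads Filebeat log lines on stdin and prints
#   "established" if the latest connection-related line reports success,
#   "failing"     if the latest one is a dial error (assumed wording: "dial tcp"),
#   "unknown"     if no connection-related line is present.
filebeat_link_state() {
    fl_last=$(grep -E 'Connection to .* established|dial tcp' | tail -n 1)
    case $fl_last in
        *established*) echo established ;;
        *'dial tcp'*)  echo failing ;;
        *)             echo unknown ;;
    esac
}

# Usage:
#   sudo journalctl -u filebeat -n 50 --no-pager | filebeat_link_state
```

Looking only at the last matching line matters: an old "established" followed by newer dial errors should count as failing.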


And finally verify end-to-end from Elasticsearch.

The moment we see logs-* indices growing, the entire pipeline is alive:

5. Verify end-to-end from Elasticsearch

The pipeline is Filebeat → HAProxy → Logstash → Elasticsearch.

We can confirm that events reach the end of it without touching Kibana:
ELASTIC=$(sudo grep '^ELASTIC_PASSWORD=' /opt/observability-logs/.env | cut -d= -f2-)
 
# Index showing up?
curl -s -u "elastic:$ELASTIC" "http://localhost:9200/_cat/indices?v" | grep "logs-"
 
# Event count
curl -s -u "elastic:$ELASTIC" "http://localhost:9200/logs-*/_count" | python3 -m json.tool
 
# Peek at a recent document to see fields
curl -s -u "elastic:$ELASTIC" "http://localhost:9200/logs-*/_search?size=1&sort=@timestamp:desc" \
  | python3 -m json.tool | head -40
What to look for in a sample document:

agent.type: "filebeat": confirms the producer

agent.version: "8.x.x": Filebeat version

host.hostname: "<your-host>": host metadata processor

log_source: "syslog" (or "journald"): the custom field we added per input

event.original: "May 22 10:27:19 ... systemd[1]: ...": the raw log line

If the count is positive and grows each time you re-run it, Filebeat → Logstash → ES is working end to end!
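
The "re-run and compare" step can be sketched as two small helpers. The names `extract_count` and `is_growing` are hypothetical; the sketch assumes the `_count` response has the usual `{"count": N, ...}` shape and pulls the number out with `sed` so that no extra tooling is required.

```shell
#!/bin/sh
# extract_count: reads a _count API response on stdin, prints the integer count.
extract_count() {
    sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# is_growing FIRST SECOND -> exit 0 when SECOND is strictly larger than FIRST.
is_growing() {
    [ "$2" -gt "$1" ]
}

# Usage on the VPS, reusing $ELASTIC from the commands above:
#   first=$(curl -s -u "elastic:$ELASTIC" "http://localhost:9200/logs-*/_count" | extract_count)
#   sleep 30
#   second=$(curl -s -u "elastic:$ELASTIC" "http://localhost:9200/logs-*/_count" | extract_count)
#   is_growing "$first" "$second" && echo "pipeline is ingesting" || echo "count did not move"
```

A flat count over 30 s on a quiet host is not necessarily a failure; generate a test line with `logger "pipeline test"` before the second sample if in doubt.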

INFO

To see this data in Kibana Discover, you need a Data View pointed at logs-*.

Create it from Stack Management as described in the Kibana setup.


And that’s it!

We have a complete, scalable ELK stack mirroring the shape used in real production environments: five hosts, six components, end-to-end log pipeline from journalctl to Discover.

Congratulations!

To keep the stack healthy long-term, I leave you with the closing thoughts:

Final considerations

You’re done.

Congratulations!

If you’ve followed the series from the start, you now have an end-to-end log pipeline running on five hosts:

Elasticsearch indexing on the VPS

Kibana serving the dashboard (with anonymous read-only viewing)

Two Logstash workers parsing in parallel

Two HAProxy + Keepalived nodes fronting the workers, with a single highly-available VIP

Filebeat pushing syslog / journald / Docker logs into that VIP.

Smaller than what you’d run in production, but nearly identical in shape.

General pipeline monitoring

A log pipeline that can’t tell you when it is itself broken is only half-built.

The five-minute health round:

Component	Quick check	Looking for
Elasticsearch	`curl :9200/_cluster/health`	`status: yellow` (lab) / `green` (cluster)
Kibana	`curl :5601/logs/api/status`	HTTP 200, `overall.level: available`
Logstash worker	`curl :9600`	`status: green`, `events.in` rising
HAProxy	`curl :8404/;csv`	both Logstash backends `UP`, `scur > 0`
Keepalived	`ip addr show | grep 10.0.0.10`	VIP on MASTER, absent on BACKUP
Filebeat	`journalctl -u filebeat -n 20`	`Connection ... established`, no `dial` errors
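
The round above can be wrapped in a tiny runner so that each check prints a one-line verdict. This is a sketch: `run_check` is a hypothetical helper, and the example commands in the comments assume the hosts, ports, and `$ELASTIC` variable used earlier in this guide.

```shell
#!/bin/sh
# run_check NAME COMMAND [ARGS...]
# Runs COMMAND quietly and prints "ok   NAME" or "FAIL NAME";
# returns 0 when COMMAND succeeded and 1 otherwise.
run_check() {
    rc_name=$1; shift
    if "$@" >/dev/null 2>&1; then
        printf 'ok   %s\n' "$rc_name"
    else
        printf 'FAIL %s\n' "$rc_name"
        return 1
    fi
}

# Example round on the VPS:
#   run_check elasticsearch curl -sf -u "elastic:$ELASTIC" http://localhost:9200/_cluster/health
#   run_check kibana        curl -sf http://localhost:5601/logs/api/status
#   run_check filebeat      systemctl is-active --quiet filebeat
```

This only tests that each endpoint answers; reading the values in the "Looking for" column is still a manual step.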

From a lab to real production

For real production scenarios, consider:

What changes when you have 2000+ machines

Backing up the ES data periodically: the ILM policy from the Elasticsearch setup only deletes old indices; it doesn’t back anything up, so if the VPS disk dies, the logs go with it. The native answer is Elasticsearch’s snapshot and restore feature, with a snapshot repository pointed at S3 or a separate volume.

More Logstash workers behind the same LB pair: HAProxy’s balance roundrobin scales horizontally for free until you saturate the LB itself.

Multiple LB pairs geographically distributed, often with DNS round-robin in front, when one VIP can’t handle the throughput anymore.

A Kafka cluster between Beats and Logstash as a buffer: absorbs traffic spikes that even an HA load balancer can’t smooth out, and decouples producers from consumers (Logstash can be down for maintenance without losing events).

An Elasticsearch cluster with separate node roles: 3+ master, 5-20+ data, 2-4 ingest, 2-4 coordinating. This is where the real bottleneck lives (indexing throughput, shard count, JVM heap pressure).

Multi-tenant Kibana spaces so different teams can have their own dashboards, saved searches, and role-based access on the same ES backend.



What to do next

Great, you’ve successfully implemented a working log system… but do you have a metrics system as well?

If the answer is no, well, you’ve got a new project to work on!

Andrea Farneti - Wiki
