Postgres HA: Patroni + etcd + HAProxy

Streaming Replication Won’t Save You at 2 AM

Here’s the thing about vanilla Postgres streaming replication: it’s great until it isn’t. You’ve got a primary, two standbys, data flowing in real time — and then the primary dies. Now what? You’re SSH-ing into a standby at 2 AM, running pg_promote, manually updating your app’s connection string, and praying you didn’t just promote a lagging replica with 30 seconds of missing transactions.

That’s the gap Patroni fills. It provides leader election using etcd as a Distributed Configuration Store (DCS), automatic promotion of the best replica, and a REST API that HAProxy uses for health checks so it knows exactly where to route traffic. No manual intervention. No 2 AM heroics.

This is a full end-to-end walkthrough. Real version numbers, real commands, real trade-offs.

Architecture Overview

Three layers, seven VMs (or containers, or LXC; pick your poison):

┌─────────────────────────────────────────┐
│              HAProxy (1 node)           │
│  :5000 → primary only (read/write)      │
│  :5001 → replicas only (read-only)      │
└────────────┬───────────────┬────────────┘
             │               │
    ┌────────▼──────┐ ┌──────▼────────┐
    │  pg-node-1    │ │  pg-node-2    │ ... pg-node-3
    │  Patroni 4.x  │ │  Patroni 4.x  │
    │  Postgres 17  │ │  Postgres 17  │
    └───────┬───────┘ └───────┬───────┘
            │                 │
    ┌───────▼─────────────────▼───────┐
    │     etcd cluster (3 nodes)      │
    │   etcd-1 / etcd-2 / etcd-3      │
    └─────────────────────────────────┘

etcd gives you quorum-based leader election. Patroni holds a lease in etcd. If the primary can’t renew its lease (network partition, OOM kill, whatever), Patroni on a replica picks up the lease and promotes itself. HAProxy’s health check hits Patroni’s REST API: /primary returns 200 only on the current primary, /replica returns 200 only on standbys. Clean, deterministic routing.
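
To make the routing rule concrete, here is a rough Python sketch of the decision HAProxy's two listeners make from Patroni's status codes. The function name and the shape of the input dictionary are invented for illustration; only the /primary and /replica paths and the "200 means eligible" convention come from this setup.

```python
from typing import Dict, List


def split_backends(status: Dict[str, Dict[str, int]]) -> Dict[str, List[str]]:
    """Group nodes by which Patroni health check answered 200.

    `status` maps node name -> {endpoint path -> HTTP status code}.
    """
    pools: Dict[str, List[str]] = {"rw": [], "ro": []}
    for node, codes in sorted(status.items()):
        if codes.get("/primary") == 200:
            pools["rw"].append(node)  # eligible for port 5000 in this guide
        if codes.get("/replica") == 200:
            pools["ro"].append(node)  # eligible for port 5001 in this guide
    return pools


# Hypothetical snapshot: pg-1 is the primary, the other two are standbys.
snapshot = {
    "pg-1": {"/primary": 200, "/replica": 503},
    "pg-2": {"/primary": 503, "/replica": 200},
    "pg-3": {"/primary": 503, "/replica": 200},
}
print(split_backends(snapshot))
```

After a failover the same function, fed fresh status codes, simply puts a different node in the rw pool; nothing in the proxy config has to change.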

Node IPs for this guide:

Host	IP	Role
etcd-1	10.0.0.11	etcd
etcd-2	10.0.0.12	etcd
etcd-3	10.0.0.13	etcd
pg-1	10.0.0.21	Patroni + Postgres
pg-2	10.0.0.22	Patroni + Postgres
pg-3	10.0.0.23	Patroni + Postgres
haproxy	10.0.0.30	HAProxy

Step 1: etcd 3.5 Cluster

Install on all three etcd nodes:

ETCD_VER=v3.5.14
curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz \
  | tar xz -C /usr/local/bin --strip-components=1 etcd-${ETCD_VER}-linux-amd64/etcd \
                                                    etcd-${ETCD_VER}-linux-amd64/etcdctl

Create the data dir and systemd unit on each node. Replace etcd-1 and 10.0.0.11 with each host’s own name and IP; the --initial-cluster line is identical on all three nodes:

mkdir -p /var/lib/etcd

# /etc/systemd/system/etcd.service — on etcd-1
[Unit]
Description=etcd
After=network.target

[Service]
Type=notify
User=root
ExecStart=/usr/local/bin/etcd \
  --name etcd-1 \
  --data-dir /var/lib/etcd \
  --listen-peer-urls http://10.0.0.11:2380 \
  --listen-client-urls http://10.0.0.11:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.0.11:2379 \
  --initial-advertise-peer-urls http://10.0.0.11:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster etcd-1=http://10.0.0.11:2380,etcd-2=http://10.0.0.12:2380,etcd-3=http://10.0.0.13:2380 \
  --initial-cluster-state new
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

On etcd-2: same but --name etcd-2, --listen-peer-urls http://10.0.0.12:2380, etc. On etcd-3: same pattern with 10.0.0.13.
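
If hand-editing three unit files feels error-prone, the per-host differences can be generated. This is a minimal sketch, assuming the host names and IPs from the table above; the helper names are my own and it only prints the flags, it does not write unit files.

```python
# Host names and IPs are the ones used in this guide.
ETCD_NODES = {"etcd-1": "10.0.0.11", "etcd-2": "10.0.0.12", "etcd-3": "10.0.0.13"}


def initial_cluster(nodes):
    """Build the --initial-cluster value, which is the same on every node."""
    return ",".join(f"{name}=http://{ip}:2380" for name, ip in nodes.items())


def etcd_flags(name, nodes=ETCD_NODES):
    """Return the flags that carry a node's own name or IP, plus the shared list."""
    ip = nodes[name]
    return [
        f"--name {name}",
        f"--listen-peer-urls http://{ip}:2380",
        f"--listen-client-urls http://{ip}:2379,http://127.0.0.1:2379",
        f"--advertise-client-urls http://{ip}:2379",
        f"--initial-advertise-peer-urls http://{ip}:2380",
        f"--initial-cluster {initial_cluster(nodes)}",
    ]


for node in ETCD_NODES:
    print(node)
    for flag in etcd_flags(node):
        print("   ", flag)
```

The point of the exercise: five flags change per host, and the peer list does not.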

systemctl daemon-reload
systemctl enable --now etcd

Verify all three nodes see each other:

etcdctl --endpoints=http://10.0.0.11:2379,http://10.0.0.12:2379,http://10.0.0.13:2379 endpoint health

You want three lines all saying is healthy. If you get quorum errors, check firewall rules on 2379/2380.

Step 2: Postgres 17 + Patroni 4.x

On all three Postgres nodes:

# Postgres 17 from PGDG
apt install -y curl ca-certificates
install -d /usr/share/postgresql-common/pgdg
curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc \
  https://www.postgresql.org/media/keys/ACCC4CF8.asc
echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] \
  https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" \
  > /etc/apt/sources.list.d/pgdg.list
apt update && apt install -y postgresql-17

# Stop and disable the default service — Patroni manages the lifecycle
systemctl stop postgresql
systemctl disable postgresql

# Patroni 4.x
apt install -y python3-pip python3-psycopg2
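pip3 install 'patroni[etcd3]' --break-system-packages

The [etcd3] extra pulls in python-etcd; Patroni speaks the etcd v3 API through etcd’s gRPC gateway (JSON over HTTP), so there is no grpcio dependency. Quote the requirement so the shell doesn’t treat the brackets as a glob. If you’re on a distro that screams about --break-system-packages, use a venv: python3 -m venv --system-site-packages /opt/patroni && /opt/patroni/bin/pip install 'patroni[etcd3]'. The --system-site-packages flag keeps the apt-installed psycopg2 visible inside the venv. In that case patroni and patronictl live in /opt/patroni/bin, so adjust the ExecStart path in the systemd unit to match.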

Step 3: Patroni Configuration

The patroni.yml below goes on each node. Only name, connect_address, and listen change per node.

# /etc/patroni/patroni.yml — on pg-1
scope: postgres-ha
namespace: /service/
name: pg-1

restapi:
  listen: 10.0.0.21:8008
  connect_address: 10.0.0.21:8008

etcd3:
  hosts:
    - 10.0.0.11:2379
    - 10.0.0.12:2379
    - 10.0.0.13:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576   # 1 MB — don't promote a badly lagging replica
    synchronous_mode: on
    postgresql:
      use_pg_rewind: true
      parameters:
        max_connections: 200
        shared_buffers: 512MB
        wal_level: replica
        max_wal_senders: 10
        max_replication_slots: 10
        hot_standby: on
        synchronous_commit: on
        wal_log_hints: on               # required for pg_rewind

  initdb:
    - encoding: UTF8
    - data-checksums

  pg_hba:
    - host replication replicator 10.0.0.0/24 scram-sha-256
    - host all all 10.0.0.0/24 scram-sha-256

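  # bootstrap.users was removed in Patroni 4.0, so extra roles can no longer
  # be declared here. Patroni still sets up the superuser, replication, and
  # rewind roles from postgresql.authentication below. Create the admin role
  # yourself once the cluster is up, for example on the primary:
  #   CREATE ROLE admin LOGIN CREATEROLE CREATEDB PASSWORD 'changeme_admin';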

postgresql:
  listen: 10.0.0.21:5432
  connect_address: 10.0.0.21:5432
  data_dir: /var/lib/postgresql/17/main
  bin_dir: /usr/lib/postgresql/17/bin
  pgpass: /tmp/pgpass

  authentication:
    replication:
      username: replicator
      password: "changeme_repl"
    superuser:
      username: postgres
      password: "changeme_super"
    rewind:
      username: rewind_user
      password: "changeme_rewind"

  parameters:
    archive_mode: on
    archive_command: >-
      pgbackrest --stanza=main archive-push %p

watchdog:
  mode: required
  device: /dev/watchdog
  safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonedfrom: false
  nosync: false

On pg-2 and pg-3, change name: pg-2 / pg-3, and both listen/connect_address IP values.

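The watchdog block is important. With mode: required, a node will not take the leader lock unless Patroni can open and arm /dev/watchdog. That’s intentional: if Patroni hangs while Postgres is still running as primary, the watchdog resets the machine before the leader key expires, so a stale primary can’t linger as a split-brain writer. Load the kernel module and hand the device to the postgres user that Patroni runs as: modprobe softdog && echo 'softdog' >> /etc/modules && chown postgres /dev/watchdog. The chown doesn’t survive a reboot, so persist it with a udev rule or a systemd drop-in.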
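
Because Patroni runs as postgres while /dev/watchdog normally belongs to root, a quick permission check saves a confusing first start. This is a small sketch under that assumption; the function name is hypothetical, and it deliberately uses os.access() so the device is never opened (opening it would arm the watchdog).

```python
import os


def watchdog_usable(device: str = "/dev/watchdog") -> bool:
    """Return True if the device node exists and the current user may write to it.

    os.access() only inspects permissions; it does not open the device,
    so calling this will not arm the watchdog.
    """
    return os.path.exists(device) and os.access(device, os.W_OK)


if __name__ == "__main__":
    print("watchdog usable:", watchdog_usable())
```

Run it as the postgres user (for example through sudo -u postgres) so the answer reflects what Patroni will see rather than what root sees.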

Create the systemd service for Patroni at /etc/systemd/system/patroni.service:

[Unit]
Description=Patroni Cluster Manager
After=network.target

[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
TimeoutSec=30
Restart=no

[Install]
WantedBy=multi-user.target

mkdir -p /etc/patroni                      # patroni.yml goes in here
chown -R postgres:postgres /etc/patroni
chmod 600 /etc/patroni/patroni.yml         # contains passwords; no execute bit needed
systemctl daemon-reload
systemctl enable --now patroni

Start pg-1 first. It will initialize the cluster and bootstrap. Then start pg-2 and pg-3 — they’ll clone from pg-1 automatically.

Check cluster state:

patronictl -c /etc/patroni/patroni.yml list

Expected output:

+ Cluster: postgres-ha (7123456789012345678) -+----+-----------+
| Member | Host           | Role    | State   | TL | Lag in MB |
+--------+----------------+---------+---------+----+-----------+
| pg-1   | 10.0.0.21:5432 | Leader  | running |  1 |           |
| pg-2   | 10.0.0.22:5432 | Replica | running |  1 |         0 |
| pg-3   | 10.0.0.23:5432 | Replica | running |  1 |         0 |
+--------+----------------+---------+---------+----+-----------+

Step 4: HAProxy 2.9

On the haproxy node:

# Needs a repo that carries the 2.9 branch; stock Debian/Ubuntu releases may ship an older one
apt install -y haproxy=2.9.*

The HAProxy config uses Patroni’s REST API for health checks. /primary returns HTTP 200 only on the current primary. /replica returns 200 only on standbys. HAProxy routes accordingly — no manual intervention, no custom scripts.

global
    maxconn 100
    log /dev/log local0

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

#---------------------------------------------------------------------
# Read/Write: primary only
#---------------------------------------------------------------------
listen postgres_rw
    bind *:5000
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg-1 10.0.0.21:5432 maxconn 100 check port 8008
    server pg-2 10.0.0.22:5432 maxconn 100 check port 8008
    server pg-3 10.0.0.23:5432 maxconn 100 check port 8008

#---------------------------------------------------------------------
# Read-Only: replicas only
#---------------------------------------------------------------------
listen postgres_ro
    bind *:5001
    option httpchk GET /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg-1 10.0.0.21:5432 maxconn 100 check port 8008
    server pg-2 10.0.0.22:5432 maxconn 100 check port 8008
    server pg-3 10.0.0.23:5432 maxconn 100 check port 8008

#---------------------------------------------------------------------
# Stats page
#---------------------------------------------------------------------
listen stats
    bind *:7000
    mode http
    stats enable
    stats uri /
    stats refresh 10s
    stats show-node

Two details in that config are easy to get wrong. A bare option httpchk sends OPTIONS /, which Patroni happens to answer like a primary check, but spelling out GET /primary keeps the read/write listener’s intent obvious. As of Patroni 4.0, the /master endpoint was removed, so use /primary for the primary and /replica for standbys. And the checks speak plain HTTP because the restapi block in patroni.yml has no TLS configured; add check-ssl verify none to the server lines only after you enable TLS on the REST API, or every check fails and HAProxy marks all three nodes down.

systemctl enable --now haproxy

Test connectivity:

psql -h 10.0.0.30 -p 5000 -U admin -d postgres -c "SELECT pg_is_in_recovery();"
# Returns: f (false) — you're on the primary

psql -h 10.0.0.30 -p 5001 -U admin -d postgres -c "SELECT pg_is_in_recovery();"
# Returns: t (true) — you're on a replica

Step 5: The Trade-Off You’re Signing Up For

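Honestly, this is the part most guides skip over. With synchronous_mode: on and synchronous_commit: on, the primary won’t acknowledge a commit until a synchronous standby has flushed it to its WAL, and Patroni only promotes a standby it knows was in sync. That buys zero data loss on failover at the cost of a network round trip on every commit. What happens when no standby is healthy depends on one more setting. By default Patroni degrades: it drops the synchronous requirement and the primary keeps taking writes on its own. Set synchronous_mode_strict: true and it does the opposite: writes block until a standby comes back.

That’s the deal. You pick one:

synchronous_commit: on gives zero data loss on failover; every commit waits on a standby, and stalls outright under synchronous_mode_strict when none is available
synchronous_commit: local returns as soon as the primary has flushed; writes never wait, but there is a tiny window of data loss on failover

For a homelab database, local is probably fine. For anything financial, use on, and add synchronous_mode_strict if refusing a write is better than risking one. Patroni’s synchronous_node_count parameter lets you tune how many sync standbys are required; the default is 1.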
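
The interaction between these settings is easier to see as a decision table. The sketch below is my own simplification of how Patroni's replication-modes documentation describes it, assuming synchronous_node_count is 1; the function and its return strings are illustrative, not anything Patroni exposes.

```python
def write_behaviour(synchronous_mode: bool, strict: bool, healthy_standbys: int) -> str:
    """Describe what a COMMIT on the primary does under the given settings.

    Assumes synchronous_commit is 'on' and synchronous_node_count is 1.
    """
    if not synchronous_mode:
        return "async: commits return without waiting for a standby"
    if healthy_standbys >= 1:
        return "sync: commits wait for a standby to confirm the flush"
    if strict:
        return "blocked: commits hang until a standby comes back"
    return "degraded: commits proceed with no synchronous standby"


for mode, strict, standbys in [(True, False, 2), (True, False, 0), (True, True, 0)]:
    print(mode, strict, standbys, "->", write_behaviour(mode, strict, standbys))
```

The middle case is the one people tend to miss: without the strict flag, losing every standby quietly turns a synchronous cluster into a standalone primary.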

Step 6: Failover Test

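This is the fun part. Kill the primary hard. Killing only the postmaster isn’t enough, though: Patroni restarts a crashed primary in place and keeps the leader lock for up to primary_start_timeout (300 seconds by default) before giving it up. Take Patroni down along with Postgres, or simply hard power-off the VM. With the watchdog armed, expect pg-1 to reset itself shortly afterwards, since nothing is left to keep the watchdog fed.

# On pg-1 (the current primary): SIGKILL everything in the patroni unit,
# which covers Patroni and the Postgres it started
systemctl kill --signal=SIGKILL patroni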

Watch what happens on any other node:

watch -n 1 patronictl -c /etc/patroni/patroni.yml list

Within roughly ttl seconds (30 in our config), once the old leader key has expired, you’ll see pg-2 or pg-3 acquire the leader lease and promote:

+ Cluster: postgres-ha (7123456789012345678) -+----+-----------+
| Member | Host           | Role    | State   | TL | Lag in MB |
+--------+----------------+---------+---------+----+-----------+
| pg-1   | 10.0.0.21:5432 | Replica | stopped |    |   unknown |
| pg-2   | 10.0.0.22:5432 | Leader  | running |  2 |           |
| pg-3   | 10.0.0.23:5432 | Replica | running |  2 |         0 |
+--------+----------------+---------+---------+----+-----------+

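HAProxy picks this up within a few check intervals: with inter 3s, fall 3, and rise 2, it takes up to about 9 seconds to mark the old primary down and about 6 to mark the new one up. Port 5000 now routes to pg-2. Port 5001 routes to pg-3 (and eventually pg-1 once it rejoins).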
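
The proxy's reaction time follows directly from the default-server line. Here is a back-of-the-envelope sketch; the function name is made up, and the result is an approximation that ignores check timeouts and where in the interval the failure lands.

```python
def haproxy_reaction(inter_s: float, fall: int, rise: int) -> dict:
    """Approximate seconds for HAProxy to change its mind about a server.

    A server is marked down after `fall` consecutive failed checks and
    marked up after `rise` consecutive successful ones, one check per `inter_s`.
    """
    return {"mark_down_s": inter_s * fall, "mark_up_s": inter_s * rise}


# Values from this guide's default-server line: inter 3s fall 3 rise 2
print(haproxy_reaction(3, 3, 2))
```

Add that on top of the time Patroni needs to promote, and you have a realistic figure for how long clients see connection errors.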

When pg-1 comes back, Patroni uses pg_rewind to reconcile its WAL with the new primary’s timeline, then rejoins as a replica. No manual steps.

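# Planned switchover without killing anything (useful for maintenance).
# Use switchover with --leader rather than failover with the older --master spelling;
# swap the names to match your current leader.
patronictl -c /etc/patroni/patroni.yml switchover postgres-ha --leader pg-1 --candidate pg-2 --force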

Step 7: pgBackRest Integration

Patroni manages the cluster; pgBackRest handles backups. They play nicely together — configure archive_command in Patroni’s postgresql parameters block (as shown in patroni.yml above) so WAL archiving works on whichever node is currently the primary.

# On all three Postgres nodes: archive_command runs on whichever one is primary
apt install -y pgbackrest

# /etc/pgbackrest/pgbackrest.conf
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
log-level-console=info
log-level-file=detail

[main]
pg1-path=/var/lib/postgresql/17/main
pg1-port=5432
pg1-user=postgres

# Initialize the stanza (run once)
pgbackrest --stanza=main stanza-create

# Full backup
pgbackrest --stanza=main backup --type=full

# Verify
pgbackrest --stanza=main info

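The archive_command in patroni.yml calls pgBackRest for each WAL segment. Point repo1-path at storage every Postgres node can reach (NFS, object storage, or a dedicated repo host); with a node-local repo, the WAL archive ends up split across machines after a failover. Combined with a nightly full backup and continuous WAL archiving, you’ve got point-in-time recovery on top of your HA cluster.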

Gotchas Worth Knowing Before You Start

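Clock skew. etcd counts a lease’s TTL down on its own leader, so a few seconds of skew between nodes won’t expire a lease by itself. Drifting clocks still hurt: log timestamps stop lining up across nodes, scheduled switchovers fire at the wrong moment, and point-in-time recovery targets get fuzzy. Install chrony on every node: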

apt install -y chrony
systemctl enable --now chrony
chronyc tracking  # verify offset < 1s

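Watchdog timeout vs. TTL. Patroni expects loop_wait plus twice retry_timeout to fit within ttl; the values here (10 + 2 × 10 = 30) sit exactly on that limit. When it activates the watchdog, Patroni asks the device for a timeout of ttl minus safety_margin (30 minus 5, so 25s), which resets a hung node about five seconds before its leader key would expire. That overrides softdog’s 60s default, so there is nothing to tune on the kernel side; just don’t shrink ttl or grow loop_wait without redoing the arithmetic.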
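
Since these numbers interact, it helps to check them together whenever you touch one. A minimal sketch of the arithmetic, following how Patroni's documentation describes the DCS timing rule and the watchdog timeout; the function and key names are my own.

```python
def check_timing(ttl: int, loop_wait: int, retry_timeout: int, safety_margin: int) -> dict:
    """Derive the timing figures that matter for leader-key expiry and fencing."""
    return {
        # Patroni's documented rule for the DCS settings.
        "dcs_rule_ok": loop_wait + 2 * retry_timeout <= ttl,
        # Timeout Patroni requests from the watchdog device.
        "watchdog_timeout_s": ttl - safety_margin,
        # Time one HA loop has to finish before the watchdog would fire.
        "ha_loop_headroom_s": ttl - safety_margin - loop_wait,
    }


# Values from the patroni.yml in this guide.
print(check_timing(ttl=30, loop_wait=10, retry_timeout=10, safety_margin=5))
```

With this guide's values the rule holds with no slack, the watchdog fires at 25 seconds, and each HA loop has 15 seconds of headroom.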

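etcd quorum loss. If two of your three etcd nodes go down, the survivor can’t commit writes. Patroni can’t renew the leader lease or elect a new leader, and once the lease lapses the primary demotes itself to read-only: Patroni would rather refuse writes than risk two primaries. The database stays read-only until etcd regains quorum. (Patroni’s failsafe_mode option relaxes this when the primary can still reach every other member over the REST API.) Three etcd nodes tolerate one failure; five nodes tolerate two. Plan accordingly.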
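
The "three tolerate one, five tolerate two" rule is just majority arithmetic, sketched here in a few lines of Python (helper names are my own).

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return members - quorum(members)


for n in range(1, 6):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Note that four nodes tolerate no more failures than three, which is why etcd clusters are sized in odd numbers.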

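pg_rewind and wal_log_hints. pg_rewind needs either wal_log_hints: on or data checksums; this config enables both (wal_log_hints in the DCS parameters, data-checksums at initdb). Without one of them, pg_rewind won’t work and rejoining a demoted primary requires a full re-clone. Enable it now, not after your first messy failover.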

maximum_lag_on_failover. The 1 MB setting means Patroni won’t promote a replica that’s more than 1 MB behind the primary’s WAL. That’s usually fine on a LAN, but if you have a heavily loaded primary and slow replicas, tune this up or you’ll find no eligible candidate for promotion.
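
To get a feel for what 1 MB of lag means in LSN terms, the sketch below converts Postgres LSN strings to byte positions and applies the threshold. It is a simplification with hypothetical helper names: Patroni compares against the last leader position it recorded in etcd, not a live reading from the primary.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN such as '0/3000000' to an absolute byte position.

    The part before the slash is the high 32 bits, the part after the low 32 bits,
    both in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def eligible(leader_lsn: str, replica_lsn: str, maximum_lag_on_failover: int = 1048576) -> bool:
    """True if the replica trails the leader position by no more than the limit."""
    lag = lsn_to_bytes(leader_lsn) - lsn_to_bytes(replica_lsn)
    return lag <= maximum_lag_on_failover


# Illustrative positions: the first replica is exactly 1 MB behind, the second 2 MB.
print(eligible("0/3000000", "0/2F00000"))
print(eligible("0/3000000", "0/2E00000"))
```

On a live cluster, the inputs would come from pg_current_wal_lsn() on the primary and pg_last_wal_replay_lsn() on each standby.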

Should You Bother?

Honestly? For a homelab personal project where 10 minutes of downtime is fine — probably not. This setup has real operational weight: seven nodes minimum, etcd to babysit, TLS certificates if you care about security, and watchdog kernel modules. It’s not “install and forget.”

But for anything that actually matters — a side project people depend on, a small business app, a home automation database that controls your HVAC — Patroni + etcd + HAProxy is the right answer. It’s what production teams at scale use, and for good reason. Automatic failover in under 30 seconds, zero data loss with synchronous mode, clean read/write splitting, and enough observability (REST API, patronictl, HAProxy stats) to know what’s happening without logging into every node.

The 2 AM difference between “Postgres is down, paging on-call” and “Postgres failed over automatically, I’ll review the logs in the morning” is worth the setup cost.

Start with the etcd cluster, validate it’s healthy, then add Patroni one node at a time. Kill things deliberately. Build muscle memory for what failover looks like before production traffic depends on it.
