Streaming Replication Won’t Save You at 2 AM
Here’s the thing about vanilla Postgres streaming replication: it’s great until it isn’t. You’ve got a primary, two standbys, data flowing in real time — and then the primary dies. Now what? You’re SSH-ing into a standby at 2 AM, running pg_promote, manually updating your app’s connection string, and praying you didn’t just promote a lagging replica with 30 seconds of missing transactions.
That’s the gap Patroni fills. It provides leader election using etcd as a Distributed Configuration Store (DCS), automatic promotion of the best replica, and a REST API that HAProxy uses for health checks so it knows exactly where to route traffic. No manual intervention. No 2 AM heroics.
This is a full end-to-end walkthrough. Real version numbers, real commands, real trade-offs.
Architecture Overview
Three layers, seven VMs (or containers, or LXC, pick your poison):

    ┌─────────────────────────────────────────┐
    │            HAProxy (1 node)             │
    │  :5000 → primary only (read/write)      │
    │  :5001 → replicas only (read-only)      │
    └────────────┬───────────────┬────────────┘
                 │               │
        ┌────────▼──────┐ ┌──────▼────────┐
        │   pg-node-1   │ │   pg-node-2   │  ... pg-node-3
        │  Patroni 4.x  │ │  Patroni 4.x  │
        │  Postgres 17  │ │  Postgres 17  │
        └───────┬───────┘ └───────┬───────┘
                │                 │
        ┌───────▼─────────────────▼───────┐
        │     etcd cluster (3 nodes)      │
        │    etcd-1 / etcd-2 / etcd-3     │
        └─────────────────────────────────┘

etcd gives you quorum-based leader election. Patroni holds a lease in etcd. If the primary can’t renew its lease (network partition, OOM kill, whatever), Patroni on a replica picks up the lease and promotes itself. HAProxy’s health check hits Patroni’s REST API: /primary returns 200 on the current primary, /replica returns 200 on standbys. Clean, deterministic routing.
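To make the health-check routing concrete, here is a minimal Python sketch of the same decision HAProxy makes. It assumes the REST API listens on plain HTTP at port 8008 (as configured later in this guide) and that a node answers 200 on exactly the endpoint matching its role. The helper names `http_status`, `classify`, and `node_role` are made up for illustration.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status code for GET url, or None if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # Non-2xx answers (e.g. 503 from a node in the other role) land here.
        return exc.code
    except (urllib.error.URLError, OSError):
        return None


def classify(primary_status, replica_status):
    """Map the two health-check results to the backend a node belongs in."""
    if primary_status == 200:
        return "primary"
    if replica_status == 200:
        return "replica"
    return "unavailable"


def node_role(host, port=8008):
    """Probe one Patroni node the way the two HAProxy listeners would."""
    base = f"http://{host}:{port}"
    return classify(http_status(base + "/primary"), http_status(base + "/replica"))


if __name__ == "__main__":
    for ip in ("10.0.0.21", "10.0.0.22", "10.0.0.23"):
        print(ip, node_role(ip))
```

Running it from any machine that can reach the Postgres nodes gives a quick second opinion when HAProxy's view of the cluster looks wrong.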
Node IPs for this guide:
| Host | IP | Role |
|---|---|---|
| etcd-1 | 10.0.0.11 | etcd |
| etcd-2 | 10.0.0.12 | etcd |
| etcd-3 | 10.0.0.13 | etcd |
| pg-1 | 10.0.0.21 | Patroni + Postgres |
| pg-2 | 10.0.0.22 | Patroni + Postgres |
| pg-3 | 10.0.0.23 | Patroni + Postgres |
| haproxy | 10.0.0.30 | HAProxy |
Step 1: etcd 3.5 Cluster
Install on all three etcd nodes:
    ETCD_VER=v3.5.14
    curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz \
      | tar xz -C /usr/local/bin --strip-components=1 \
          etcd-${ETCD_VER}-linux-amd64/etcd \
          etcd-${ETCD_VER}-linux-amd64/etcdctl

Create the data dir and systemd unit on each node. Replace etcd-1 and 10.0.0.11 per host; the --initial-cluster value stays identical on all three nodes:

    mkdir -p /var/lib/etcd

    # /etc/systemd/system/etcd.service (on etcd-1)
    [Unit]
    Description=etcd
    After=network.target

    [Service]
    Type=notify
    User=root
    ExecStart=/usr/local/bin/etcd \
      --name etcd-1 \
      --data-dir /var/lib/etcd \
      --listen-peer-urls http://10.0.0.11:2380 \
      --listen-client-urls http://10.0.0.11:2379,http://127.0.0.1:2379 \
      --advertise-client-urls http://10.0.0.11:2379 \
      --initial-advertise-peer-urls http://10.0.0.11:2380 \
      --initial-cluster-token etcd-cluster-1 \
      --initial-cluster etcd-1=http://10.0.0.11:2380,etcd-2=http://10.0.0.12:2380,etcd-3=http://10.0.0.13:2380 \
      --initial-cluster-state new
    Restart=always
    RestartSec=5

    [Install]
    WantedBy=multi-user.target

On etcd-2: same but --name etcd-2, --listen-peer-urls http://10.0.0.12:2380, etc.
On etcd-3: same pattern with 10.0.0.13.
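Hand-editing three near-identical units is where typos creep in. As a hedged sketch, the per-node flags can be generated from the host table instead. The function name `etcd_flags` and the `ETCD_NODES` mapping are illustrative, and the flag set simply mirrors the unit shown for etcd-1.

```python
# Host table from this guide; adjust to your own addresses.
ETCD_NODES = {
    "etcd-1": "10.0.0.11",
    "etcd-2": "10.0.0.12",
    "etcd-3": "10.0.0.13",
}


def etcd_flags(name, nodes=ETCD_NODES, token="etcd-cluster-1"):
    """Build the etcd command-line flags for one member of the cluster."""
    ip = nodes[name]
    # Every member gets the same --initial-cluster string listing all peers.
    initial_cluster = ",".join(
        f"{member}=http://{addr}:2380" for member, addr in nodes.items()
    )
    return [
        f"--name {name}",
        "--data-dir /var/lib/etcd",
        f"--listen-peer-urls http://{ip}:2380",
        f"--listen-client-urls http://{ip}:2379,http://127.0.0.1:2379",
        f"--advertise-client-urls http://{ip}:2379",
        f"--initial-advertise-peer-urls http://{ip}:2380",
        f"--initial-cluster-token {token}",
        f"--initial-cluster {initial_cluster}",
        "--initial-cluster-state new",
    ]


if __name__ == "__main__":
    for node in ETCD_NODES:
        print(f"# ExecStart for {node}")
        print(" \\\n  ".join(["/usr/local/bin/etcd"] + etcd_flags(node)))
        print()
```

Only the name and the node's own IP vary between members; the peer list is the part that must match everywhere.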
    systemctl daemon-reload
    systemctl enable --now etcd

Verify all three nodes see each other:

    etcdctl --endpoints=http://10.0.0.11:2379,http://10.0.0.12:2379,http://10.0.0.13:2379 endpoint health

You want three lines all saying is healthy. If you get quorum errors, check firewall rules on 2379/2380.
Step 2: Postgres 17 + Patroni 4.x
On all three Postgres nodes:
    # Postgres 17 from PGDG
    apt install -y curl ca-certificates
    install -d /usr/share/postgresql-common/pgdg
    curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc \
      https://www.postgresql.org/media/keys/ACCC4CF8.asc
    echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" \
      > /etc/apt/sources.list.d/pgdg.list
    apt update && apt install -y postgresql-17

    # Stop and disable the default service; Patroni manages the lifecycle
    systemctl stop postgresql
    systemctl disable postgresql

    # Patroni 4.x
    apt install -y python3-pip python3-psycopg2
    pip3 install 'patroni[etcd3]' --break-system-packages

Patroni needs its etcd3 extras. The [etcd3] install target pulls in the python-etcd client library; Patroni itself talks to the etcd v3 API over HTTP through etcd’s gRPC gateway, so no grpcio build is involved. If you’re on a distro that screams about --break-system-packages, use a venv: python3 -m venv /opt/patroni && /opt/patroni/bin/pip install 'patroni[etcd3]' psycopg2-binary. The venv can’t see the system python3-psycopg2 package, hence the extra psycopg2-binary, and the Patroni binary then lives at /opt/patroni/bin/patroni rather than /usr/local/bin/patroni.
Step 3: Patroni Configuration
The patroni.yml below goes on each node. Only name, connect_address, and listen change per node.
    # /etc/patroni/patroni.yml (on pg-1)
    scope: postgres-ha
    namespace: /service/
    name: pg-1

    restapi:
      listen: 10.0.0.21:8008
      connect_address: 10.0.0.21:8008

    etcd3:
      hosts:
        - 10.0.0.11:2379
        - 10.0.0.12:2379
        - 10.0.0.13:2379

    bootstrap:
      dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576  # 1 MB: don't promote a badly lagging replica
        synchronous_mode: on
        postgresql:
          use_pg_rewind: true
          parameters:
            max_connections: 200
            shared_buffers: 512MB
            wal_level: replica
            max_wal_senders: 10
            max_replication_slots: 10
            hot_standby: on
            synchronous_commit: on
            wal_log_hints: on  # required for pg_rewind

      initdb:
        - encoding: UTF8
        - data-checksums

      pg_hba:
        - host replication replicator 10.0.0.0/24 scram-sha-256
        - host all all 10.0.0.0/24 scram-sha-256

    postgresql:
      listen: 10.0.0.21:5432
      connect_address: 10.0.0.21:5432
      data_dir: /var/lib/postgresql/17/main
      bin_dir: /usr/lib/postgresql/17/bin
      pgpass: /tmp/pgpass

      authentication:
        replication:
          username: replicator
          password: "changeme_repl"
        superuser:
          username: postgres
          password: "changeme_super"
        rewind:
          username: rewind_user
          password: "changeme_rewind"

      parameters:
        archive_mode: on
        archive_command: >-
          pgbackrest --stanza=main archive-push %p

    watchdog:
      mode: required
      device: /dev/watchdog
      safety_margin: 5

    tags:
      nofailover: false
      noloadbalance: false
      clonefrom: false
      nosync: false

On pg-2 and pg-3, change name: pg-2 / pg-3, and both listen/connect_address IP values.

Two notes on this file. Patroni 4.0 dropped support for the old bootstrap.users section, so it is left out here: the replicator and rewind_user roles are created by Patroni from postgresql.authentication during bootstrap, and the admin role used in the connectivity test later has to be created by hand once the cluster is up (for example CREATE ROLE admin LOGIN CREATEROLE CREATEDB PASSWORD 'changeme_admin'; on the primary). The tag that marks a node as a preferred clone source is spelled clonefrom.
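The watchdog block is important. With mode: required, Patroni will refuse to start if it can’t open /dev/watchdog. That’s intentional: a hung Postgres node that can’t communicate should fence itself rather than let HAProxy route to a split-brain primary. Load the kernel module with modprobe softdog && echo 'softdog' >> /etc/modules, and make sure the postgres user can open the device (for example chown postgres /dev/watchdog, made persistent with a udev rule), since Patroni does not run as root.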
Create the systemd service for Patroni:
    [Unit]
    Description=Patroni Cluster Manager
    After=network.target

    [Service]
    Type=simple
    User=postgres
    Group=postgres
    ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml
    ExecReload=/bin/kill -HUP $MAINPID
    KillMode=process
    TimeoutSec=30
    Restart=no

    [Install]
    WantedBy=multi-user.target

Then fix ownership and permissions and start the service:

    mkdir -p /etc/patroni
    chown -R postgres:postgres /etc/patroni
    chmod 600 /etc/patroni/patroni.yml   # contains passwords
    systemctl daemon-reload
    systemctl enable --now patroni

Start pg-1 first. It will initialize the cluster and bootstrap. Then start pg-2 and pg-3; they’ll clone from pg-1 automatically.
Check cluster state:
    patronictl -c /etc/patroni/patroni.yml list

Expected output:

    + Cluster: postgres-ha (7123456789012345678) -+----+-----------+
    | Member | Host           | Role    | State   | TL | Lag in MB |
    +--------+----------------+---------+---------+----+-----------+
    | pg-1   | 10.0.0.21:5432 | Leader  | running |  1 |           |
    | pg-2   | 10.0.0.22:5432 | Replica | running |  1 |         0 |
    | pg-3   | 10.0.0.23:5432 | Replica | running |  1 |         0 |
    +--------+----------------+---------+---------+----+-----------+

Step 4: HAProxy 2.9
On the haproxy node:
    apt install -y haproxy=2.9.*

The HAProxy config uses Patroni’s REST API for health checks. /primary returns HTTP 200 only on the current primary. /replica returns 200 only on standbys. HAProxy routes accordingly, with no manual intervention and no custom scripts.
    global
        maxconn 100
        log /dev/log local0

    defaults
        log global
        mode tcp
        retries 2
        timeout client 30m
        timeout connect 4s
        timeout server 30m
        timeout check 5s
    #---------------------------------------------------------------------
    # Read/Write: primary only
    #---------------------------------------------------------------------
    listen postgres_rw
        bind *:5000
        option httpchk GET /primary
        http-check expect status 200
        default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
        server pg-1 10.0.0.21:5432 maxconn 100 check port 8008
        server pg-2 10.0.0.22:5432 maxconn 100 check port 8008
        server pg-3 10.0.0.23:5432 maxconn 100 check port 8008

    #---------------------------------------------------------------------
    # Read-Only: replicas only
    #---------------------------------------------------------------------
    listen postgres_ro
        bind *:5001
        option httpchk GET /replica
        http-check expect status 200
        default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
        server pg-1 10.0.0.21:5432 maxconn 100 check port 8008
        server pg-2 10.0.0.22:5432 maxconn 100 check port 8008
        server pg-3 10.0.0.23:5432 maxconn 100 check port 8008

    #---------------------------------------------------------------------
    # Stats page
    #---------------------------------------------------------------------
    listen stats
        bind *:7000
        mode http
        stats enable
        stats uri /
        stats refresh 10s
        stats show-node

Two details matter here. First, a bare option httpchk with no arguments sends OPTIONS / rather than a request to a role-specific endpoint, so set the method and path explicitly on each listener: GET /primary for read/write, GET /replica for read-only. Second, the server lines use plain check port 8008 because the restapi section in patroni.yml serves plain HTTP; only add check-ssl verify none if you have configured TLS on Patroni’s REST API, otherwise every health check fails and both listeners show all servers down.

Updated: As of Patroni 4.0, the /master endpoint was removed. Use /primary for the primary and /replica for standbys.

    systemctl enable --now haproxy

Test connectivity:
    psql -h 10.0.0.30 -p 5000 -U admin -d postgres -c "SELECT pg_is_in_recovery();"
    # Returns: f (false), you're on the primary

    psql -h 10.0.0.30 -p 5001 -U admin -d postgres -c "SELECT pg_is_in_recovery();"
    # Returns: t (true), you're on a replica

Step 5: The Trade-Off You’re Signing Up For
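Honestly, this is the part most guides skip over. With synchronous_mode: on and synchronous_commit: on, the primary won’t acknowledge a write until at least one synchronous standby has written it to its WAL. That protects every acknowledged commit across a failover, because Patroni only promotes a standby that was synchronous. The cost is availability: if the sync standby disappears, commits stall until Patroni’s next loop either picks another sync standby or, when none is left, falls back to standalone (asynchronous) operation. If you would rather have writes stop than ever run unprotected, add synchronous_mode_strict: true; then the primary blocks writes for as long as no sync standby exists. It won’t degrade gracefully; it stops.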
That’s the deal. You pick one:
- synchronous_commit: on: zero data loss, writes can stall during partial failures
- synchronous_commit: local: writes always succeed, tiny window of data loss on failover
For a homelab database, local is probably fine. For anything financial, use on and accept the stall risk. Patroni’s synchronous_node_count parameter lets you tune how many sync standbys are required — default is 1.
Step 6: Failover Test
This is the fun part. Kill the primary hard:
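    # On pg-1 (the current primary)
    kill -9 $(head -1 /var/lib/postgresql/17/main/postmaster.pid)

One caveat: if Patroni on pg-1 is still running, it first tries to restart Postgres in place (governed by primary_start_timeout, 300 seconds by default), so you may see pg-1 simply recover as leader. To simulate losing the whole node, power off the VM instead, or set primary_start_timeout: 0 so a crash triggers failover immediately.

Watch what happens on any other node:

    watch -n 1 patronictl -c /etc/patroni/patroni.yml list

Within ttl seconds (30 in our config), you’ll see pg-2 or pg-3 acquire the leader lease and promote:

    + Cluster: postgres-ha (7123456789012345678) -+----+-----------+
    | Member | Host           | Role    | State   | TL | Lag in MB |
    +--------+----------------+---------+---------+----+-----------+
    | pg-1   | 10.0.0.21:5432 | Replica | stopped |    |   unknown |
    | pg-2   | 10.0.0.22:5432 | Leader  | running |  2 |           |
    | pg-3   | 10.0.0.23:5432 | Replica | running |  2 |         0 |
    +--------+----------------+---------+---------+----+-----------+

HAProxy’s health check picks this up within a few check intervals: with inter 3s, fall 3, and rise 2, the old primary is marked down after about 9 seconds of failed checks and the new one is marked up after about 6 seconds of passing checks. Port 5000 now routes to pg-2. Port 5001 routes to pg-3 (and eventually pg-1 once it rejoins).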
When pg-1 comes back, Patroni uses pg_rewind to reconcile its WAL with the new primary’s timeline, then rejoins as a replica. No manual steps.
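If you want to script the failover test rather than eyeball watch output, Patroni's REST API also exposes a GET /cluster endpoint that returns the member list as JSON. The sketch below only shows the parsing step and assumes a payload with a top-level "members" list whose entries carry "name", "role", and "state" keys. Treat that shape, the sample payload, and the function name `current_leader` as assumptions to check against your Patroni version.

```python
import json


def current_leader(cluster):
    """Return the name of the running leader in a /cluster payload, or None."""
    for member in cluster.get("members", []):
        if member.get("role") == "leader" and member.get("state") == "running":
            return member.get("name")
    return None


# Hand-written sample mirroring the post-failover table above (not real output).
SAMPLE = json.loads("""
{
  "members": [
    {"name": "pg-1", "role": "replica", "state": "stopped", "host": "10.0.0.21", "port": 5432},
    {"name": "pg-2", "role": "leader",  "state": "running", "host": "10.0.0.22", "port": 5432, "timeline": 2},
    {"name": "pg-3", "role": "replica", "state": "running", "host": "10.0.0.23", "port": 5432, "timeline": 2, "lag": 0}
  ]
}
""")

if __name__ == "__main__":
    print("leader:", current_leader(SAMPLE))
```

A test harness can poll the endpoint on a surviving node, feed the decoded JSON into this function, and record how long it takes for the answer to change.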
    # Planned role change without killing anything (useful for maintenance).
    # With a healthy leader this is a switchover; Patroni 4.x uses --leader, not --master.
    patronictl -c /etc/patroni/patroni.yml switchover postgres-ha --leader pg-1 --candidate pg-2 --force

Step 7: pgBackRest Integration
Patroni manages the cluster; pgBackRest handles backups. They play nicely together — configure archive_command in Patroni’s postgresql parameters block (as shown in patroni.yml above) so WAL archiving works on whichever node is currently the primary.
    # On the node that will run backups (or a dedicated backup host)
    apt install -y pgbackrest

pgBackRest configuration:

    [global]
    repo1-path=/var/lib/pgbackrest
    repo1-retention-full=2
    log-level-console=info
    log-level-file=detail

    [main]
    pg1-path=/var/lib/postgresql/17/main
    pg1-port=5432
    pg1-user=postgres

Then create the stanza and take the first backup:

    # Initialize the stanza (run once)
    pgbackrest --stanza=main stanza-create

    # Full backup
    pgbackrest --stanza=main backup --type=full

    # Verify
    pgbackrest --stanza=main info

The archive_command in patroni.yml calls pgBackRest for each WAL segment. Combined with a nightly full backup and continuous WAL archiving, you’ve got point-in-time recovery on top of your HA cluster.
Gotchas Worth Knowing Before You Start
Clock skew. etcd’s lease TTL is wall-clock time. If your nodes have drifted clocks, Patroni’s heartbeat math gets weird and you’ll see spurious failovers. Install chrony on every node:
    apt install -y chrony
    systemctl enable --now chrony
    chronyc tracking   # verify offset < 1s

Watchdog timeout vs. TTL. Patroni’s documentation asks for ttl to be at least loop_wait + 2 × retry_timeout, which the values here just meet (10 + 2 × 10 = 30). The watchdog safety_margin (5s) is subtracted from ttl, not added to it: Patroni arms the watchdog to fire after ttl − safety_margin = 25s without a heartbeat, so a hung primary is reset before its leader lease can expire and another node promotes. softdog accepts that timeout without extra configuration.
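The timing arithmetic is easy to get wrong when you start tuning, so here is a small sanity check as a sketch. It encodes two rules as I understand them from Patroni's documentation, both of which you should verify for your version: ttl should be at least loop_wait + 2 × retry_timeout, and the watchdog is armed to ttl − safety_margin. The function name `check_timing` is illustrative.

```python
def check_timing(ttl, loop_wait, retry_timeout, safety_margin):
    """Return (watchdog_timeout, problems) for a set of Patroni timing values.

    Assumed rules (verify against the Patroni docs for your version):
      * ttl >= loop_wait + 2 * retry_timeout
      * the watchdog fires after ttl - safety_margin seconds without a heartbeat
    """
    problems = []
    minimum_ttl = loop_wait + 2 * retry_timeout
    if ttl < minimum_ttl:
        problems.append(f"ttl {ttl}s is below loop_wait + 2*retry_timeout = {minimum_ttl}s")
    watchdog_timeout = ttl - safety_margin
    if watchdog_timeout <= 0:
        problems.append(f"safety_margin {safety_margin}s leaves no watchdog timeout")
    return watchdog_timeout, problems


if __name__ == "__main__":
    # Values from the patroni.yml in this guide.
    timeout, issues = check_timing(ttl=30, loop_wait=10, retry_timeout=10, safety_margin=5)
    print(f"watchdog fires after {timeout}s; problems: {issues or 'none'}")
```

With the values from this guide it reports a 25-second watchdog timeout and no problems; dropping ttl to 20 without touching the other two settings is flagged.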
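etcd quorum loss. If two of your three etcd nodes go down, etcd loses quorum and stops accepting writes. Patroni can’t renew the leader lease or elect a new leader, and by default the primary demotes itself to read-only once it can no longer update its lease, because it cannot prove it is still the only leader. So losing etcd quorum costs you writes, not just failover. Patroni’s failsafe_mode setting relaxes this by letting the primary keep running as long as it can still reach every other cluster member over the REST API. Three etcd nodes tolerate one failure; five nodes tolerate two. Plan accordingly.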
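The "three tolerate one, five tolerate two" rule follows from majority quorum. A short sketch of the arithmetic (function names are illustrative):

```python
def quorum_size(members):
    """Smallest majority of a cluster with the given number of voting members."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can fail while a majority is still reachable."""
    return members - quorum_size(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} nodes: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Note that four nodes tolerate no more failures than three, which is why etcd clusters are normally sized with an odd member count.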
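pg_rewind and wal_log_hints. pg_rewind only works if the cluster has either wal_log_hints: on in postgresql.parameters or data checksums enabled at initdb time. Without one of them, rejoining a demoted primary requires a full re-clone. The config above sets both; enable at least one now, not after your first messy failover.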
maximum_lag_on_failover. The 1 MB setting means Patroni won’t promote a replica that’s more than 1 MB behind the primary’s WAL. That’s usually fine on a LAN, but if you have a heavily loaded primary and slow replicas, tune this up or you’ll find no eligible candidate for promotion.
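As a rough illustration of that filter, the sketch below applies the same threshold to a hypothetical map of replica names to replay lag in bytes. It models only the lag comparison; Patroni's real candidate selection also considers tags, synchronous state, and timelines. The function name and sample numbers are made up.

```python
MAXIMUM_LAG_ON_FAILOVER = 1048576  # bytes, matching the 1 MB value in patroni.yml


def eligible_candidates(replica_lag_bytes, maximum_lag=MAXIMUM_LAG_ON_FAILOVER):
    """Return replicas whose lag does not exceed the threshold, sorted by name."""
    return sorted(
        name for name, lag in replica_lag_bytes.items() if lag <= maximum_lag
    )


if __name__ == "__main__":
    # Hypothetical lag figures: pg-3 is 5 MB behind and would be skipped.
    lags = {"pg-2": 0, "pg-3": 5 * 1024 * 1024}
    print("eligible:", eligible_candidates(lags))
    # Under heavy load both replicas might exceed 1 MB, leaving nobody to promote.
    print("eligible:", eligible_candidates({"pg-2": 3_000_000, "pg-3": 4_000_000}))
```

The second call returns an empty list, which is the "no eligible candidate" situation the paragraph above warns about.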
Should You Bother?
Honestly? For a homelab personal project where 10 minutes of downtime is fine — probably not. This setup has real operational weight: seven nodes minimum, etcd to babysit, TLS certificates if you care about security, and watchdog kernel modules. It’s not “install and forget.”
But for anything that actually matters — a side project people depend on, a small business app, a home automation database that controls your HVAC — Patroni + etcd + HAProxy is the right answer. It’s what production teams at scale use, and for good reason. Automatic failover in under 30 seconds, zero data loss with synchronous mode, clean read/write splitting, and enough observability (REST API, patronictl, HAProxy stats) to know what’s happening without logging into every node.
The 2 AM difference between “Postgres is down, paging on-call” and “Postgres failed over automatically, I’ll review the logs in the morning” is worth the setup cost.
Start with the etcd cluster, validate it’s healthy, then add Patroni one node at a time. Kill things deliberately. Build muscle memory for what failover looks like before production traffic depends on it.