# PostgreSQL High Availability: A Practical Guide with Patroni and pgBouncer
## Introduction

Your database is the last thing that should go down. Yet setting up PostgreSQL for high availability remains one of the most under-documented areas in the DevOps world. Most teams run a single Postgres instance until the day it dies, then scramble to set up replication while their users stare at error pages.

This guide walks through a production-ready PostgreSQL HA setup using the tools that have become the industry standard: Patroni for automated failover, pgBouncer for connection pooling, streaming replication for data redundancy, and pgBackRest for backups. We'll cover the architecture, actual configuration files, failure scenarios, and the operational runbooks you'll need. If you're running Postgres in production and don't have automated failover, this article is for you.

## The Architecture

A production HA Postgres setup has four layers:

```
              ┌─────────────┐
              │   HAProxy   │  (Virtual IP / DNS)
              │  Port 5432  │
              └──────┬──────┘
                     │
        ┌────────────┼────────────┐
        │            │            │
  ┌─────┴─────┐┌─────┴─────┐┌─────┴─────┐
  │ pgBouncer ││ pgBouncer ││ pgBouncer │
  │  Node 1   ││  Node 2   ││  Node 3   │
  └─────┬─────┘└─────┬─────┘└─────┬─────┘
        │            │            │
  ┌─────┴─────┐┌─────┴─────┐┌─────┴─────┐
  │ Patroni + ││ Patroni + ││ Patroni + │
  │ Postgres  ││ Postgres  ││ Postgres  │
  │ (Primary) ││ (Replica) ││ (Replica) │
  └─────┬─────┘┌─────┬─────┘└─────┬─────┘
        │            │            │
  ┌─────────────────────────────────────┐
  │       etcd Cluster (3 nodes)        │
  └─────────────────────────────────────┘
```

Patroni manages the Postgres instances and handles leader election via a distributed consensus store (etcd, ZooKeeper, or Consul). When the primary fails, Patroni automatically promotes a replica and reconfigures the others to follow the new primary.

pgBouncer sits between your application and Postgres, pooling connections to avoid the expensive fork-per-connection model. Each Postgres node runs its own pgBouncer instance.

HAProxy (or a DNS-based solution) routes traffic to the current primary. Patroni exposes a REST API that HAProxy uses for health checks.
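To make the health-check wiring concrete, here is a minimal HAProxy sketch. It assumes the node addresses from the diagram, routes client traffic to the local pgBouncer (port 6432) on whichever node is primary, and relies on the fact that Patroni's REST API answers `GET /primary` with HTTP 200 only on the current leader. Treat this as an illustrative starting point, not a tuned production config:

```
# /etc/haproxy/haproxy.cfg (illustrative sketch)
listen postgres_primary
    bind *:5432
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg-node-1 10.0.1.10:6432 check port 8008
    server pg-node-2 10.0.1.11:6432 check port 8008
    server pg-node-3 10.0.1.12:6432 check port 8008
```

The `on-marked-down shutdown-sessions` option matters: when a failover happens, it kills connections to the old primary instead of letting them linger against a now read-only node.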
## Setting Up Patroni

Install Patroni on each Postgres node (the `etcd3` extra matches the `etcd3` section in the config below):

```bash
pip3 install 'patroni[etcd3]'
```

Here's a production Patroni configuration (`/etc/patroni/patroni.yml`):

```yaml
scope: postgres-cluster
name: pg-node-1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.1.10:8008

etcd3:
  hosts: 10.0.1.20:2379,10.0.1.21:2379,10.0.1.22:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576  # 1 MB
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
        archive_mode: "on"
        archive_command: pgbackrest --stanza=main archive-push %p
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 10.0.1.0/24 md5
    - host all all 10.0.1.0/24 md5

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.1.10:5432
  data_dir: /var/lib/postgresql/16/main
  bin_dir: /usr/lib/postgresql/16/bin
  authentication:
    superuser:
      username: postgres
      password: "your-secure-password"
    replication:
      username: replicator
      password: "your-replication-password"
  parameters:
    shared_buffers: 4GB
    effective_cache_size: 12GB
    work_mem: 64MB
    maintenance_work_mem: 512MB
    max_connections: 200
    checkpoint_completion_target: 0.9
    wal_buffers: 64MB
    random_page_cost: 1.1  # SSD storage
```

Key settings to understand:

- `maximum_lag_on_failover`: a replica won't be promoted if it's more than 1 MB behind the primary. This prevents data loss, but it also means failover might not happen at all if every replica is lagging heavily.
- `use_pg_rewind`: allows the old primary to rejoin the cluster as a replica after failover without a full base backup. This is critical for fast recovery.
- `use_slots`: replication slots prevent the primary from removing WAL segments that replicas still need, avoiding replication breakage during network issues.
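To build intuition for how `maximum_lag_on_failover` gates a promotion, here is a small Python sketch of the decision rule. This is an illustration, not Patroni's actual implementation; the function names and WAL positions are invented for the example:

```python
MAXIMUM_LAG_ON_FAILOVER = 1_048_576  # 1 MB, matching the config above

def promotable(replica_lsn: int, leader_lsn: int,
               max_lag: int = MAXIMUM_LAG_ON_FAILOVER) -> bool:
    """A replica is eligible for promotion only if its WAL position is
    within max_lag bytes of the last known leader position."""
    return leader_lsn - replica_lsn <= max_lag

def pick_candidate(replica_lsns: dict, leader_lsn: int):
    """Pick the least-lagged eligible replica. None means failover is
    blocked (all replicas lag too much) rather than risking data loss."""
    eligible = {name: lsn for name, lsn in replica_lsns.items()
                if promotable(lsn, leader_lsn)}
    return max(eligible, key=eligible.get) if eligible else None

# Hypothetical positions: pg-node-2 is 512 KB behind, pg-node-3 is 5 MB behind.
leader = 100_000_000
lsns = {"pg-node-2": leader - 512 * 1024, "pg-node-3": leader - 5 * 1024 * 1024}
print(pick_candidate(lsns, leader))  # pg-node-2
```

Note the trade-off the last line of `pick_candidate` encodes: if every replica is beyond the threshold, the cluster stays leaderless until an operator intervenes, which is exactly the behavior described above.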
Start Patroni as a systemd service:

```bash
sudo systemctl enable patroni
sudo systemctl start patroni
```

Check cluster status:

```bash
patronictl -c /etc/patroni/patroni.yml list
```

Output:

```
+ Cluster: postgres-cluster ----+---------+---------+----+-----------+
| Member    | Host      | Role    | State   | TL | Lag in MB |
+-----------+-----------+---------+---------+----+-----------+
| pg-node-1 | 10.0.1.10 | Leader  | running |  1 |           |
| pg-node-2 | 10.0.1.11 | Replica | running |  1 |       0.0 |
| pg-node-3 | 10.0.1.12 | Replica | running |  1 |       0.0 |
+-----------+-----------+---------+---------+----+-----------+
```

## Connection Pooling with pgBouncer

Without connection pooling, each application connection forks a new Postgres backend process (roughly 10 MB of RAM each). With 500 application connections, that's about 5 GB just for connection overhead. pgBouncer solves this by multiplexing thousands of application connections over a small pool of actual database connections.

```ini
; /etc/pgbouncer/pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; Pool settings
pool_mode = transaction
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3

; Connection limits
max_client_conn = 1000
max_db_connections = 100

; Timeouts
server_idle_timeout = 300
client_idle_timeout = 0
query_timeout = 60
query_wait_timeout = 30

; Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
stats_period = 60
```

The critical decision is `pool_mode`:

- **Transaction mode** is the right default for most applications: it assigns a server connection for the duration of a transaction, then returns it to the pool.
- **Session mode** holds a server connection for the entire client session, which largely defeats the purpose of pooling for long-lived clients (though it still saves connection setup and teardown).
- **Statement mode** is the most aggressive, but it breaks multi-statement transactions and prepared statements.

Warning: transaction pooling breaks session-level `SET` commands, prepared statements that span transactions, and `LISTEN`/`NOTIFY`.
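The memory savings from pooling are easy to quantify. A back-of-the-envelope sketch in Python, using the rough ~10 MB-per-backend figure cited above (actual per-connection overhead varies with workload and settings):

```python
PER_BACKEND_MB = 10  # rough per-backend estimate used in this article

def backend_memory_mb(server_connections: int,
                      per_backend_mb: int = PER_BACKEND_MB) -> int:
    """Approximate RAM consumed by Postgres backend processes alone."""
    return server_connections * per_backend_mb

# Without pooling: 500 app connections = 500 backend processes.
print(backend_memory_mb(500))  # 5000 MB, i.e. ~5 GB

# With transaction pooling: 500 clients share default_pool_size = 25 backends.
print(backend_memory_mb(25))   # 250 MB
```

The same 500 clients drop from ~5 GB of backend overhead to ~250 MB, which is the whole argument for running pgBouncer on every node.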
If your application relies on session-level `SET`, cross-transaction prepared statements, or `LISTEN`/`NOTIFY`, you'll need session pooling for those specific connections. (pgBouncer 1.21 and later can also track protocol-level prepared statements in transaction mode via `max_prepared_statements`.)

## Backups with pgBackRest

pgBackRest is the gold standard for Postgres backups. It supports full, differential, and incremental backups with parallel compression and encryption.

```ini
# /etc/pgbackrest/pgbackrest.conf
[global]
repo1-type=s3
repo1-s3-bucket=your-pg-backups
repo1-s3-region=us-east-1
repo1-s3-endpoint=s3.amazonaws.com
repo1-path=/backups
repo1-retention-full=4
repo1-retention-diff=14
repo1-cipher-type=aes-256-cbc
repo1-cipher-pass=your-encryption-passphrase
process-max=4
compress-type=zst
compress-level=3

[main]
pg1-path=/var/lib/postgresql/16/main
pg1-port=5432
```

Create the stanza and run your first backup:

```bash
# Initialize the backup repository
pgbackrest --stanza=main stanza-create

# Full backup
pgbackrest --stanza=main --type=full backup

# Differential backup (everything changed since the last full backup)
pgbackrest --stanza=main --type=diff backup

# Incremental backup (everything changed since the last backup of any type)
pgbackrest --stanza=main --type=incr backup
```

Set up a cron schedule:

```bash
# Full backup every Sunday at 1 AM
0 1 * * 0 pgbackrest --stanza=main --type=full backup

# Differential backup every day at 1 AM (except Sunday)
0 1 * * 1-6 pgbackrest --stanza=main --type=diff backup
```

Point-in-time recovery is where pgBackRest shines. Because Patroni is already archiving WAL segments to pgBackRest, you can restore to any point in time:

```bash
pgbackrest --stanza=main --type=time \
  --target="2026-04-09 14:30:00" \
  --target-action=promote restore
```

## Testing Failure Scenarios

Setting up HA means nothing if you don't test it. Here are the failure scenarios you must validate.

**1. Primary node crash**

```bash
# Simulate primary failure
sudo systemctl stop patroni   # on the primary node
```

Expected: within 30 seconds (the TTL), Patroni promotes a replica. HAProxy health checks detect the change and route traffic to the new primary. The application sees a brief connection error, retries, and reconnects.
**2. Network partition**

```bash
# Isolate the primary from etcd
iptables -A OUTPUT -p tcp --dport 2379 -j DROP
```

Expected: the primary can't reach etcd, so Patroni demotes it to read-only. A replica that still has etcd access gets promoted. When the network heals, the old primary uses pg_rewind to rejoin as a replica.

**3. Switchover (planned maintenance)**

```bash
# Graceful switchover to a specific replica
patronictl -c /etc/patroni/patroni.yml switchover \
  --master pg-node-1 --candidate pg-node-2 --force
```

Expected: zero or near-zero downtime. The primary finishes in-flight transactions, transfers leadership, and becomes a replica.

**4. Full cluster restore from backup**

```bash
# Stop all Patroni instances, then on the new primary node:
pgbackrest --stanza=main --type=time \
  --target="2026-04-09 10:00:00" restore

# Start Patroni - it will bootstrap from the restored data
sudo systemctl start patroni
```

Run these tests quarterly. Document the results. Your future self at 3 AM will thank you.

## Monitoring

You need visibility into replication lag, connection pool utilization, and failover events.
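Replication lag, as Postgres reports it via `pg_wal_lsn_diff`, is just the byte distance between two log sequence numbers. A short Python sketch of that arithmetic, purely for intuition (Postgres does this internally):

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000060' to an absolute byte
    position: the hex field before the slash is the high 32 bits,
    the field after it the low 32 bits."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Same arithmetic as pg_wal_lsn_diff(sent_lsn, replay_lsn)."""
    return lsn_to_int(sent_lsn) - lsn_to_int(replay_lsn)

# A replica whose replay position trails the sent position by 0x60 bytes:
print(lag_bytes("0/3000060", "0/3000000"))  # 96
```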
Key metrics to track:

```sql
-- Replication lag (run on the primary)
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- Connection counts by state
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;

-- Long-running queries
SELECT pid,
       now() - query_start AS duration,
       query
FROM pg_stat_activity
WHERE state != 'idle'
  AND now() - query_start > interval '30 seconds';
```

For Prometheus monitoring, use postgres_exporter and set up alerts:

```yaml
# Prometheus alerting rules
groups:
  - name: postgres
    rules:
      - alert: PostgresReplicationLag
        expr: pg_replication_lag_seconds > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Replication lag is {{ $value }}s on {{ $labels.instance }}"

      - alert: PostgresConnectionPoolExhausted
        expr: pgbouncer_pools_server_active / pgbouncer_pools_server_max > 0.9
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "pgBouncer pool >90% utilized on {{ $labels.instance }}"

      - alert: PatroniNoLeader
        expr: patroni_master == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No Patroni leader detected in cluster"
```

Building a production-grade PostgreSQL HA cluster takes time and expertise - and maintaining it takes even more. At InstaDevOps, we design, deploy, and manage database infrastructure alongside your entire DevOps stack, starting at $2,999/month.

Book a free 15-minute consultation to discuss your database reliability needs: https://calendly.com/instadevops/15min
