[PG] Postgres HA and Scale

Scaling PostgreSQL

當我們的資料庫需要處理大量的資料時，我們首先需要檢查我們的資料庫是否已經達到其處理能力的極限，像是 SQL 優化或參數調整等等，如果還是沒辦法提供足夠的效能，我們就需要考慮擴充我們的資料庫。

Vertical Scaling: 增加資源，像是 CPU、Memory、Storage 等等
Horizontal Scaling: 增加節點，像是增加資料庫節點、增加資料庫叢集等等

Scaling

Streaming Replication

Streaming Replication 是 PostgreSQL 用來實現高可用性的主要機制，它透過將主節點的 WAL 日誌傳送到從節點，從而實現接近實時的資料同步。

在 postgres 中，預設的執行方式是非同步的，也就是說主節點在執行完一個 transaction 後，不需要等待從節點確認收到 WAL 日誌，就可以繼續執行下一個 transaction，這種方法的好處是效能較高，缺點是如果發生故障，可能會因為 WAL 缺少而導致資料不一致。我們也可以設置為同步，這樣主節點在執行完後，需要等待從節點確認收到 WAL 日誌，才能繼續執行，這種方法的好處是資料一致性較高且可以保證資料的完整性，缺點是效能較低。我們可以針對不同的資料(table)，來選擇不同的方式。

Streaming Replication

首先使用 initdb 來初始化資料庫集群，資料庫集群是由單個資料庫實例所管理的資料庫集合。

docker run -d --name pg-ha1 -e POSTGRES_PASSWORD=password -p 5432:5432 postgres:16-alpine

docker exec -it pg-ha1 /bin/bash

su postgres

initdb -D /tmp/primary_db/
vi /tmp/primary_db/postgresql.conf

# set listen_addresses to '*'
# set port to 5433
pg_ctl -D /tmp/primary_db start
psql --port=5433 postgres
create user requser replication;

vi /tmp/primary_db/pg_hba.conf

# add the following line
host all requser 127.0.0.1/32 trust

pg_ctl -D /tmp/primary_db restart

接著我們創建一個 replication 的資料庫

pg_basebackup -h localhost -U requser --checkpoint=fast -D /tmp/replica_db -R --slot=some_slot -C --port 5433
vi /tmp/replica_db/postgresql.conf

# set listen_addresses to '*'
# set port to 5434
pg_ctl -D /tmp/replica_db start

最後讓我們來做主節點的測試

psql postgres --port=5433

postgres=# SELECT * FROM pg_stat_replication;
-- result
-[ RECORD 1 ]----+------------------------------
pid              | 595
usesysid         | 16384
usename          | requser
application_name | walreceiver
client_addr      | ::1
client_hostname  |
client_port      | 49798
backend_start    | 2024-08-03 12:15:14.549951+00
backend_xmin     |
state            | streaming
sent_lsn         | 0/3000148
write_lsn        | 0/3000148
flush_lsn        | 0/3000148
replay_lsn       | 0/3000148
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
reply_time       | 2024-08-03 12:33:45.648063+00

從節點的測試

psql postgres --port=5434

SELECT * FROM pg_stat_wal_receiver;
-- result
-[ RECORD 1 ]---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid                   | 594
status                | streaming
receive_start_lsn     | 0/3000000
receive_start_tli     | 1
written_lsn           | 0/3000148
flushed_lsn           | 0/3000148
received_tli          | 1
last_msg_send_time    | 2024-08-03 12:36:45.703365+00
last_msg_receipt_time | 2024-08-03 12:36:45.70343+00
latest_end_lsn        | 0/3000148
latest_end_time       | 2024-08-03 12:15:14.551133+00
slot_name             | some_slot
sender_host           | localhost
sender_port           | 5433
conninfo              | user=requser passfile=/var/lib/postgresql/.pgpass channel_binding=prefer dbname=replication host=localhost port=5433 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable

接著我們在主節點上創建一個 table，然後在從節點上查詢，看看是否能夠查詢到。

CREATE TABLE test_table (id SERIAL PRIMARY KEY, name VARCHAR(255));
INSERT INTO test_table (name) VALUES ('test');

SELECT * FROM test_table;
-- result
 id | name
----+------
  1 | test

Logical Replication

logical replication 是一種透過將 SQL 語句來進行複製的方式，這種方式的好處是資料複製的效能較高適用於網路環境較差的地方，且可以在不同版本之間進行複製，同時也可以選擇複製特定的資料，缺點是無法執行一些 DDL 的指令，像是 CREATE TABLE 或是 ALTER TABLE 等等，同時 sequence 的值默認也不會複製，這代表如果需要 failover 的話，需要另外處理 sequence 的值。

先開好 docker 的環境

docker run -d --name pg-ha2 -e POSTGRES_PASSWORD=password -p 5432:5432 postgres:16-alpine

docker exec -it pg-ha2 /bin/bash

su postgres

創建資料庫並修改參數

initdb -D /tmp/publish_db/
initdb -D /tmp/subscribe_db/

vi /tmp/publish_db/postgresql.conf

# set wal_level to logical
# set port to 5433
pg_ctl -D /tmp/publish_db start

vi /tmp/subscribe_db/postgresql.conf
# set port to 5434
pg_ctl -D /tmp/subscribe_db start

創建好表，並使用 pg_dump 來進行複製

psql postgres --port=5433

CREATE DATABASE pub;
\c pub
CREATE TABLE test_table (id SERIAL PRIMARY KEY, name VARCHAR(255));
INSERT INTO test_table (name) VALUES ('test');

psql postgres --port=5434
CREATE DATABASE sub;

pg_dump -t test_table -p 5433 -s pub | psql -p 5434 -d sub

創建 publication 和 subscription

CREATE PUBLICATION test_publication FOR TABLE test_table;
CREATE SUBSCRIPTION test_subscription CONNECTION 'host=localhost port=5433 user=postgres dbname=pub' PUBLICATION test_publication;

SELECT * FROM test_table;

接著我們可以透過 pg_stat_replication 來查看 replication 的狀態，可以關注 lag 的部分

write_lag: 主節點寫入 WAL 的時間
flush_lag: 主節點將 WAL 刷新到磁盤的時間
replay_lag: 從節點應用 WAL 的時間

SELECT * FROM pg_stat_replication;
-- result
-[ RECORD 1 ]----+------------------------------
pid              | 222
usesysid         | 10
usename          | postgres
application_name | test_subscription
client_addr      | ::1
client_hostname  |
client_port      | 36752
backend_start    | 2024-08-04 07:23:31.011034+00
backend_xmin     |
state            | streaming
sent_lsn         | 0/196EA40
write_lsn        | 0/196EA40
flush_lsn        | 0/196EA40
replay_lsn       | 0/196EA40
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
reply_time       | 2024-08-04 07:41:12.252314+00

我們也可以使用以下指令來查看 publication 和 subscription 的狀態

\dRp
-- result
List of publications
-[ RECORD 1 ]----------------
Name       | test_publication
Owner      | postgres
All tables | f
Inserts    | t
Updates    | t
Deletes    | t
Truncates  | t
Via root   | f

\dRs
-- result
List of subscriptions
-[ RECORD 1 ]-------------------
Name        | test_subscription
Owner       | postgres
Enabled     | t
Publication | {test_publication}

PgBouncer

在正常的情況下，postgres 可以在一秒鐘處理 350 個 transaction，如果超過的話，我們可能需要考慮使用 connection pool 來進行優化，這樣可以減少每次連接資料庫時創建 process (fork) 所需要消耗的資源。 PgBouncer 本身沒有支援 multi-host 、 failover 以及 load balancing，所以如果需要這樣的功能，我們需要使用其他的工具。

pgbouncer 包含三種模式 :

session: 默認和最安全的模式，client 在連接期間，會持續保持連接，直到 session 結束
transaction: client 僅在 transaction 期間保持與資料庫的連接
statement: 每個 SQL 語句執行後，連接就會返回到池中

在 docker 執行時遇到各種權限問題，待修改 ...

docker run -d --name pg-ha3 -e POSTGRES_PASSWORD=password -p 5432:5432 postgres:16-alpine

docker exec -it --user root pg-ha3 /bin/bash

su postgres

initdb -D /tmp/pgbouncer_db/
vi /tmp/pgbouncer_db/postgresql.conf

# set port to 5433
pg_ctl -D /tmp/pgbouncer_db start

exit # exit from postgres user
apk update && apk add pgbouncer

cd etc/pgbouncer
cp pgbouncer.ini pgbouncer.ini.bak
vi pgbouncer.ini
# set test_db= dbname=postgres host=localhost port=5433 auth_user=pguser
# set logfile = /tmp/pgbouncer.log

psql -p 5433 -U postgres
CREATE USER pguser SUPERUSER;

pgbouncer pgbouncer.ini

# adjust min_pool_size、default_pool_size、max_client_conn

Partition

對於資料量大的表，不管是查詢或是 vacuum 都會需要較長的時間，這時候我們可以考慮將表進行分割提升效能。在新版的 postgres 中，partition 多使用 declarative 的方式來進行分割，而不是 inheritance 的方式。

partition 的方法大致有以下四種 :

Range Partition: 依照範圍來進行分割，像是時間、數值等等
List Partition: 依照特定的值來進行分割
Hash Partition: 依照 hash 值來進行分割
Composite Partition: 結合以上兩種或多種方式來進行分割

docker run -d --name pg-ha4 -e POSTGRES_PASSWORD=password -p 5432:5432 postgres:16-alpine

docker exec -it pg-ha4 /bin/bash

psql -U postgres -d postgres

CREATE TABLE customers (
    id INT,
    name TEXT,
    age NUMERIC
) PARTITION BY RANGE (age);

CREATE TABLE customers_young PARTITION OF customers FOR VALUES FROM (MINVALUE) TO (25);
CREATE TABLE customers_adult PARTITION OF customers FOR VALUES FROM (25) TO (65);
CREATE TABLE customers_old PARTITION OF customers FOR VALUES FROM (65) TO (MAXVALUE);

INSERT INTO customers (id, name, age) VALUES (1, 'John Doe', 20), (2, 'Jane Doe', 30), (3, 'John Smith', 60), (4, 'Jane Smith', 70);

SELECT tableoid::regclass, * FROM customers;
-- result
    tableoid     | id |    name    | age
-----------------+----+------------+-----
 customers_young |  1 | John Doe   |  20
 customers_adult |  2 | Jane Doe   |  30
 customers_adult |  3 | John Smith |  60
 customers_old   |  4 | Jane Smith |  70
(4 rows)

Sharding

sharding 是將資料分割到多個資料庫，以減輕單個資料庫的負擔。

sharding 的關鍵在於如何選擇 sharding key，我們需要依據業務場景來選擇合適的 sharding key，來保證我們的查詢或寫入盡量落在同一個資料庫中。

PostgreSQL High Availability

在生產環境中，我們需要考慮在各種不同的環境下都能穩定地提供服務，這時候我們就需要考慮高可用性的問題。

HA 主要分為三個步驟 :

replication : 主節點將資料複製到從節點，以保證資料的一致性
failover : 當主節點故障時，從節點自動升級為主節點

pg_ctl promote
pgpool II
PAF
Patroni
Repmgr

failback : 當主節點恢復正常時，從節點降級為從節點

pg rewind
pgpool II
PAF
Patroni
Repmgr

在處理 HA 之前要先定義一些問題

資料的重要性
RTO(Recovery Time Objective) && RPO(Recovery Point Objective)
auto failover 的機制
cross-center replication
do we open replication for read

解決的架構圖大致如下

HA Problem

Pgpool II

pgpool 是一個位在資料庫跟客戶端的 middleware，可以提供 load balancing、failover、connection pooling 等功能，是一個很好的 HA 解決方案。

docker 開發方法待補充...

docker run -d --name pg-ha5 -e POSTGRES_PASSWORD=password -p 5432:5432 postgres:16-alpine

docker exec -it pg-ha5 /bin/bash

su postgres

接著需要設定 streaming application，如前所示再來安裝 pgpool

apk update && apk add pgpool-ii

Scaling PostgreSQL​

Streaming Replication​

Logical Replication​

PgBouncer​

Partition​

Sharding​

PostgreSQL High Availability​

Pgpool II​

Reference​

Scaling PostgreSQL

Streaming Replication

Logical Replication

PgBouncer

Partition

Sharding

PostgreSQL High Availability

Pgpool II

Reference