PostgreSQL 事务ID回卷问题分析
概述
PG的 MVCC 机制通过保存数据行多个历史版本并通过记录上的事务 ID 元组(xmin,xmax) 来控制不同版本记录的可见性。在 PG里,事务 ID(XID)是 32 位无符号,顺序生成,单调递增。当到达大值(2^32-1) 后又开始从 3 开始使用。这种现象一般称为事务 ID 回卷(Wraparound)。单纯看这个机制可能看出历史事务修改的记录的可见性可能发生反转,一个记录从“历史记录”变成“未来记录”,数据就莫名其妙的“丢失”了。PG 数据库在碰到这种情形下会直接拒绝服务(报错如下)。
ERROR: database is not accepting commands to avoid wraparound data loss in database "devops"
HINT: Stop the postmaster and vacuum that database in single-user mode.
PG 事务特性
事务 ID 类型
pgtest=# select locktype, database,transactionid,virtualtransaction,pid,mode,granted from pg_locks where transactionid <> '' ;
locktype | database | transactionid | virtualtransaction | pid | mode | granted
---------------+----------+---------------+--------------------+------+---------------+---------
transactionid | | 2146496507 | 5/12667 | 2600 | ExclusiveLock | t
(1 row)
事务 ID 比较
/*
* TransactionIdPrecedes --- is id1 logically < id2?
*/
bool TransactionIdPrecedes(TransactionId id1, TransactionId id2) // 结果返回一个bool值
{
/*
* If either ID is a permanent XID then we can just do unsigned
* comparison. If both are normal, do a modulo-2^32 comparison.
*/
int32 diff;
if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2)) //若其中一个不是普通id,则其一定较新(较大)
return (id1 < id2);
diff = (int32) (id1 - id2);
return (diff < );
}
PG 表的数据更新
xmin:事务 ID,记录插入时的事务 ID。 xmax:事务 ID,记录删除或更新时的事务 ID。更新操作对应的删除和插入两笔记录。如果为 0,表示还没有删除或者删除事务还没有提交或者已回滚。删除也不会真的删除元组。 ctid:表示元组在表中的物理位置。
pgtest=# create table t1(id bigint not null primary key, c1 varchar(50));
CREATE TABLE
pgtest=# insert into t1 values(1,'A') returning xmin;
xmin
------
1645
(1 row)
INSERT 1
pgtest=# delete from t1 returning xmax;
xmax
------
1646
(1 row)
DELETE 1
pgtest=# insert into t1 values(2,'B') returning xmin;
xmin
------
1647
(1 row)
INSERT 1
pgtest=# update t1 set c1=c1||'*' where id=2 returning xmax;
xmax
------
(1 row)
UPDATE 1
pgtest=# update t1 set c1=c1||'*' where id=2 returning xmax;
xmax
------
(1 row)
UPDATE 1
查看当前会话的事务 ID,方法如下。数据库里所有会话的事务 ID 都是单调递增不重复的。
pgtest=# SELECT pg_current_xact_id();
pg_current_xact_id
--------------------
1664
(1 row)
pgtest=# SELECT pg_current_xact_id();
pg_current_xact_id
--------------------
1665
(1 row)
pgtest=# CREATE EXTENSION pageinspect;
CREATE EXTENSION
SELECT lp, t_ctid AS ctid,
t_xmin AS xmin,
t_xmax AS xmax,
to_hex(t_infomask) AS infomask,
to_hex(t_infomask2) AS infomask2,
t_attrs AS attrs
FROM heap_page_item_attrs(get_raw_page('t1', ), 't1')
LIMIT 10;
lp | ctid | xmin | xmax | infomask | infomask2 | attrs
----+-------+------+------+----------+-----------+---------------------------------------
1 | (0,1) | 1645 | 1646 | 502 | 2002 | {"\\x0100000000000000","\\x0541"}
2 | (0,3) | 1647 | 1662 | 502 | 4002 | {"\\x0200000000000000","\\x0542"}
3 | (0,4) | 1662 | 1663 | 2102 | c002 | {"\\x0200000000000000","\\x07422a"}
4 | (0,4) | 1663 | 0 | 2802 | 8002 | {"\\x0200000000000000","\\x09422a2a"}
(4 rows)
冻结事务 ID
pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age, pg_current_xact_id() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1');
oid | relname | relfrozenxid | age | pg_current_xact_id
-------+---------+--------------+-------+--------------------
18163 | t1 | 1644 | 11895 | 13539
(1 row)
pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age, pg_current_xact_id() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1');
oid | relname | relfrozenxid | age | pg_current_xact_id
-------+---------+--------------+-------+--------------------
18163 | t1 | 1644 | 11905 | 13549
(1 row)
SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as age
from pg_class c left join pg_class t on c.reltoastrelid = t.oid
where c.relkind in ('r', 'm') and c.relname like '%t1%'
order by age desc ;
VACUUM 操作
select name, setting, short_desc from pg_settings where name in ('autovacuum','autovacuum_freeze_max_age','log_autovacuum_min_duration','vacuum_failsafe_age','vacuum_freeze_min_age','vacuum_freeze_table_age');
pgtest=# select name, setting, short_desc from pg_settings where name in ('autovacuum','autovacuum_freeze_max_age','log_autovacuum_min_duration','vacuum_failsafe_age','vacuum_freeze_min_age','vacuum_freeze_table_age');
name | setting | short_desc
-----------------------------+------------+--------------------------------------------------------------------------------
autovacuum | off | Starts the autovacuum subprocess.
autovacuum_freeze_max_age | 200000000 | Age at which to autovacuum a table to prevent transaction ID wraparound.
log_autovacuum_min_duration | | Sets the minimum execution time above which autovacuum actions will be logged.
vacuum_failsafe_age | 1600000000 | Age at which VACUUM should trigger failsafe to avoid a wraparound outage.
vacuum_freeze_min_age | 50000000 | Minimum age at which VACUUM should freeze a table row.
vacuum_freeze_table_age | 150000000 | Age at which VACUUM should scan whole table to freeze tuples.
(6 rows)
autovacuum:表示是否开启自动 vacuum
操作。注意,即使是设置为off
,为了避免事务 ID回卷带来大问题,PG依然会在合适的时机开启autovacuum
。vacuum_freeze_min_age:PG 认为事务年龄超过这个的就应该针对部分记录做 freeze
操作。vacuum_freeze_table_age:PG 认为事务年龄超过这个的就应该扫描全表做 freeze
操作。autovacuum_freeze_max_age:PG 认为事务年龄超过这个(默认2亿)就自动对表发起 vacuum
操作。这是强制的。PG 还有别的参数会尽可能降低自动vacuum
操作对数据库读写的性能影响。vacuum_failsafe_age:PG 认为事务年龄超过这个(默认值16亿)就强制对表发起不计成本(不考虑对数据库读写的性能影响,以及减少不必要的索引清理任务)的 vacuum
操作。因为此时再不及时做vacuum
,后面就危险了。这是 PG 14 增加的参数,是后一道保障。log_autovacuum_min_duration:是否在 PG 日志里记录 vacuum
信息。默认是-1
不记录,这里为了展示设置为 0(记录所有的vacuum
信息)。
PG 事务 ID 回卷测试
autovacuum
postgres@MQBOOK:~$ pg_resetwal -x 2097483648 -D 15/main
Write-ahead log reset
postgres@MQBOOK:~$ sudo systemctl restart postgresql@15-main
Job for postgresql@15-main.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status postgresql@15-main.service" and "journalctl -xeu postgresql@15-main.service" for details.
postgres@MQBOOK:~/15/main/log$ tail -n 50 -f postgresql-2023-04-17_152517.log
2023-04-17 15:25:17.823 CST [35990] LOG: starting PG15.2 (Ubuntu 15.2-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
2023-04-17 15:25:17.823 CST [35990] LOG: listening on IPv4 address "127.0.0.1", port 25432
2023-04-17 15:25:17.832 CST [35990] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.25432"
2023-04-17 15:25:17.852 CST [35994] LOG: database system was shut down at 2023-04-17 15:24:56 CST
2023-04-17 15:25:17.853 CST [35994] FATAL: could not access status of transaction 2097483648
2023-04-17 15:25:17.853 CST [35994] DETAIL: Could not open file "pg_xact/07D0": No such file or directory.
2023-04-17 15:25:17.855 CST [35990] LOG: startup process (PID 35994) exited with exit code 1
2023-04-17 15:25:17.855 CST [35990] LOG: aborting startup due to startup process failure
2023-04-17 15:25:17.857 CST [35990] LOG: database system is shut down
查看 PG的 pg_xact 目录,确实没有那个日志。这个是正常的, 因为事务 ID 是直接修改过来的,不是靠事务推进的。
postgres@MQBOOK:~/15/main/pg_xact$ ls -lrth
total 8.K
-rw------- 1 postgres postgres 8.0K Apr 17 15:21 0000
postgres@MQBOOK:~/15/main/pg_xact$ file 0000
0000: ASCII text, with very long lines (3406), with no line terminators
postgres@MQBOOK:~/15/main/pg_xact$
postgres@MQBOOK:~$ dd if=/dev/zero of=15/main/pg_xact/07D bs=8192 count=20
20+ records in
20+ records out
163840 bytes (164 kB, 160 KiB) copied, .000231961 s, 706 MB/s
postgres@MQBOOK:~$ sudo systemctl start postgresql@15-main
pgtest=# SELECT lp, t_ctid AS ctid,
t_xmin AS xmin,
t_xmax AS xmax,
to_hex(t_infomask) AS infomask,
to_hex(t_infomask2) AS infomask2,
t_attrs AS attrs
FROM heap_page_item_attrs(get_raw_page('t1', ), 't1')
LIMIT 10;
lp | ctid | xmin | xmax | infomask | infomask2 | attrs
----+-------+------+------+----------+-----------+---------------------------------------
1 | | | | | |
2 | | | | | |
3 | | | | | |
4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"}
(4 rows)
pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age, pg_current_xact_id() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1');
oid | relname | relfrozenxid | age | pg_current_xact_id
-------+---------+--------------+-----+--------------------
18163 | t1 | 2097483657 | 0 | 2097483657
(1 row)
pgtest=#
2023-04-17 15:43:44.470 CST [36099] WARNING: bypassing nonessential maintenance of table "pgtest.public.t1" as a failsafe after 0 index scans
2023-04-17 15:43:44.470 CST [36099] DETAIL: The table's relfrozenxid or relminmxid is too far in the past.
2023-04-17 15:43:44.470 CST [36099] HINT: Consider increasing configuration parameter "maintenance_work_mem" or "autovacuum_work_mem".
You might also need to consider other ways for VACUUM to keep up with the allocation of transaction IDs.
2023-04-17 15:43:44.471 CST [36099] LOG: automatic aggressive vacuum to prevent wraparound of table "pgtest.public.t1": index scans:
pages: removed, 1 remain, 1 scanned (100.00% of total)
tuples: 3 removed, 1 remain, are dead but not yet removable
removable cutoff: 2097483657, which was XIDs old when operation ended
new relfrozenxid: 2097483657, which is 2097482013 XIDs ahead of previous value
new relminmxid: 193, which is 30 MXIDs ahead of previous value
index scan bypassed by failsafe: 1 pages from table (100.00% of total) have 1 dead item identifiers
I/O timings: read: 0.169 ms, write: 0.000 ms
avg read rate: 52.301 MB/s, avg write rate: 32.688 MB/s
buffer usage: 30 hits, 8 misses, 5 dirtied
WAL usage: 3 records, 2 full page images, 1604 bytes
system usage: CPU: user: .00 s, system: .00 s, elapsed: .00 s
pgtest=# insert into t1 values(3,'C');
INSERT 1
pgtest=# insert into t1 values(1,'A*') returning xmin;
xmin
------------
2097483705
(1 row)
INSERT 1
pgtest=# select xmin, xmax, cmin, cmax, ctid from t1;
xmin | xmax | cmin | cmax | ctid
------------+------+------+------+-------
2097483696 | 0 | | 0 | (,3)
1663 | 0 | | 0 | (,4)
2097483705 | 0 | | 0 | (,5)
(3 rows)
lp | ctid | xmin | xmax | infomask | infomask2 | attrs
----+-------+------------+------+----------+-----------+---------------------------------------
1 | | | | | |
2 | | | | | |
3 | (0,3) | 2097483696 | 0 | 902 | 2 | {"\\x0300000000000000","\\x0543"}
4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"}
5 | (0,5) | 2097483705 | 0 | 902 | 2 | {"\\x0100000000000000","\\x07412a"}
(5 rows)
postgres@MQBOOK:~$ sudo systemctl stop postgresql@15-main
postgres@MQBOOK:~$
postgres@MQBOOK:~$ pg_resetwal -x 2146483648 -D 15/main/
Write-ahead log reset
postgres@MQBOOK:~$ sudo systemctl restart postgresql@15-main
postgres@MQBOOK:~/15/main/log$ tail -f postgresql-2023-04-17_161411.log
2023-04-17 16:14:11.591 CST [36159] LOG: starting PG15.2 (Ubuntu 15.2-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.-1ubuntu1~22.04) 11.3., 64-bit
2023-04-17 16:14:11.592 CST [36159] LOG: listening on IPv4 address "127.0.0.1", port 25432
2023-04-17 16:14:11.601 CST [36159] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.25432"
2023-04-17 16:14:11.623 CST [36163] LOG: database system was shut down at 2023-04-17 16:13:39 CST
2023-04-17 16:14:11.624 CST [36163] FATAL: could not access status of transaction 2146483648
2023-04-17 16:14:11.624 CST [36163] DETAIL: Could not open file "pg_xact/07FF": No such file or directory.
2023-04-17 16:14:11.625 CST [36159] LOG: startup process (PID 36163) exited with exit code 1
2023-04-17 16:14:11.625 CST [36159] LOG: aborting startup due to startup process failure
2023-04-17 16:14:11.627 CST [36159] LOG: database system is shut down
postgres@MQBOOK:~/15/main/pg_xact$ dd if=/dev/zero of=~/15/main/pg_xact/07FF bs=8192 count=10
10+ records in
10+ records out
81920 bytes (82 kB, 80 KiB) copied, .000203643 s, 402 MB/s
查看表的页记录。
lp | ctid | xmin | xmax | infomask | infomask2 | attrs
----+-------+------------+------+----------+-----------+---------------------------------------
1 | | | | | |
2 | | | | | |
3 | (0,3) | 2097483696 | 0 | 902 | 2 | {"\\x0300000000000000","\\x0543"}
4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"}
5 | (0,5) | 2097483705 | 0 | 902 | 2 | {"\\x0100000000000000","\\x07412a"}
(5 rows)
pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age, pg_current_xact_id() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1');
oid | relname | relfrozenxid | age | pg_current_xact_id
-------+---------+--------------+----------+--------------------
18163 | t1 | 2097483657 | 49000000 | 2146483657
(1 row)
观察发现,除了表的年龄变大了,接近 vacuum_freeze_min_age
值,不够出发 vacuum freeze
条件,所以并没有变化。
继续推进事务 ID 到 4092967296 ,比上次事务 ID 前进了 接近 20亿,离 2^32 还差 2亿以上 。重启实例。
postgres@MQBOOK:~$ pg_resetwal -x 4092967296 -D 15/main/
Write-ahead log reset
postgres@MQBOOK:~$ dd if=/dev/zero of=15/main/pg_xact/0F3F bs=8192 count=20
20+ records in
20+ records out
163840 bytes (164 kB, 160 KiB) copied, .000228365 s, 717 MB/s
postgres@MQBOOK:~$ sudo systemctl start postgresql@15-main
此时表的年龄已经达到 autovacuum_freeze_max_age
和 vacuum_failsafe_age
阈值,表也是立即进行 aggresive vacuum
。
10216 2023-04-17 17:13:34.725 CST [729] WARNING: bypassing nonessential maintenance of table "pgtest.public.t1" as a failsafe after 0 index scans
10217 2023-04-17 17:13:34.725 CST [729] DETAIL: The table's relfrozenxid or relminmxid is too far in the past.
10218 2023-04-17 17:13:34.725 CST [729] HINT: Consider increasing configuration parameter "maintenance_work_mem" or "autovacuum_work_mem".
10219 You might also need to consider other ways for VACUUM to keep up with the allocation of transaction IDs.
10220 2023-04-17 17:13:34.730 CST [729] LOG: automatic aggressive vacuum to prevent wraparound of table "pgtest.public.t1": index scans:
10221 pages: removed, 1 remain, 1 scanned (100.00% of total)
10222 tuples: removed, 3 remain, are dead but not yet removable
10223 removable cutoff: 4092967303, which was XIDs old when operation ended
10224 new relfrozenxid: 4092967303, which is 1995483646 XIDs ahead of previous value
10225 new relminmxid: 211, which is 18 MXIDs ahead of previous value
10226 index scan bypassed by failsafe: 1 pages from table (100.00% of total) have 1 dead item identifiers
10227 I/O timings: read: 7.039 ms, write: 0.000 ms
10228 avg read rate: 8.431 MB/s, avg write rate: 5.269 MB/s
10229 buffer usage: 30 hits, 8 misses, 5 dirtied
10230 WAL usage: 2 records, 2 full page images, 1270 bytes
10231 system usage: CPU: user: .00 s, system: .00 s, elapsed: .00 s
查看表的页和年龄,发生了变化。
lp | ctid | xmin | xmax | infomask | infomask2 | attrs
----+-------+------------+------+----------+-----------+---------------------------------------
1 | | | | | |
2 | | | | | |
3 | (0,3) | 2097483696 | 0 | b02 | 2 | {"\\x0300000000000000","\\x0543"}
4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"}
5 | (0,5) | 2097483705 | 0 | b02 | 2 | {"\\x0100000000000000","\\x07412a"}
(5 rows)
pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age, pg_current_xact_id() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1');
oid | relname | relfrozenxid | age | pg_current_xact_id
-------+---------+--------------+-----+--------------------
18163 | t1 | 4092967303 | 0 | 4092967303
(1 row)
pgtest=# select xmin, xmax, cmin, cmax, ctid from t1;
xmin | xmax | cmin | cmax | ctid
------------+------+------+------+-------
2097483696 | 0 | | 0 | (,3)
1663 | 0 | | 0 | (,4)
2097483705 | 0 | | 0 | (,5)
(3 rows)
事务 ID 回卷
接下来模拟事务 ID 回卷,将事务 ID 推进到 3 。相比上次事务 ID 大概前进了 2亿多。表的年龄达到参数 autovacuum_freeze_max_age
值,触发自动的 vacuum
。
postgres@MQBOOK:~$ pg_resetwal -x 3 -D 15/main/
Write-ahead log reset
postgres@MQBOOK:~$ dd if=/dev/zero of=15/main/pg_xact/0000 bs=8192 count=20
20+ records in
20+ records out
163840 bytes (164 kB, 160 KiB) copied, .000218186 s, 751 MB/s
postgres@MQBOOK:~$ sudo systemctl start postgresql@15-main
7714 2023-04-17 17:18:27.632 CST [833] LOG: automatic aggressive vacuum to prevent wraparound of table "pgtest.public.t1": index scans: 1 7715 pages: removed, 1 remain, 1 scanned (100.00% of total)
7716 tuples: removed, 3 remain, are dead but not yet removable
7717 removable cutoff: 10, which was XIDs old when operation ended
7718 new relfrozenxid: 10, which is 202000003 XIDs ahead of previous value
7719 new relminmxid: 213, which is 2 MXIDs ahead of previous value
7720 index scan needed: 1 pages from table (100.00% of total) had 1 dead item identifiers removed
7721 index "t1_pkey": pages: 2 in total, newly deleted, currently deleted, reusable
7722 I/O timings: read: 0.040 ms, write: 0.000 ms
7723 avg read rate: 431.630 MB/s, avg write rate: 172.652 MB/s
7724 buffer usage: 36 hits, 10 misses, 4 dirtied
7725 WAL usage: 5 records, 4 full page images, 9865 bytes
7726 system usage: CPU: user: .00 s, system: .00 s, elapsed: .00 s
查看表的页和年龄,以及继续更新表,即使事务 ID 回卷了,表都能正常使用。
pgtest=# select xmin, xmax, cmin, cmax, ctid from t1;
xmin | xmax | cmin | cmax | ctid
------------+------+------+------+-------
2097483696 | 0 | | 0 | (,3)
1663 | 0 | | 0 | (,4)
2097483705 | 0 | | 0 | (,5)
(3 rows)
pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age, pg_current_xact_id() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1');
oid | relname | relfrozenxid | age | pg_current_xact_id
-------+---------+--------------+-----+--------------------
18163 | t1 | 10 | 0 | 10
(1 row)
lp | ctid | xmin | xmax | infomask | infomask2 | attrs
----+-------+------------+------+----------+-----------+---------------------------------------
1 | | | | | |
2 | | | | | |
3 | (0,3) | 2097483696 | 0 | b02 | 2 | {"\\x0300000000000000","\\x0543"}
4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"}
5 | (0,5) | 2097483705 | 0 | b02 | 2 | {"\\x0100000000000000","\\x07412a"}
(5 rows)
pgtest=# insert into t1 values(4,'D');
INSERT 1
pgtest=# update t1 set c1=c1||'*' where id=3;
UPDATE 1
pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age, pg_current_xact_id() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1');
oid | relname | relfrozenxid | age | pg_current_xact_id
-------+---------+--------------+-----+--------------------
18163 | t1 | 10 | 12 | 22
(1 row)
lp | ctid | xmin | xmax | infomask | infomask2 | attrs
----+-------+------------+------+----------+-----------+---------------------------------------
1 | (0,1) | 12 | 0 | 902 | 2 | {"\\x0400000000000000","\\x0544"}
2 | | | | | |
3 | (0,6) | 2097483696 | 21 | 302 | 4002 | {"\\x0300000000000000","\\x0543"}
4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"}
5 | (0,5) | 2097483705 | 0 | b02 | 2 | {"\\x0100000000000000","\\x07412a"}
6 | (0,6) | 21 | 0 | 2802 | 8002 | {"\\x0300000000000000","\\x07432a"}
(6 rows)
事务 ID 冻结失败
整个 PG 的事务 ID 随着业务读写不断推进,同时 PG 内部的 vacuum
机制也会在适当的时候被触发 对老的事务 ID 进行冻结回收。正常情况下,PG 是不会碰到文章开头说的那种报错。
如果碰到了,可能是短事务的 TPS 和只读查询的 QPS 非常高,事务 ID 消耗速度非常快,同时表的数据量非常大, vacuum
速度非常慢并且由于未知原因报错中断了。可能性都出在这个未知原因上。如:日志文件损坏、有未决的 prepared transaction
或者有长事务一直锁表未提交等等。
下面就通过模拟存在 prepared transaction
来阻碍 autovacuum
。
pgtest=# BEGIN;
BEGIN
pgtest=*# select pg_current_xact_id();
pg_current_xact_id
--------------------
55
(1 row)
pgtest=*# lock t1 in share update exclusive mode;
LOCK TABLE
pgtest=*# PREPARE TRANSACTION 'trx01';
PREPARE TRANSACTION
然后关闭数据库。一般的事务就自动回滚了,但这个 prepared transaction
会一直存在。这个在 PG 里可能会是一个很大的风险。以后碰到再专门讨论。
postgres@MQBOOK:~$ sudo systemctl stop postgresql@15-main
postgres@MQBOOK:~$
postgres@MQBOOK:~$ pg_resetwal -x 2137483648 -D 15/main/
Write-ahead log reset
postgres@MQBOOK:~$ dd if=/dev/zero of=15/main/pg_xact/07F6 bs=8192 count=15
15+ records in
15+ records out
122880 bytes (123 kB, 120 KiB) copied, .00023577 s, 521 MB/s
此时 PG 依然启动失败。查看 PG 的日志看到了文章开头的那个报错了。
postgres@MQBOOK:~$ sudo systemctl start postgresql@15-main
10787 2023-04-17 17:39:27.560 CST [972] WARNING: database "pgtest" must be vacuumed within 9999999 transactions
10788 2023-04-17 17:39:27.560 CST [972] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
10789 You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
此时因为事务 ID 还有部分可用,数据库还可以读写。只是读写过程都会提示要尽快做冻结操作。按道理,PG 会自动做 vacuum
, 但是这个表由于前面的 prepared transaction
影响,应该是 vacuum
失败了。
pgtest=# select pg_current_xact_id();
WARNING: database "pgtest" must be vacuumed within 9999999 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
pg_current_xact_id
--------------------
2137483658
(1 row)
pgtest=# insert into t1 values(7,'E');
WARNING: database "pgtest" must be vacuumed within 9999997 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
INSERT 1
pgtest=# SELECT lp, t_ctid AS ctid,
t_xmin AS xmin,
t_xmax AS xmax,
to_hex(t_infomask) AS infomask,
to_hex(t_infomask2) AS infomask2,
t_attrs AS attrs
FROM heap_page_item_attrs(get_raw_page('t1', ), 't1')
LIMIT 10;
lp | ctid | xmin | xmax | infomask | infomask2 | attrs
----+-------+------------+------+----------+-----------+---------------------------------------
1 | (0,1) | 12 | 0 | 902 | 2 | {"\\x0400000000000000","\\x0544"}
2 | | | | | |
3 | (0,6) | 2097483696 | 21 | 702 | 4002 | {"\\x0300000000000000","\\x0543"}
4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"}
5 | (0,5) | 2097483705 | 0 | b02 | 2 | {"\\x0100000000000000","\\x07412a"}
6 | (0,6) | 21 | 0 | 2902 | 8002 | {"\\x0300000000000000","\\x07432a"}
7 | (0,7) | 2137483660 | 0 | 802 | 2 | {"\\x0700000000000000","\\x0545"}
(7 rows)
再继续推进事务 ID 到 2^31 。
postgres@MQBOOK:~$ sudo systemctl stop postgresql@15-main
postgres@MQBOOK:~$
postgres@MQBOOK:~$ pg_resetwal -x 2146483648 -D 15/main/
Write-ahead log reset
postgres@MQBOOK:~$ dd if=/dev/zero of=15/main/pg_xact/07FF bs=8192 count=15
15+ records in
15+ records out
122880 bytes (123 kB, 120 KiB) copied, .000276528 s, 444 MB/s
再次启动 PG 实例,果然 PG 就直接罢工了。
postgres@MQBOOK:~$ sudo systemctl start postgresql@15-main
2023-04-17 17:44:38.270 CST [1027] LOG: POWA connected to database powa
2023-04-17 17:44:38.280 CST [1027] ERROR: Failed to lock the powa_snapshot_metas record:
state : 54000
message: database is not accepting commands to avoid wraparound data loss in database "pgtest"
detail :
hint : Stop the postmaster and vacuum that database in single-user mod
查看一下数据库此时的库表的年龄。
WITH max_age AS (
SELECT 2000000000 as max_old_xid
, setting AS autovacuum_freeze_max_age
FROM pg_catalog.pg_settings
WHERE name = 'autovacuum_freeze_max_age' )
, per_database_stats AS (
SELECT datname
, m.max_old_xid::int
, m.autovacuum_freeze_max_age::int
, age(d.datfrozenxid) AS oldest_current_xid
FROM pg_catalog.pg_database d
JOIN max_age m ON (true)
WHERE d.datallowconn )
select max(oldest_current_xid) AS oldest_current_xid
, max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparound
, max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovac
FROM per_database_stats
;
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
2146483638 | 107 | 1073
pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1');
oid | relname | relfrozenxid | age
-------+---------+--------------+------------
18163 | t1 | 10 | 2146483638
(1 row)
lp | ctid | xmin | xmax | infomask | infomask2 | attrs
----+-------+------------+------+----------+-----------+---------------------------------------
1 | (0,1) | 12 | 0 | 902 | 2 | {"\\x0400000000000000","\\x0544"}
2 | | | | | |
3 | (0,6) | 2097483696 | 21 | 702 | 4002 | {"\\x0300000000000000","\\x0543"}
4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"}
5 | (0,5) | 2097483705 | 0 | b02 | 2 | {"\\x0100000000000000","\\x07412a"}
6 | (0,6) | 21 | 0 | 2902 | 8002 | {"\\x0300000000000000","\\x07432a"}
7 | (0,7) | 2137483660 | 0 | 902 | 2 | {"\\x0700000000000000","\\x0545"}
(7 rows)
pgtest=# insert into t1 values(8,'G') returning xmin;
ERROR: database is not accepting commands to avoid wraparound data loss in database "pgtest"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
pgtest=#
pgtest=# vacuum freeze t1;
表的年龄达到了 2^31 了。不允许再开启新的事务。
此时手动对表发起 vacuum freeze
会被阻塞,查看一下 PG 的阻塞情况。
SELECT blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS current_statement_in_blocking_process
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement | current_statement_in_blocking_process
-------------+--------------+--------------+---------------+------------------------------------------------------+------------------------------------------------------
1030 | | 1075 | postgres | autovacuum: VACUUM public.t1 (to prevent wraparound) | vacuum freeze t1;
1075 | postgres | 1030 | | vacuum freeze t1; | autovacuum: VACUUM public.t1 (to prevent wraparound)
(2 rows)
select locktype, database , virtualtransaction , pid, mode, waitstart from pg_locks where database = (select oid from pg_database where datname = 'pgtest') and mode like '%Exclusive%';
locktype | database | virtualtransaction | pid | mode | waitstart
----------+----------+--------------------+------+--------------------------+-------------------------------
relation | 18153 | -1/55 | | ShareUpdateExclusiveLock |
relation | 18153 | 4/4 | 1030 | ShareUpdateExclusiveLock | 2023-04-17 17:44:38.280852+08
(2 rows)
发现 PG 已经开始对表开始 autovacuum
,但锁一直不释放,明显是遇到问题了。这里看不到阻塞的根本原因(前面模拟的 prepared transaction
)。有关这种事务特点以后再分析。
pgtest=# rollback prepared 'trx01';
ROLLBACK PREPARED
pgtest=# select locktype, database , virtualtransaction , pid, mode, waitstart from pg_locks where database = (select oid from pg_database where datname = 'pgtest') and mode like '%Exclusive%';
locktype | database | virtualtransaction | pid | mode | waitstart
----------+----------+--------------------+-----+------+-----------
( rows)
总结
本文分析解释了 PG 的事务 ID 的使用和回卷现象特点以及实际问题特征和解决方法。PG 14/15 相比 PG 9/10/11/12 在处理这个问题的 vacuum
策略上更加丰富灵活,所以出现事务 ID 用尽的概率是非常低的。一旦出现,结果很严重,排查也比较麻烦。首先看文件是否损坏、然后看是否有未提交或未决事务(如 prepared transaction
) 存在等等。如果平时表特别大,建议适当调整 vacuum
参数改变 vacuum
的性能和效果。有关 vacuum
的详细调优方法以后碰到再分享。
相关文章