数据库block corruption的检测和恢复模拟
在生产库中一旦出现坏块就需要及时处理,比如需要确认坏块影响对象范围,对应数据文件是否备份,是否有足够旧的归档,针对不同对象如何恢复,本文给出一个完整的模拟过程,后续讨论了几种坏块恢复方式,以及文件是否备份,归档等的处理方法。我们做过这样的验证,手里有对应的脚本,再遇到类似问题就心里不慌了。
1 模拟环境准备
创建表空间
SYS@orcl2>create tablespace corp_tbs datafile '/oracle/db/corp_tbs.dbf' size 10m;
Tablespace created.
创建表
SYS@orcl2>create table scott.corp_tab tablespace corp_tbs as select * from scott.dept;
Table created.
Elapsed: 00:00:00.19
检查表段块分配情况
SYS@orcl2>select dbms_rowid.rowid_block_number(rowid),dbms_rowid.rowid_relative_fno(rowid) from scott.corp_tab;
DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID)
------------------------------------ ------------------------------------
139 11
139 11
139 11
139 11
说明该表仅仅使用了一块,就是11号文件的139号数据块
SYS@orcl2> select file_name,tablespace_name from dba_data_files where file_id=11
FILE_NAME TABLESPACE_NAME
------------------------------ ------------------------------
/oracle/db/corp_tbs.dbf CORP_TBS
2 模拟破坏数据块,我们使用vim编辑数据文件,随机插入几个字符
RMAN> recover datafile 11 block 139 clear;
Starting recover at 08-SEP-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=30 instance=orcl2 device type=DISK
Finished recover at 08-SEP-20
3 对块中的表数据做一个全表扫描,触发错误
测试库为11.2.0.4该版本数据库在告警日志中已经有非常丰富的信息
Hex dump of (file 11, block 139) in trace file /oracle/db/diag/rdbms/orcl/orcl2/trace/orcl2_ora_27567.trc
Corrupt block relative dba: 0x02c0008b (file 11, block 139)
Bad header found during buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x02c0008b
last change scn: 0x0000.0050188a seq: 0x1 flg: 0x04
spare1: 0x91 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x188a0601
check value in block header: 0xf626
computed block checksum: 0x5891
Reading datafile '/oracle/db/corp_tbs.dbf' for corruption at rdba: 0x02c0008b (file 11, block 139)
Reread (file 11, block 139) found same corrupt data (no logical check)
Tue Sep 08 22:50:57 2020
Corrupt Block Found
TSN = 10, TSNAME = CORP_TBS
RFN = 11, BLK = 139, RDBA = 46137483
OBJN = 16186, OBJD = 16187, OBJECT = CORP_TAB, SUBOBJECT =
Errors in file /oracle/db/diag/rdbms/orcl/orcl2/trace/orcl2_ora_27567.trc (incident=2140394):
ORA-01578: ORACLE data block corrupted (file # 11, block # 139) <<<<<<<<<<ORA-1578错误出现了
ORA-01110: data file 11: '/oracle/db/corp_tbs.dbf'
SEGMENT OWNER = SCOTT, SEGMENT TYPE = Table Segment
Incident details in: /oracle/db/diag/rdbms/orcl/orcl2/incident/incdir_2140394/orcl2_ora_27567_i2140394.trc
Tue Sep 08 22:51:01 2020
Sweep [inc][2140394]: completed
Tue Sep 08 22:51:01 2020
Dumping diagnostic data in directory=[cdmp_20200908225101], requested by (instance=2, osid=27567), summary=[incident=2140394].
Hex dump of (file 11, block 139) in trace file /oracle/db/diag/rdbms/orcl/orcl2/incident/incdir_2140394/orcl2_m000_29080_i2140394_a.trc
Corrupt block relative dba: 0x02c0008b (file 11, block 139)
Bad header found during validation
Data in bad block:
type: 6 format: 2 rdba: 0x02c0008b
last change scn: 0x0000.0050188a seq: 0x1 flg: 0x04
spare1: 0x91 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x188a0601
check value in block header: 0xf626
computed block checksum: 0x5891
Reread of blocknum=139, file=/oracle/db/corp_tbs.dbf. found same corrupt data
Reread of blocknum=139, file=/oracle/db/corp_tbs.dbf. found same corrupt data
Reread of blocknum=139, file=/oracle/db/corp_tbs.dbf. found same corrupt data
Reread of blocknum=139, file=/oracle/db/corp_tbs.dbf. found same corrupt data
Reread of blocknum=139, file=/oracle/db/corp_tbs.dbf. found same corrupt data
涉及文件号,块号、坏块中涉及的对象
SYS@orcl2>select * from V$DATABASE_BLOCK_CORRUPTION;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
11 139 1 5249162 CORRUPT
Elapsed: 00:00:00.03
通过如下脚本检查更详细的坏块涉及的对象信息
set pagesize 2000
set linesize 280
SELECT e.owner, e.segment_type, e.segment_name, e.partition_name, c.file#
, greatest(e.block_id, c.block#) corr_start_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1) corr_end_block#
, least(e.block_id+e.blocks-1, c.block#+c.blocks-1)
- greatest(e.block_id, c.block#) + 1 blocks_corrupted
, corruption_type description
FROM dba_extents e, v$database_block_corruption c
WHERE e.file_id = c.file#
AND e.block_id <= c.block# + c.blocks - 1
AND e.block_id + e.blocks - 1 >= c.block#
UNION
SELECT s.owner, s.segment_type, s.segment_name, s.partition_name, c.file#
, header_block corr_start_block#
, header_block corr_end_block#
, 1 blocks_corrupted
, corruption_type||' Segment Header' description
FROM dba_segments s, v$database_block_corruption c
WHERE s.header_file = c.file#
AND s.header_block between c.block# and c.block# + c.blocks - 1
UNION
SELECT null owner, null segment_type, null segment_name, null partition_name, c.file#
, greatest(f.block_id, c.block#) corr_start_block#
, least(f.block_id+f.blocks-1, c.block#+c.blocks-1) corr_end_block#
, least(f.block_id+f.blocks-1, c.block#+c.blocks-1)
- greatest(f.block_id, c.block#) + 1 blocks_corrupted
, 'Free Block' description
FROM dba_free_space f, v$database_block_corruption c
WHERE f.file_id = c.file#
AND f.block_id <= c.block# + c.blocks - 1
AND f.block_id + f.blocks - 1 >= c.block#
order by file#, corr_start_block#;
OWNER SEGMENT_TYPE SEGMENT_NAME PARTITION_NAME FILE# CORR_START_BLOCK# CORR_END_BLOCK# BLOCKS_CORRUPTED DESCRIPTION
------------------------------ ------------------ -------------------- ------------------------------ ---------- ----------------- --------------- ---------------- ------------------------
SCOTT TABLE CORP_TAB 11 139 139 1 CORRUPT
这个信息详细,用户,段名,段类型,文件号,起始块号,终止块号,一共坏了几个块
4 恢复坏块
1 RMAN块恢复(基于备份,效果好)
2 标记坏块做逻辑导出导入(该表变更时间不可用,使用DBMS_REPAIR跳过坏块)
3 恢复整个数据文件(基于备份restore,recover业务影响整个数据文件涉及表对象)
4 如果是其他对象如索引,约束等可以重建解决,索引的重建要充分考虑业务需求,防止由于索引失效引起的性能问题
5 验证我们是否有该文件的备份
RMAN> list backup of datafile 11;
using target database control file instead of recovery catalog
List of Backup Sets
===================
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
40 Full 83.96M DISK 00:00:37 08-SEP-20
BP Key: 40 Status: AVAILABLE Compressed: YES Tag: TAG20200908T221449
Piece Name: /oracle/db/product/11.2/dbs/1av9u9u3_1_1
List of Datafiles in backup set 40
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- --------- ----
11 Full 5245565 08-SEP-20 /oracle/db/corp_tbs.dbf
发现有备份,我可以尝试直接恢复该数据块,如果该数据文件中没有,比如这是个较早的备份就需要更旧归档文件,可以参考下面注意部分
注意:
如果没有需要事先catalog该文件的映像
RMAN>catalog datafilecopy ‘/oracle/db/corp_tab.dbf’;
如果相关归档不存在(老化,挪走等原因),需要catalog归档
RMAN>catalog archivelog ‘/uo1/backup/archivelog/arch_01.dbf’
或者catalog一个目录下所有归档文件。
下面我们尝试做RMAN block recover
RMAN> blockrecover datafile 11 block 139;
Starting recover at 08-SEP-20
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1914 instance=orcl2 device type=DISK
channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00011
channel ORA_DISK_1: reading from backup piece /oracle/db/product/11.2/dbs/1av9u9u3_1_1
channel ORA_DISK_1: piece handle=/oracle/db/product/11.2/dbs/1av9u9u3_1_1 tag=TAG20200908T221449
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01
starting media recovery
media recovery complete, elapsed time: 00:00:03
Finished recover at 08-SEP-20
RMAN恢复成功,我们后续做业务验证,查询该坏块中的表
5 业务验证
SYS@orcl2>select * from scott.corp_tab
DEPTNO DNAME LOC
---------- -------------- -------------
20 RESEARCH DALLAS
10 ACCOUNTING NEW YORK
30 SALES CHICAGO
40 OPERATIONS BOSTON
Elapsed: 00:00:00.01
模拟验证完毕!
相关文章