数据库block corruption的检测和恢复模拟

2020-09-08 00:00:00 数据 文件 备份 归档 恢复

在生产库中一旦出现坏块就需要及时处理,比如需要确认坏块影响对象范围,对应数据文件是否备份,是否有足够旧的归档,针对不同对象如何恢复,本文给出一个完整的模拟过程,后续讨论了几种坏块恢复方式,以及文件是否备份,归档等的处理方法。我们做过这样的验证,手里有对应的脚本,再遇到类似问题就心里不慌了。

1 模拟环境准备

创建表空间

SYS@orcl2>create tablespace corp_tbs datafile '/oracle/db/corp_tbs.dbf' size 10m;

Tablespace created.

 创建表

SYS@orcl2>create table scott.corp_tab tablespace corp_tbs as select * from scott.dept;

Table created.

Elapsed: 00:00:00.19

检查表段块分配情况

SYS@orcl2>select dbms_rowid.rowid_block_number(rowid),dbms_rowid.rowid_relative_fno(rowid) from scott.corp_tab;

DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID)

------------------------------------ ------------------------------------

                                 139                                   11

                                 139                                   11

                                 139                                   11

                                 139                                   11

说明该表仅仅使用了一块,就是11号文件的139号数据块

SYS@orcl2> select file_name,tablespace_name from dba_data_files where file_id=11

FILE_NAME                                  TABLESPACE_NAME

------------------------------ ------------------------------

/oracle/db/corp_tbs.dbf              CORP_TBS

2 模拟破坏数据块,我们使用vim编辑数据文件,随机插入几个字符

RMAN> recover datafile 11 block 139 clear;

Starting recover at 08-SEP-20

using target database control file instead of recovery catalog

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=30 instance=orcl2 device type=DISK

Finished recover at 08-SEP-20

 3 对块中的表数据做一个全表扫描,触发错误

测试库为11.2.0.4该版本数据库在告警日志中已经有非常丰富的信息

Hex dump of (file 11, block 139) in trace file /oracle/db/diag/rdbms/orcl/orcl2/trace/orcl2_ora_27567.trc

Corrupt block relative dba: 0x02c0008b (file 11, block 139)

Bad header found during buffer read

Data in bad block:

 type: 6 format: 2 rdba: 0x02c0008b

 last change scn: 0x0000.0050188a seq: 0x1 flg: 0x04

 spare1: 0x91 spare2: 0x0 spare3: 0x0

 consistency value in tail: 0x188a0601

 check value in block header: 0xf626

 computed block checksum: 0x5891

Reading datafile '/oracle/db/corp_tbs.dbf' for corruption at rdba: 0x02c0008b (file 11, block 139)

Reread (file 11, block 139) found same corrupt data (no logical check)

Tue Sep 08 22:50:57 2020

Corrupt Block Found

         TSN = 10, TSNAME = CORP_TBS

         RFN = 11, BLK = 139, RDBA = 46137483

         OBJN = 16186, OBJD = 16187, OBJECT = CORP_TAB, SUBOBJECT = 

Errors in file /oracle/db/diag/rdbms/orcl/orcl2/trace/orcl2_ora_27567.trc  (incident=2140394):

ORA-01578: ORACLE data block corrupted (file # 11, block # 139)                                              <<<<<<<<<<ORA-1578错误出现了

ORA-01110: data file 11: '/oracle/db/corp_tbs.dbf'

         SEGMENT OWNER = SCOTT, SEGMENT TYPE = Table Segment

Incident details in: /oracle/db/diag/rdbms/orcl/orcl2/incident/incdir_2140394/orcl2_ora_27567_i2140394.trc

Tue Sep 08 22:51:01 2020

Sweep [inc][2140394]: completed

Tue Sep 08 22:51:01 2020

Dumping diagnostic data in directory=[cdmp_20200908225101], requested by (instance=2, osid=27567), summary=[incident=2140394].

Hex dump of (file 11, block 139) in trace file /oracle/db/diag/rdbms/orcl/orcl2/incident/incdir_2140394/orcl2_m000_29080_i2140394_a.trc

Corrupt block relative dba: 0x02c0008b (file 11, block 139)

Bad header found during validation

Data in bad block:

 type: 6 format: 2 rdba: 0x02c0008b

 last change scn: 0x0000.0050188a seq: 0x1 flg: 0x04

 spare1: 0x91 spare2: 0x0 spare3: 0x0

 consistency value in tail: 0x188a0601

 check value in block header: 0xf626

 computed block checksum: 0x5891

Reread of blocknum=139, file=/oracle/db/corp_tbs.dbf. found same corrupt data

Reread of blocknum=139, file=/oracle/db/corp_tbs.dbf. found same corrupt data

Reread of blocknum=139, file=/oracle/db/corp_tbs.dbf. found same corrupt data

Reread of blocknum=139, file=/oracle/db/corp_tbs.dbf. found same corrupt data

Reread of blocknum=139, file=/oracle/db/corp_tbs.dbf. found same corrupt data

涉及文件号,块号、坏块中涉及的对象

SYS@orcl2>select * from V$DATABASE_BLOCK_CORRUPTION;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO

---------- ---------- ---------- ------------------ ---------

        11          139                   1                                  5249162 CORRUPT

Elapsed: 00:00:00.03

通过如下脚本检查更详细的坏块涉及的对象信息

set pagesize 2000

set linesize 280

SELECT e.owner, e.segment_type, e.segment_name, e.partition_name, c.file#

     , greatest(e.block_id, c.block#) corr_start_block#

     , least(e.block_id+e.blocks-1, c.block#+c.blocks-1) corr_end_block#

     , least(e.block_id+e.blocks-1, c.block#+c.blocks-1)

       - greatest(e.block_id, c.block#) + 1 blocks_corrupted

     , corruption_type description

  FROM dba_extents e, v$database_block_corruption c

 WHERE e.file_id = c.file#

   AND e.block_id <= c.block# + c.blocks - 1

   AND e.block_id + e.blocks - 1 >= c.block#

UNION

SELECT s.owner, s.segment_type, s.segment_name, s.partition_name, c.file#

     , header_block corr_start_block#

     , header_block corr_end_block#

     , 1 blocks_corrupted

     , corruption_type||' Segment Header' description

  FROM dba_segments s, v$database_block_corruption c

 WHERE s.header_file = c.file#

   AND s.header_block between c.block# and c.block# + c.blocks - 1

UNION

SELECT null owner, null segment_type, null segment_name, null partition_name, c.file#

     , greatest(f.block_id, c.block#) corr_start_block#

     , least(f.block_id+f.blocks-1, c.block#+c.blocks-1) corr_end_block#

     , least(f.block_id+f.blocks-1, c.block#+c.blocks-1)

       - greatest(f.block_id, c.block#) + 1 blocks_corrupted

     , 'Free Block' description

  FROM dba_free_space f, v$database_block_corruption c

 WHERE f.file_id = c.file#

   AND f.block_id <= c.block# + c.blocks - 1

   AND f.block_id + f.blocks - 1 >= c.block#

order by file#, corr_start_block#;

OWNER                          SEGMENT_TYPE       SEGMENT_NAME         PARTITION_NAME                      FILE# CORR_START_BLOCK# CORR_END_BLOCK# BLOCKS_CORRUPTED DESCRIPTION

------------------------------ ------------------ -------------------- ------------------------------ ---------- ----------------- --------------- ---------------- ------------------------

SCOTT                                TABLE                           CORP_TAB                                                                   11                   139                             139                       1                CORRUPT

这个信息详细,用户,段名,段类型,文件号,起始块号,终止块号,一共坏了几个块

4 恢复坏块

1 RMAN块恢复(基于备份,效果好)

2 标记坏块做逻辑导出导入(该表变更时间不可用,使用DBMS_REPAIR跳过坏块)

3 恢复整个数据文件(基于备份restore,recover业务影响整个数据文件涉及表对象)

4 如果是其他对象如索引,约束等可以重建解决,索引的重建要充分考虑业务需求,防止由于索引失效引起的性能问题

5 验证我们是否有该文件的备份

RMAN> list backup of datafile 11;

using target database control file instead of recovery catalog

List of Backup Sets

===================

BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ ---------------

40      Full    83.96M     DISK        00:00:37     08-SEP-20      

        BP Key: 40   Status: AVAILABLE  Compressed: YES  Tag: TAG20200908T221449

        Piece Name: /oracle/db/product/11.2/dbs/1av9u9u3_1_1

  List of Datafiles in backup set 40

  File LV Type Ckp SCN    Ckp Time  Name

  ---- -- ---- ---------- --------- ----

  11      Full 5245565    08-SEP-20 /oracle/db/corp_tbs.dbf

发现有备份,我可以尝试直接恢复该数据块,如果该数据文件中没有,比如这是个较早的备份就需要更旧归档文件,可以参考下面注意部分

注意:

如果没有需要事先catalog该文件的映像

RMAN>catalog datafilecopy ‘/oracle/db/corp_tab.dbf’;

如果相关归档不存在(老化,挪走等原因),需要catalog归档

RMAN>catalog archivelog ‘/uo1/backup/archivelog/arch_01.dbf’

或者catalog一个目录下所有归档文件。

下面我们尝试做RMAN block recover

RMAN> blockrecover datafile 11 block 139;

Starting recover at 08-SEP-20

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=1914 instance=orcl2 device type=DISK

channel ORA_DISK_1: restoring block(s)

channel ORA_DISK_1: specifying block(s) to restore from backup set

restoring blocks of datafile 00011

channel ORA_DISK_1: reading from backup piece /oracle/db/product/11.2/dbs/1av9u9u3_1_1

channel ORA_DISK_1: piece handle=/oracle/db/product/11.2/dbs/1av9u9u3_1_1 tag=TAG20200908T221449

channel ORA_DISK_1: restored block(s) from backup piece 1

channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01

starting media recovery

media recovery complete, elapsed time: 00:00:03

Finished recover at 08-SEP-20

RMAN恢复成功,我们后续做业务验证,查询该坏块中的表

5 业务验证

SYS@orcl2>select * from scott.corp_tab

    DEPTNO DNAME          LOC

---------- -------------- -------------

        20 RESEARCH       DALLAS

        10 ACCOUNTING     NEW YORK

        30 SALES          CHICAGO

        40 OPERATIONS     BOSTON

Elapsed: 00:00:00.01

模拟验证完毕!

相关文章