误操作offline drop的业务影响以及恢复场景测试---“血”的教训
在数据库恢复过程中,比如我们用了第三方的备份工具,在实施恢复操作时会导出需要的某个表空间这样可以大大减少恢复时间,是十分值得推荐的做法,这个过程中由于修改控制文件中数据文件的相关信息,需要执行alter datafile XX offline drop的操作。也就是这部分的数据文件是不需要的。一般,实施人员都会非常小心这个操作,尤其是同时在操作测试库和生产库,万一在生产库做了个该操作,会直接导致相关用户表无法访问,且数据文件需要恢复的情况,如果在非高峰期这个问题不严重,如果在高峰期这个问题就会被放大。下面我们还是测试下这个行为的症状和解决方法。
alter datafile 6 offline drop;
1 创建用户,创建测试表
SQL> create user lin identified by oracle default tablespace tbs1;
User created.
SQL> alter user lin quota unlimited on tbs1;
User altered.
SQL>
SQL> create table lin.objs as select * from dba_objects;
Table created.
2 创建索引表空间
SQL> create tablespace tbs_idx datafile '/oracle/oradata/prod/idx.dbf' size 100m;
Tablespace created.
SQL> alter user lin quota unlimited on tbs_idx;
User altered.
SQL> alter index lin.idx_objs_id rebuild tablespace tbs_idx;
Index altered.
测试说明:我们在表空间tbs1创建了一张表,在表空间tbs_idx创建了对应表的一个索引lin.idx_objs_id
3 查询数据文件状态
SQL> select file#,name,status from v$datafile where file# in (5,6);
FILE# NAME STATUS
---------- -------------------------------------------------- --------------
5 /oracle/oradata/prod/tbs1.dbf ONLINE
6 /oracle/oradata/prod/idx.dbf ONLINE
SQL> select file_id,file_name,status from dba_data_files where file_id in (5,6);
FILE_ID FILE_NAME STATUS
---------- -------------------------------------------------- ------------------
5 /oracle/oradata/prod/tbs1.dbf AVAILABLE
6 /oracle/oradata/prod/idx.dbf AVAILABLE
此时其实 ,两个文件都是ONLINE且是AVAILABLE的
4 下面我们尝试offline drop操作
SQL> alter database datafile 5 offline drop;
Database altered.
SQL> select file#,name,status from v$datafile where file# in (5,6);
FILE# NAME STATUS
---------- -------------------------------------------------- --------------
5 /oracle/oradata/prod/tbs1.dbf RECOVER
6 /oracle/oradata/prod/idx.dbf ONLINE
SQL> select file_id,file_name,status from dba_data_files where file_id in (5,6);
FILE_ID FILE_NAME STATUS
---------- -------------------------------------------------- ------------------
5 /oracle/oradata/prod/tbs1.dbf AVAILABLE
6 /oracle/oradata/prod/idx.dbf AVAILABLE
我们看到这个文件5需要恢复,但是通过dba_data_files可以看到,该文件还是存在的
SQL> !ls -lrt /oracle/oradata/prod/tbs*
-rw-r----- 1 oracle oracle 104865792 Apr 30 00:49 /oracle/oradata/prod/tbs1.dbf
也就是这个文件在归档模式下其实不会被删除,但是需要恢复,注意此时的恢复是需要日志文件的,如果日志文件已经被归档且拿走了
就需要重新拉回来。
5 下面我们看看用户的反应。
SQL> select * from lin.objs;
select * from lin.objs
*
ERROR at line 1:
ORA-00376: file 5 cannot be read at this time
ORA-01110: data file 5: '/oracle/oradata/prod/tbs1.dbf'
*******
SQL> delete from lin.objs;
delete from lin.objs
*
ERROR at line 1:
ORA-00376: file 5 cannot be read at this time
ORA-01110: data file 5: '/oracle/oradata/prod/tbs1.dbf'
总之要访问的数据都无法访问了。报错00376指数据文件offline.
[oracle@sp11 ~]$ oerr ora 00376
00376, 00000, "file %s cannot be read at this time"
// *Cause: attempting to read from a file that is not readable. Most likely
// the file is offline.
// *Action: Check the state of the file. Bring it online
6 尝试恢复。
(1) online操作
SQL> alter database datafile 5 online;
alter database datafile 5 online
*
ERROR at line 1:
ORA-01113: file 5 needs media recovery
ORA-01110: data file 5: '/oracle/oradata/prod/tbs1.dbf'
(2)提示需要恢复
我们查下需要恢复的文件的scn ,其值为2088975
SQL> select file#,CHECKPOINT_CHANGE#, UNRECOVERABLE_CHANGE#,OFFLINE_CHANGE#,name from v$datafile where file#=5
FILE# CHECKPOINT_CHANGE# UNRECOVERABLE_CHANGE# OFFLINE_CHANGE# NAME
---------- ------------------ --------------------- --------------- ----------------------------------------
5 2088975 0 0 /oracle/oradata/prod/tbs1.dbf
下面是正常文件的SCN,明显这个时间更新。
SQL> select file#,CHECKPOINT_CHANGE#, UNRECOVERABLE_CHANGE#,OFFLINE_CHANGE#,name from v$datafile where file#=6;
FILE# CHECKPOINT_CHANGE# UNRECOVERABLE_CHANGE# OFFLINE_CHANGE# NAME
---------- ------------------ --------------------- --------------- ----------------------------------------
6 2090348 0 0 /oracle/oradata/prod/idx.dbf
下面我们人工产生一个检查点时间,推进SCN。
SQL> alter system checkpoint;
System altered.
下面输出看出正常文件6的SCN推进了,但是需要恢复的文件5依然保持在原始值2088975,所以需要恢复到新的SCN,此时需要日志文件
SQL> select file#,CHECKPOINT_CHANGE#, UNRECOVERABLE_CHANGE#,OFFLINE_CHANGE#,name from v$datafile where file#=6;
FILE# CHECKPOINT_CHANGE# UNRECOVERABLE_CHANGE# OFFLINE_CHANGE# NAME
---------- ------------------ --------------------- --------------- ----------------------------------------
6 2104612 0 0 /oracle/oradata/prod/idx.dbf
SQL> select file#,CHECKPOINT_CHANGE#, UNRECOVERABLE_CHANGE#,OFFLINE_CHANGE#,name from v$datafile where file#=5;
FILE# CHECKPOINT_CHANGE# UNRECOVERABLE_CHANGE# OFFLINE_CHANGE# NAME
---------- ------------------ --------------------- --------------- ----------------------------------------
5 2088975 0 0 /oracle/oradata/prod/tbs1.dbf
(3)我们切换两次日志,将归档拿走。
SQL> select group#,thread#,sequence#,bytes/1024/1024,status from v$log;
GROUP# THREAD# SEQUENCE# BYTES/1024/1024 STATUS
---------- ---------- ---------- --------------- --------------------------------
1 1 40 200 CURRENT
2 1 38 200 INACTIVE
3 1 39 200 INACTIVE
SQL> alter system switch logfile;
System altered.
SQL> /
System altered.
SQL> /
/
System altered.
看归档情况
SQL> select thread#,name,status from v$archived_log;
THREAD# NAME ST
---------- ---------------------------------------- --
1 /oracle/12.2/dbs/arch1_36_1028741645.dbf A
1 /oracle/12.2/dbs/arch1_37_1028741645.dbf A
1 /oracle/12.2/dbs/arch1_38_1028741645.dbf A
1 /oracle/12.2/dbs/arch1_39_1028741645.dbf A
1 /oracle/12.2/dbs/arch1_40_1028741645.dbf A
1 /oracle/12.2/dbs/arch1_41_1028741645.dbf A
1 /oracle/12.2/dbs/arch1_42_1028741645.dbf A
1 /oracle/12.2/dbs/arch1_43_1028741645.dbf A
8 rows selected.
我们把sequence#40拿走。
[oracle@sp11 dbs]$ mv arch1_40_1028741645.dbf /home/oracle
[oracle@sp11 dbs]$ ls -lrt arch*
-rw-r----- 1 oracle oracle 84424192 Sep 29 2020 arch1_36_1028741645.dbf
-rw-r----- 1 oracle oracle 16097792 Sep 29 2020 arch1_37_1028741645.dbf
-rw-r----- 1 oracle oracle 66402304 Oct 10 2020 arch1_38_1028741645.dbf
-rw-r----- 1 oracle oracle 181679616 Apr 29 23:27 arch1_39_1028741645.dbf
-rw-r----- 1 oracle oracle 3584 Apr 30 01:21 arch1_41_1028741645.dbf
-rw-r----- 1 oracle oracle 43520 Apr 30 01:21 arch1_42_1028741645.dbf
-rw-r----- 1 oracle oracle 1024 Apr 30 01:21 arch1_43_1028741645.dbf
[oracle@sp11 dbs]$ ls -lrt /home/oracle/arch*
-rw-r----- 1 oracle oracle 134467584 Apr 30 01:21 /home/oracle/arch1_40_1028741645.dbf
下面我们尝试恢复数据文件5,看看如何报错
SQL> select file#,name,status from v$datafile;
FILE# NAME STATUS
---------- ---------------------------------------- --------------
1 /oracle/oradata/prod/system01.dbf SYSTEM
2 /oracle/oradata/prod/sysaux01.dbf ONLINE
3 /oracle/oradata/prod/undotbs01.dbf ONLINE
4 /oracle/oradata/prod/users01.dbf ONLINE
5 /oracle/oradata/prod/tbs1.dbf RECOVER
6 /oracle/oradata/prod/idx.dbf ONLINE
6 rows selected.
下面恢复
SQL> recover datafile 5;
ORA-00279: change 2088975 generated at 04/30/2022 00:43:45 needed for thread 1
ORA-00289: suggestion : /oracle/12.2/dbs/arch1_40_1028741645.dbf
ORA-00280: change 2088975 for thread 1 is in sequence #40
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto
ORA-00308: cannot open archived log '/oracle/12.2/dbs/arch1_40_1028741645.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 7
ORA-00308: cannot open archived log '/oracle/12.2/dbs/arch1_40_1028741645.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 7
此时报错很清楚,它需要40号日志,因为这个日志我们拿走了(生产库可能是被备份工具拉走了),所以我们必须拿到这个日志才行
(异常处理方法,比如通过bbed工具拉scn我们不讨论)。
我们将日志拷贝回来。
[oracle@sp11 ~]$ mv arch1_40_1028741645.dbf /oracle/12.2/dbs/
[oracle@sp11 ~]$ ls -lrt /oracle/12.2/dbs/*40*
-rw-r----- 1 oracle oracle 10698752 Apr 30 00:54 /oracle/12.2/dbs/c-440579981-20220430-00
-rw-r----- 1 oracle oracle 134467584 Apr 30 01:21 /oracle/12.2/dbs/arch1_40_1028741645.dbf
下面再次恢复
SQL> recover datafile 5;
ORA-00279: change 2088975 generated at 04/30/2022 00:43:45 needed for thread 1
ORA-00289: suggestion : /oracle/12.2/dbs/arch1_40_1028741645.dbf
ORA-00280: change 2088975 for thread 1 is in sequence #40
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto
Log applied.
Media recovery complete.
用RMAN工具更方便些,下面用RMAN恢复
RMAN> recover datafile 5;
Starting recover at 2022-04-30 01:31:43
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=58 device type=DISK
starting media recovery
media recovery complete, elapsed time: 00:00:00
Finished recover at 2022-04-30 01:31:46
此时我们可以online文件,先查询文件状态,此时已经从RECOVER变为OFFLINE
SQL> lselect file#,name,status from v$datafile where file# in (5,6)
FILE# NAME STATUS
---------- ---------------------------------------- --------------
5 /oracle/oradata/prod/tbs1.dbf OFFLINE
6 /oracle/oradata/prod/idx.dbf ONLINE
SQL> alter database datafile 5 online;
Database altered.
SQL> select file#,name,status from v$datafile where file# in (5,6);
FILE# NAME STATUS
---------- ---------------------------------------- --------------
5 /oracle/oradata/prod/tbs1.dbf ONLINE
6 /oracle/oradata/prod/idx.dbf ONLINE
SQL> select count(*) from lin.objs;
COUNT(*)
----------
75909
总结:在遇到误操作offline drop时,确实会影响业务,但是如果归档在,且误操作时间很短,且在业务低峰时,这个问题影响不大
我们需要两个操作
1 recover 文件,2 online文件即可。
相关文章