sitemap

RSS地图

收藏本站

设为首页

Oracle研究中心

当前位置:Oracle研究中心 > 故障案例 >

【案例】Oracle报错ORA-16433非归档丢失redo无法启动的恢复过程

时间:2016-12-21 20:32   来源:Oracle研究中心   作者:网络   点击:

天萃荷净 Oracle研究中心案例分析:运维DBA反映Oracle数据库处理非归档模式,redo文件损坏常规修复无法正常open数据库。

本站文章除注明转载外,均为本站原创: 转载自love wife & love life —Roger 的Oracle技术博客
本文链接地址: One recover case!

今天ML的群中女神和travel在纠结一个恢复的问题,11.2.0.3版本,非归档,大概是rm掉current的log,然后重建controlfille后恢复导致一系列问题,并最终出现ora-600 2662错误,虽然这个错误很常见,但是你发现推进scn也是无法open,感觉有点怪,远程了女神的电脑,操作不便,最后将文件压缩传过来,我在自己的vmware进行了恢复。

由于环境的差异,所以解压后我先进行rename file操作,如下:

SQL> SELECT name FROM v$datafile;

NAME
------------------------------------------------
/u01/app/oracle/oradata/travel/system01.dbf
.........
/u01/app/oracle/oradata/travel/users01.dbf

SQL> SELECT member FROM v$logfile;

MEMBER
-------------------------------------------------
/u01/app/oracle/oradata/travel/redo03.log
/u01/app/oracle/oradata/travel/redo02.log
/u01/app/oracle/oradata/travel/redo01.log

SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/system01.dbf' TO '/home/oracle/travel/travel/system01.dbf';
SQL>  ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/sysaux01.dbf' TO '/home/oracle/travel/travel/sysaux01.dbf';
SQL>  ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/undotbs01.dbf' TO '/home/oracle/travel/travel/undotbs01.dbf';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/users01.dbf' TO '/home/oracle/travel/travel/users01.dbf';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/redo01.log' TO '/home/oracle/travel/travel/redo01.log';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/redo02.log' TO '/home/oracle/travel/travel/redo02.log';
SQL> ALTER DATABASE RENAME file '/u01/app/oracle/oradata/travel/redo03.log' TO '/home/oracle/travel/travel/redo03.log';   

SQL> SELECT name FROM v$datafile;

NAME
-------------------------------------------------
/home/oracle/travel/travel/system01.dbf
/home/oracle/travel/travel/sysaux01.dbf
/home/oracle/travel/travel/undotbs01.dbf
/home/oracle/travel/travel/users01.dbf

SQL> SELECT member FROM v$Logfile;

MEMBER
-------------------------------------------------
/home/oracle/travel/travel/redo03.log
/home/oracle/travel/travel/redo02.log
/home/oracle/travel/travel/redo01.log

此时,进行recover,会报错ORA-16433,如下:

SQL> recover DATABASE;
ORA-00283: recovery SESSION canceled due TO errors
ORA-16433: The DATABASE must be opened IN READ/WRITE mode.


SQL> recover DATABASE USING backup controlfile until cancel;
ORA-00283: recovery SESSION canceled due TO errors
ORA-16433: The DATABASE must be opened IN READ/WRITE mode.

SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: DATA file 1: '/home/oracle/travel/travel/system01.dbf'
关于该错误,通过oerr命令可以大概了解一下其含义,如下:

[oracle@11gR2_primary ~]$ oerr ora 16433
16433, 00000, "The database must be opened in read/write mode."
// *Cause:  An attempt was made to open the database in read-only mode after an
//          operation that requires that the database be opened in read/write
//          mode.
// *Action: Open the database in read/write mode. The database can then be
//          opened in read-only mode.

通过这个错误,我们可以得出一个信息,数据库可以以read only模式打开,换句话讲,你查询如下几个值都是一样的:
select checkpoint_change# from v$database;   --来自controlfile
select checkpoint_change# from v$datafile;   --来自controlfile
select checkpoint_change# from v$datafile_orader;  --来自datafile header

其实,通常来讲,只要上述几个值一样,那么我们的db应该都是可以直接open打开的,然而,这里却不行,read only都不行,我试过。
然后下面就开始我的恢复工作。

第一次尝试恢复:

---重建controlfile
SQL> startup nomount  pfile=/home/oracle/travel/travel/a.ora
ORACLE instance started.

Total System Global Area  626327552 bytes
Fixed Size                  2230952 bytes
Variable Size             184550744 bytes
Database Buffers          436207616 bytes
Redo Buffers                3338240 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "TRAVEL" RESETLOGS  NOARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 292
  7  LOGFILE
  8    GROUP 1 '/home/oracle/travel/travel/redo01.log'  SIZE 50M BLOCKSIZE 512,
  9    GROUP 2 '/home/oracle/travel/travel/redo02.log'  SIZE 50M BLOCKSIZE 512,
10    GROUP 3 '/home/oracle/travel/travel/redo03.log'  SIZE 50M BLOCKSIZE 512
11  -- STANDBY LOGFILE
12  DATAFILE
13    '/home/oracle/travel/travel/system01.dbf',
14    '/home/oracle/travel/travel/sysaux01.dbf',
15    '/home/oracle/travel/travel/undotbs01.dbf',
16    '/home/oracle/travel/travel/users01.dbf'
17  CHARACTER SET AL32UTF8
18  ;

Control file created.

SQL>

开始开始进行recover:
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf'

SQL> recover database until cancel;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done

SQL> recover database using backup controlfile until cancel;
ORA-00279: change 244977 generated at 01/19/2013 01:56:54 needed for thread 1
ORA-00289: suggestion : /oracle/product/11.2.0/db_1/dbs/arch1_1_805082211.dbf
ORA-00280: change 244977 for thread 1 is in sequence #1

Specify log: {=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf'

ORA-01112: media recovery not started
因为本身是非归档,所以这个不完全恢复的步骤无非是为了后面可以进行open resetlog。然后停库,加入隐含参数进行open:

SQL> shutdown immediate
ORA-01109: DATABASE NOT OPEN

DATABASE dismounted.
ORACLE instance shut down.
SQL> startup nomount  pfile=/home/oracle/travel/travel/b.ora
ORACLE instance started.

Total System Global Area  626327552 bytes
Fixed SIZE                  2230952 bytes
Variable SIZE             184550744 bytes
DATABASE Buffers          436207616 bytes
Redo Buffers                3338240 bytes

SQL> ALTER DATABASE mount
DATABASE altered.

SQL> ALTER DATABASE OPEN resetlogs;
ALTER DATABASE OPEN resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance TERMINATED. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [0], [244985], [0],
[244998], [4194432], [], [], [], [], [], []
Process ID: 3641
SESSION ID: 1 Serial NUMBER: 5

此时alert log对应的错误如下:

Fri Dec 14 21:58:39 2012
SMON: enabling cache recovery
Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_3641.trc  (incident=4937):
ORA-00600: internal error code, arguments: [2662], [0], [244985], [0], [244998], [4194432], [], [], [], [], [], []
Incident details in: /oracle/diag/diag/rdbms/travel/travel/incident/incdir_4937/travel_ora_3641_i4937.trc
Fri Dec 14 21:58:43 2012
Dumping diagnostic data in directory=[cdmp_20121214215843], requested by (instance=1, osid=3641), summary=[incident=4937].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_3641.trc:
ORA-00600: internal error code, arguments: [2662], [0], [244985], [0], [244998], [4194432], [], [], [], [], [], []
Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_3641.trc:
ORA-00600: internal error code, arguments: [2662], [0], [244985], [0], [244998], [4194432], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 3641): terminating the instance due to error 600
Instance terminated by USER, pid = 3641
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (3641) as a result of ORA-1092

甚至手工执行如下操作都无法open数据库,如下:

SQL> ALTER SESSION SET events '10015 trace name ADJUST_SCN level 14';
SESSION altered.

SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: DATA file 1: '/home/oracle/travel/travel/system01.dbf'

最后dunmp controlfile和CURRENT redo logfile,发现low cache rba信息居然是最大值:

SQL> oradebug setmypid
Statement processed.

SQL> ALTER system  SET events 'immediate trace name controlf level 4'   ;
System altered.

SQL> ALTER system  dump logfile '/home/oracle/travel/travel/redo01.log';

SQL> oradebug tracefile_name
/oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_4229.trc


++++++++ controlfile dump 片段

***************************************************************************
CHECKPOINT PROGRESS RECORDS
***************************************************************************
(size = 8180, compat size = 8180, section max = 11, section in-use = 0,
  last-recid= 0, old-recno = 0, last-recno = 0)
(extent = 1, blkno = 2, numrecs = 11)
THREAD #1 - status:0x2 flags:0x0 dirty:0
low cache rba:(0xffffffff.ffffffff.ffff) on disk rba:(0x1.3.0)  ---low cache rba为无穷大
on disk scn: 0x0000.0003bcf7 12/14/2012 21:58:39
resetlogs scn: 0x0000.0003bcf2 12/14/2012 21:58:36
heartbeat: 802069494 mount id: 2869233386
THREAD #2 - status:0x0 flags:0x0 dirty:0
low cache rba:(0x0.0.0) on disk rba:(0x0.0.0)
on disk scn: 0x0000.00000000 01/01/1988 00:00:00
resetlogs scn: 0x0000.00000000 01/01/1988 00:00:00
heartbeat: 0 mount id: 0
THREAD #3 - status:0x0 flags:0x0 dirty:0
low cache rba:(0x0.0.0) on disk rba:(0x0.0.0)
on disk scn: 0x0000.00000000 01/01/1988 00:00:00
resetlogs scn: 0x0000.00000000 01/01/1988 00:00:00
heartbeat: 0 mount id: 0

+++++++redo01.log dump 片段

DUMP OF REDO FROM FILE '/home/oracle/travel/travel/redo01.log'
Opcodes *.*
RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
Times: creation thru eternity
FILE HEADER:
        Compatibility Vsn = 186646528=0xb200000
        Db ID=2872261344=0xab333ae0, Db Name='TRAVEL'
        Activation ID=2872292516=0xab33b4a4
        Control Seq=233=0xe9, File size=102400=0x19000
        File Number=1, Blksiz=512, File Type=2 LOG
descrip:"Thread 0001, Seq# 0000000001, SCN 0x00000003bcee-0xffffffffffff"
thread: 1 nab: 0xffffffff seq: 0x00000001 hws: 0x3 eot: 1 dis: 0
resetlogs count: 0x2ffc9463 scn: 0x0000.0003bcee (244974)
prev resetlogs count: 0x2ffc7a41 scn: 0x0000.0003bcea (244970)
Low  scn: 0x0000.0003bcee (244974) 01/19/2013 01:56:51
Next scn: 0xffff.ffffffff 01/01/1988 00:00:00
Enabled scn: 0x0000.0003bcee (244974) 01/19/2013 01:56:51
Thread closed scn: 0x0000.0003bcee (244974) 01/19/2013 01:56:51
Disk cksum: 0x3467 Calc cksum: 0x3467
Terminal recovery stop scn: 0x0000.00000000
Terminal recovery  01/01/1988 00:00:00
Most recent redo scn: 0x0000.00000000
Largest LWN: 0 blocks
End-of-redo stream : No
Unprotected mode
Miscellaneous flags: 0x800000
Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000
Zero blocks: 0
Format ID is 2
redo log key is 853c461da2eec7ed4b45ce75b8c27d7
redo log key flag is 5
Enabled redo threads: 1

REDO RECORD - Thread:1 RBA: 0x000001.00000002.0010 LEN: 0x0070 VLD: 0x05
SCN: 0x0000.0003bcf2 SUBSCN:  1 01/19/2013 01:56:54
(LWN RBA: 0x000001.00000002.0010 LEN: 0001 NST: 0001 SCN: 0x0000.0003bcee)  ,
CHANGE #1 MEDIA RECOVERY MARKER SCN:0x0000.00000000 SEQ:0 OP:17.3 ENC:0
END OF REDO DUMP

可以看到实际上current redo的LWN RBA是对的,为1.2.10,是小于on disk rba (0×1.3.0)的。那么这里为什么会出现这个情况?
猜测可能是写紊乱了。到这里也就比较明白了,仍然是conrolfile有问题。

既然如此,那么我直接将controlfile文件rm掉,然后再次重建。如下:

SQL> startup nomount pfile='/home/oracle/travel/travel/a.ora
ORACLE instance started.

Total System Global Area  626327552 bytes
Fixed Size                  2230952 bytes
Variable Size             184550744 bytes
Database Buffers          436207616 bytes
Redo Buffers                3338240 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "TRAVEL" RESETLOGS  NOARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 292
  7  LOGFILE
  8    GROUP 1 '/home/oracle/travel/travel/redo01.log'  SIZE 50M BLOCKSIZE 512,
  9    GROUP 2 '/home/oracle/travel/travel/redo02.log'  SIZE 50M BLOCKSIZE 512,
10    GROUP 3 '/home/oracle/travel/travel/redo03.log'  SIZE 50M BLOCKSIZE 512
11  -- STANDBY LOGFILE
12  DATAFILE
13    '/home/oracle/travel/travel/system01.dbf',
14    '/home/oracle/travel/travel/sysaux01.dbf',
15    '/home/oracle/travel/travel/undotbs01.dbf',
16    '/home/oracle/travel/travel/users01.dbf'
17  CHARACTER SET AL32UTF8
18  ;

Control file created.
SQL> recover database until cancel using backup controlfile;
ORA-00279: change 244985 generated at 12/14/2012 23:05:06 needed for thread 1
ORA-00289: suggestion : /oracle/product/11.2.0/db_1/dbs/arch1_1_802047904.dbf
ORA-00280: change 244985 for thread 1 is in sequence #1

Specify log: {=suggested | filename | AUTO | CANCEL}
AUTO
ORA-00308: cannot open archived log
'/oracle/product/11.2.0/db_1/dbs/arch1_1_802047904.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

ORA-00308: cannot open archived log
'/oracle/product/11.2.0/db_1/dbs/arch1_1_802047904.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf'

SQL>
SQL>
SQL> select file#,CHECKPOINT_CHANGE#,LAST_CHANGE# from v$datafile order by  1;

     FILE# CHECKPOINT_CHANGE# LAST_CHANGE#
---------- ------------------ ------------
         1             244985
         2             244985
         3             244985
         4             244985

SQL> select file#,CHECKPOINT_CHANGE# from v$datafile_header order by 1;

     FILE# CHECKPOINT_CHANGE#
---------- ------------------
         1             244985
         2             244985
         3             244985
         4             244985

SQL> select CHECKPOINT_CHANGE# from v$database;

CHECKPOINT_CHANGE#
------------------
                0
此时停库,然后在pfile中加入隐含参数:
*._allow_resetlogs_corruption=TRUE
*._allow_error_simulation=TRUE

接着再次进行mount,并进行scn 推进:

SQL> ALTER DATABASE OPEN resetlogs;
ALTER DATABASE OPEN resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery TO be consistent
ORA-01110: DATA file 1: '/home/oracle/travel/travel/system01.dbf'

SQL>
SQL> shutdown immediate
ORA-01109: DATABASE NOT OPEN

DATABASE dismounted.
ORACLE instance shut down.
SQL> startup nomount pfile='/home/oracle/travel/travel/b.ora
ORACLE instance started.

Total System Global Area  626327552 bytes
Fixed Size                  2230952 bytes
Variable Size             184550744 bytes
Database Buffers          436207616 bytes
Redo Buffers                3338240 bytes
SQL> alter database mount
  2  ;

Database altered.

SQL> alter session set events '10015 trace name ADJUST_SCN level 10';
Session altered.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open

SQL> alter database open noresetlogs;
alter database open noresetlogs
*
ERROR at line 1:
ORA-01588: must use RESETLOGS option for database open

SQL> alter database open resetlogs;
Database altered.

SQL> show parameter name

NAME                                 TYPE        VALUE
-------------Oracleо----------------------- ----------- ------------------------------
db_file_name_convert                 string
db_name                              string      travel
db_unique_name                       string      travel
global_names                         boolean     FALSE
instance_name                        string      travel
lock_name_space                      string
log_file_name_convert                string
processor_group_name                 string
service_names                        string      travel
SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ WRITE

SQL> alter system switch logfile;
System altered.

这里说明一下是我的alert log里面还出现了数据字典不一致的问题,因为毕竟是强制open的,如下:

Sun Jan 20 00:25:34 2013
Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_m000_4681.trc:
ORA-25153: Temporary Tablespace is Empty
Sun Jan 20 00:25:38 2013
Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_j004_4692.trc  (incident=21817):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/diag/rdbms/travel/travel/incident/incdir_21817/travel_j004_4692_i21817.trc
Sun Jan 20 00:25:43 2013
Dumping diagnostic data in directory=[cdmp_20130120002543], requested by (instance=1, osid=4692 (J004)), summary=[incident=21817].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun Jan 20 00:25:47 2013

这个ora-600错误很常见了,通过是index信息不一致。

其trace内容如下:
* kdsgrp1-1: *************************************************
            row 0x0040589e.4b continuation at
            0x0040589e.4b file# 1 block# 22686 slot 75 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 75 ..... nrows: 75
kdsgrp - dump CR block dba=0x0040589e
Block header dump:  0x0040589e
Object id on Block  Y
seg/obj: 0x12  csc: 0x00.3bca9  itc: 1  flg: O  typ: 1 - DATA
     fsl: 0  fnx: 0x40589f ver: 0x01

Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0001.01a.000000a1  0x00c05e97.001d.0d  --U-    1  fsc 0x0053.0003bcab
bdba: 0x0040589e
data_block_dump,data header at 0x7a26a044
===============
tsiz: 0x1fb8
hsiz: 0xa8
pbl: 0x7a26a044
     76543210
flag=--------
ntab=1
nrow=75
frre=-1
fsbo=0xa8
...........

seg/obj: 0×12 这是obj$对象,属于bootstrap$核心对象,涉及到这类的对象处理相对麻烦,针对这类情况,数据库open后都建议把数据导出,然后重建库,我这里就不在继续描述这个ora-600错误了,我博客也有类似的例子。

最后来个小节:

1. oracle通过系统checkpoint scn,datafile checkpoint scn,start scn三者之间的比较来判断数据文件是否需要进行介质恢复.
2. 在redo 线程打开的情况下,即数据库open的情况下,stop scn会被设置为无穷大,当正常关闭时,stop scn等于datafile scn.
这里需要注意的是,stop scn是存放在controlfile中的,网上部分资料说是存在datafile header中,这个说法是错误的。
3. oracle在open之前是先判断是否进行介质恢复,然后再是判断是否进行instance recovery。
4. 关于4种scn的关系如下:

system checkpoint scn — 存放在controlfile中
datafile checkpoint scn –存放在controlfile中
start scn —存放在datafile header中
stop scn —存放在controlfile中


system scn,datafile checkpoint scn,start scn,这3种scn用于判断数据文件是否需要进行介质恢复。这3个相等这不需要介质恢复。
如何这4个都相等,那么就不需要进行实例恢复。stop scn是用于判断是否进行实例恢复的。

5. 如果stop scn比其他的几个scn要大,那么就需要进行instance recover,需要进行扫描redo,实例恢复的起点是low cache rba,终点
是redo log的最末端。

--------------------------------------ORACLE-DBA----------------------------------------

最权威、专业的Oracle案例资源汇总之【案例】Oracle报错ORA-16433非归档丢失redo无法启动的恢复过程

本文由大师惜分飞原创分享,网址:http://www.oracleplus.net/arch/1422.html

Oracle研究中心

关键词:

ORA-16433解决办法

Oracle丢失redo非归档进行恢复

Oracle checkpoint scn详解