Friday, March 2, 2012

new behaviour in 11gR2 (2)

Finally had the time to check a change in behavour i blogged about some time ago,

there is a hidden parameter :
_datafile_write_errors_crash_instance
which now by default is set on TRUE while before it was put on FALSE. The consequence of this is that where oracle is unable to write to datafile it will crash instead of putting the datafile offline.

to simulate this I did following


echo 1 > /u02/MYDB/users01.dbf

when the checkpoint process wants to update the datafile header it can't and you see following appear in the alertlog


"
Errors in file /u01/app/oracle/diag/rdbms/mydb/MYDB/trace/MYDB_ckpt_16717.trc:
ORA-63999: data file suffered media failure
ORA-01115: IO error reading block from file 4 (block # 1)
ORA-01110: data file 4: '/u02/MYDB/users01.dbf'
ORA-27072: File I/O error
Additional information: 4
Additional information: 1
Errors in file /u01/app/oracle/diag/rdbms/mydb/MYDB/trace/MYDB_ckpt_16717.trc:
ORA-63999: data file suffered media failure
ORA-01115: IO error reading block from file 4 (block # 1)
ORA-01110: data file 4: '/u02/MYDB/users01.dbf'
ORA-27072: File I/O error
Additional information: 4
Additional information: 1
CKPT (ospid: 16717): terminating the instance due to error 63999
System state dump requested by (instance=1, osid=16717 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/mydb/MYDB/trace/MYDB_diag_16693.trc
Dumping diagnostic data in directory=[cdmp_20120301163246], requested by (instance=1, osid=16717 (CKPT)), summary=[abnormal instance termination].
2012-03-01 16:32:47.071000 +01:00
Instance terminated by CKPT, pid = 16717
2012-03-01 16:33:06.509000 +01:00
"


you can change this behaviour by issuing

alter system set "_datafile_write_error_crash_instance"=FALSE;

and then you get this behaviour :



Thread 1 advanced to log sequence 71 (LGWR switch)
  Current log# 2 seq# 71 mem# 0: /u01/oradata/TEST/redo02.log
Archived Log entry 11 added for thread 1 sequence 70 ID 0x7bacbbe8 dest 1:
2012-03-02 10:43:57.302000 +01:00
Errors in file /u01/app/oracle/diag/rdbms/test/TEST/trace/TEST_ckpt_23366.trc:
ORA-01171: datafile 4 going offline due to error advancing checkpoint
ORA-01115: IO error reading block from file 4 (block # 1)
ORA-01110: data file 4: '/u01/oradata/TEST/users01.dbf'
ORA-27072: File I/O error
Additional information: 4
Additional information: 1
Checker run found 1 new persistent data failures
2012-03-02 10:45:21.345000 +01:00

note :
just corrupting the datafile and leaving the header intact doesn't bring the instance down instead you will see following

"
Hex dump of (file 4, block 2) in trace file /u01/app/oracle/diag/rdbms/test/TEST/trace/TEST_ora_2496.trcCorrupt block relative dba: 0x01000002 (file 4, block 2)Completely zero block found during buffer readReading datafile '/u01/oradata/TEST/users01.dbf' for corruption at rdba: 0x01000002 (file 4, block 2)Reread (file 4, block 2) found same corrupt data (no logical check)Corrupt Block Found         TSN = 4, TSNAME = USERS         RFN = 4, BLK = 2, RDBA = 16777218         OBJN = 1, OBJD = -1, OBJECT = , SUBOBJECT =         SEGMENT OWNER = , SEGMENT TYPE =2012-03-02 10:47:08.087000 +01:00Errors in file /u01/app/oracle/diag/rdbms/test/TEST/trace/TEST_ora_2496.trc  (incident=3781):ORA-01578: ORACLE data block corrupted (file # 4, block # 2)ORA-01110: data file 4: '/u01/oradata/TEST/users01.dbf'Incident details in: /u01/app/oracle/diag/rdbms/test/TEST/incident/incdir_3781/TEST_ora_2496_i3781.trc2012-03-02 10:47:11.750000 +01:00Hex dump of (file 4, block 1) in trace file /u01/app/oracle/diag/rdbms/test/TEST/incident/incdir_3781/TEST_ora_2496_i3781.trcCorrupt block relative dba: 0x00000001 (file 4, block 1)Completely zero block found during validating datafile for block rangeReread of blocknum=1, file=/u01/oradata/TEST/users01.dbf. found same corrupt dataReread of blocknum=1, file=/u01/oradata/TEST/users01.dbf. found same corrupt dataReread of blocknum=1, file=/u01/oradata/TEST/users01.dbf. found same corrupt dataReread of blocknum=1, file=/u01/oradata/TEST/users01.dbf. found same corrupt dataReread of blocknum=1, file=/u01/oradata/TEST/users01.dbf. found same corrupt dataErrors in file /u01/app/oracle/diag/rdbms/test/TEST/incident/incdir_3781/TEST_ora_2496_i3781.trc:ORA-19563: datafile header validation failed for file /u01/oradata/TEST/users01.dbfORA-01251: Unknown File Header Version read for file number 4ORA-01578: ORACLE data block corrupted (file # 4, block # 2)ORA-01110: data file 4: '/u01/oradata/TEST/users01.dbf'Errors in file /u01/app/oracle/diag/rdbms/test/TEST/trace/TEST_ora_2496.trc  (incident=3782):ORA-01578: ORACLE data block corrupted (file # 4, block # 2)
"







3 comments:

Germán said...

Hi, which is the consequence of change this parameter?. I mean, if I change this parameter Oracle puts datafile online but allow to continue the operation.... Can I put on risk my whole database doing that?

Regards.

Philippe said...

what happens is that you instance crashes if you have a datafile giving errors, while before it would just bring the tablespace offline

Germán said...

Thanks for reply Philippe. Is a new RAC 11gr2 database, it comes from 10gr1 single instance. First one node crash and then, 4 hours after, the other node crash with the same error. Looks like the ASM diskgroup +DG_DATA is gone...

Greetings from Colombia