Tuesday, March 27, 2012

11gR2 new features (3) : voting disks in ASM

One of the changes in Clusterware 11.2.0.1 is that the OCR and voting disks must be either on a shared file system or in ASM; raw devices are only supported if you upgrade from an earlier version.


Whereas in 10gR2 you could use raw devices for the voting disks and OCR, in 11gR2 that is only supported after an upgrade. Since this is a clean installation we cannot go that way. In the 10.2 environment we had 3 voting disks and 2 OCR disks configured.


Voting and OCR disks now live in ASM, and the number of copies depends on the redundancy of the disk group. If you use external redundancy, as we do for our data and archive disk groups, the voting disk is stored only once,

as shown here:

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   55b5a82167444ff3bfa0ad1bdc8f99fb (/dev/rdsk/c6t60060E80058DA50000008DA500000007d0s0) [DATADG1]
Located 1 voting disk(s).



Therefore we decided to create a separate disk group with a higher redundancy to store the voting disks.



$ ls -altr /dev/rdsk/*s5
crw-rw---- 1 grid asmdba 118, 5 Mar 21 16:00 /dev/rdsk/c6t60060E80058DA50000008DA50000000Dd0s5
crw-rw---- 1 grid asmdba 118, 13 Mar 21 16:00 /dev/rdsk/c6t60060E80058DA50000008DA50000000Cd0s5
crw-rw---- 1 grid asmdba 118, 21 Mar 21 16:00 /dev/rdsk/c6t60060E80058DA50000008DA50000000Bd0s5
crw-rw---- 1 grid asmdba 118, 29 Mar 21 16:00 /dev/rdsk/c6t60060E80058DA50000008DA50000000Ad0s5
crw-rw---- 1 grid asmdba 118, 37 Mar 21 16:00 /dev/rdsk/c6t60060E80058DA50000008DA500000009d0s5
crw-rw---- 1 grid asmdba 118, 45 Mar 21 16:00 /dev/rdsk/c6t60060E80058DA50000008DA500000008d0s5
crw-rw---- 1 grid asmdba 118, 53 Mar 21 16:00 /dev/rdsk/c6t60060E80058DA50000008DA500000007d0s5



In sqlplus, connected as sysasm:

create diskgroup dgquorum
normal redundancy
failgroup q1 disk '/dev/rdsk/c6t60060E80058DA50000008DA500000007d0s5',
'/dev/rdsk/c6t60060E80058DA50000008DA500000008d0s5'
failgroup q2 disk '/dev/rdsk/c6t60060E80058DA50000008DA500000009d0s5',
'/dev/rdsk/c6t60060E80058DA50000008DA50000000Ad0s5'
quorum failgroup q3 disk '/dev/rdsk/c6t60060E80058DA50000008DA50000000Bd0s5',
'/dev/rdsk/c6t60060E80058DA50000008DA50000000Cd0s5'
attribute 'compatible.asm' ='11.2.0.0.0'
/



bash-3.00$ id
uid=101(grid) gid=101(oinstall)
bash-3.00$ crsctl replace votedisk +DGQUORUM
Successful addition of voting disk 07dda8a1db214fd1bfb6e49562115215.
Successful addition of voting disk 609aab5c66b54f74bf46850dc171107f.
Successful addition of voting disk 98f7325dbdb14fefbf9c8d0bfc3f6d14.
Successful deletion of voting disk 55b5a82167444ff3bfa0ad1bdc8f99fb.
Successfully replaced voting disk group with +DGQUORUM.
CRS-4266: Voting file(s) successfully replaced


bash-3.00$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   07dda8a1db214fd1bfb6e49562115215 (/dev/rdsk/c6t60060E80058DA50000008DA50000000Bd0s5) [DGQUORUM]
 2. ONLINE   609aab5c66b54f74bf46850dc171107f (/dev/rdsk/c6t60060E80058DA50000008DA500000007d0s5) [DGQUORUM]
 3. ONLINE   98f7325dbdb14fefbf9c8d0bfc3f6d14 (/dev/rdsk/c6t60060E80058DA50000008DA500000009d0s5) [DGQUORUM]
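
The check above can also be scripted. What follows is only a minimal sketch, assuming the `crsctl query css votedisk` output keeps the layout shown in this post (row number, STATE, file id, path, disk group). The function name `count_online_votedisks` is made up for illustration; the sample rows are copied from the output above.

```shell
#!/bin/bash
# Count ONLINE voting files in `crsctl query css votedisk` output read from stdin.
# Assumption: data rows look like " 1. ONLINE <id> (<path>) [<diskgroup>]".
count_online_votedisks() {
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}

# Sample rows copied from this post. On a cluster node you would pipe the
# real command instead:  crsctl query css votedisk | count_online_votedisks
sample=' 1. ONLINE   07dda8a1db214fd1bfb6e49562115215 (/dev/rdsk/c6t60060E80058DA50000008DA50000000Bd0s5) [DGQUORUM]
 2. ONLINE   609aab5c66b54f74bf46850dc171107f (/dev/rdsk/c6t60060E80058DA50000008DA500000007d0s5) [DGQUORUM]
 3. ONLINE   98f7325dbdb14fefbf9c8d0bfc3f6d14 (/dev/rdsk/c6t60060E80058DA50000008DA500000009d0s5) [DGQUORUM]'

online=$(printf '%s\n' "$sample" | count_online_votedisks)
echo "online voting files: $online"
```

For the normal redundancy disk group created here the count should be 3, one voting file per failure group.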


The same goes for the OCR.

This needs to be done as root:
bash-3.00# /u01/app/11.2.0/grid/bin/ocrconfig -add +DGQUORUM
bash-3.00# /u01/app/11.2.0/grid/bin/ocrconfig -add +DGQUORUM
PROT-29: The Oracle Cluster Registry location is already configured
bash-3.00# /u01/app/11.2.0/grid/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  : 3
         Total space (kbytes)     : 262120
         Used space (kbytes)      : 2976
         Available space (kbytes) : 259144
         ID                       : 809788976
         Device/File Name         : +DATADG1
                                    Device/File integrity check succeeded
         Device/File Name         : +DGQUORUM
                                    Device/File integrity check succeeded

                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
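
To list just the configured OCR locations, the ocrcheck output can be filtered. This is a rough sketch, assuming each location is reported on a "Device/File Name : ..." line as in the output above. `ocr_locations` is a hypothetical helper name, and the sample text is a shortened copy of that output.

```shell
#!/bin/bash
# Print the configured OCR locations found in `ocrcheck` output read from stdin.
# Assumption: each location appears as "Device/File Name : <location>".
ocr_locations() {
    sed -n 's/^[[:space:]]*Device\/File Name[[:space:]]*:[[:space:]]*//p'
}

# Shortened sample taken from this post. As root on a cluster node:
#   /u01/app/11.2.0/grid/bin/ocrcheck | ocr_locations
sample='Device/File Name : +DATADG1
Device/File integrity check succeeded
Device/File Name : +DGQUORUM
Device/File integrity check succeeded
Device/File not configured'

printf '%s\n' "$sample" | ocr_locations
```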










Wednesday, March 21, 2012

11gR2 new features (2) : creating diskgroups in ASM


I received this error this morning while creating a disk group in ASM:



create diskgroup ARCHDG1 external redundancy disk
 '/dev/rdsk/c6t60060E80058DA50000008DA50000000Cd0s0','/dev/rdsk/c6t60060E80058DA50000008DA50000000Dd0s0'


ORA-15260: permission denied on ASM disk group

What happened: I had logged on as sysdba, and as of 11gR2 you apparently need to log on as sysasm to create a disk group.

So
sqlplus '/ as sysasm' does the trick.

Tuesday, March 20, 2012

be careful with .bashrc scripts

A .bashrc script made the OUI fail this morning with OUI-35000 while installing the examples:

"

SEVERE: OUI-35000: Fatal cluster error encountered (The Oracle base remains unchanged with value /u01/app/oracleThe Oracle base remains unchanged with value /u01/app/oracleThe Oracle base remains unchanged with value /u01/app/oracleThe Oracle base remains unchanged with value /u01/app/oracleThe Oracle base remains unchanged with value /u01/app/oracleThe Oracle base remains unchanged with value /u01/app/oracle). Correct the problem and try the operation again.
INFO: User Selected: Yes/OK
"

The cause: our .bashrc script automatically runs oraenv to set up the environment.

When OUI connects to the other nodes it gets back

"The Oracle base remains unchanged with value /u01/app/oracle"
which it does not expect, and it fails with the OUI-35000 error mentioned above.
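
One way to avoid this is to run oraenv only in interactive shells and to silence its output. The fragment below is a sketch under assumptions, not the exact change we made: `is_interactive` is a helper name invented here, `MYDB` is a placeholder SID, and oraenv is assumed to be on the PATH.

```shell
# ~/.bashrc fragment (sketch): only set the Oracle environment in interactive
# shells, so non-interactive ssh sessions (as used by OUI) get no extra output.
# Assumption: "MYDB" is just a placeholder SID.

is_interactive() {
    # "$-" holds the current shell flags; it contains "i" for interactive shells.
    case "$1" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

if is_interactive "$-" && command -v oraenv >/dev/null 2>&1; then
    ORACLE_SID=${ORACLE_SID:-MYDB}
    ORAENV_ASK=NO            # do not prompt for the SID
    . oraenv >/dev/null      # silence "The Oracle base remains unchanged ..."
    unset ORAENV_ASK
fi
```

Either measure alone (the interactive check or the redirect) should keep the message away from OUI. Using both is simply the cautious choice.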




Sunday, March 18, 2012

issues encountered during OVM 3.0.3 install

On Friday I finally received my lab PC; my colleagues were so nice to assemble it for me ;) thanks guys !!!

When I came home I connected it to the PC connection (VGA) of our Panasonic TV in the living room. This worked great: I could get into the BIOS and everything worked fine (or so I thought).

Then I tried to boot from a USB pen drive to install Oracle VM 3.0.3, but this didn't work. After trying different things I removed the CD writer from my main PC and put it into the new PC. I put the Oracle VM ISO in it and it started booting, but as soon as I passed the first screen it stopped without any sign of life... weird... I didn't have this with the W7 install I had tried to make sure the screen was working.

After trying and retrying lots of things and BIOS options, and getting more and more desperate, I moved the PC to my office (I had to move out of the living room). I connected it to a normal monitor just for fun and guess what: it booted further than the first Oracle VM screen and actually started the installation. You can imagine my joy, but that wouldn't last long....

Everything went smoothly (see the excellent installation instructions from oraclebase webmaster Tim Hall) until the partitioning of the disks....

there I received
 "OVM server cannot install on GPT boot partition"

Some research showed that this has to do with EFI disk labels.

Luckily I found this OTN topic where they explained what to do.

Basically it comes down to removing the GPT label from the disk. One way to do this is to boot a Windows install disk and execute what is indicated in the OTN topic.

After that everything went smoothly, and Oracle VM was up and running.
The only thing left to make it usable was to install Oracle VM Manager, which I did in Oracle VirtualBox.

What remains open for the moment is how to set up a software RAID 5 on those 4 1,5 TB disks... but that will be something for later.


I would like to thank @oraclebase,@ora600dude,@MartinDBA for their moral support on twitter !!!

Friday, March 9, 2012

RAT install on 10203

While testing the installation of patch 9373986 on a test environment
I received the following error:


Patch [ 9373986 ] conflict with patch(es) [  4336528 4899479 5081798 5363584 5944955 5998544 6069085 ] in the Oracle Home.

To resolve patch conflicts please contact Oracle Support Services.
If you continue, patch(es) [  4336528 4899479 5081798 5363584 5944955 5998544 6069085 ] will be rolled back and the new Patch  [ 9373986 ] will be installed.


A ticket has been opened at MOS and hopefully I will get an answer. The client is not in extended support, so I expect to have some fight with support.

UPDATE 12 march 2012 :
Development is working on a merge patch for this issue. Hopefully this will not raise more issues than it solves.

UPDATE 14 march 2012 :

Niall Litchfield made a very good point:
Tweet :
"@pfierens @jarneil am I reading that correctly? To backport RAT requires removing previous mainline patches? Or are the old ones one-offs?"

And indeed, two of the patches,
6069085 and 5998544, are part of bigger recommended patch bundles for Logical and Physical Standby (6081550 and 6081547).

Sunday, March 4, 2012

11gR2 new features (1)


There is a nice piece of functionality in rman 11.2.0.3; I don't know when it was added.
It checks the syntax of an rman script without executing it.

I created a file called restore.rman with just this:

restore database;
Then I executed the following:

[TEST oracle@oraclelinux TEST]$  rman checksyntax cmdfile="restore.rman"
Recovery Manager: Release 11.2.0.3.0 - Production on Thu Mar 1 16:57:49 2012
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
RMAN> restore database;
2>
The cmdfile has no syntax errors
Recovery Manager complete.


Then I added the following deliberate error to the file:


recover databae;

[TEST oracle@oraclelinux TEST]$  rman checksyntax cmdfile="restore.rman"
Recovery Manager: Release 11.2.0.3.0 - Production on Thu Mar 1 16:59:10 2012
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
RMAN> restore database;
2> recover databae
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01009: syntax error: found "identifier": expecting one of: "allow, archivelog, auxiliary, check, clear, copy, corruption, database, datafilecopy, datafile, delete, device, exclude, from, noparallel, noredo, parallel, preview, restore, skip readonly, standby, tablespace, test, undo, validate"
RMAN-01008: the bad identifier was: databae
RMAN-01007: at line 2 column 9 file: restore.rman

I find this a useful feature.
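
If you have a directory full of rman scripts, the check is easy to wrap in a loop. The sketch below assumes success can be recognised by the "The cmdfile has no syntax errors" message seen above; I have not relied on rman's exit code. `rman_syntax_ok` and `check_rman_dir` are names invented for this example.

```shell
#!/bin/bash
# Decide from `rman checksyntax` output (read from stdin) whether a command
# file passed. Assumption: success is signalled by the message seen in this post.
rman_syntax_ok() {
    grep -q 'The cmdfile has no syntax errors'
}

# Check every *.rman file in the given directory and report OK/FAIL per file.
# Returns non-zero if at least one file failed the check.
check_rman_dir() {
    local f rc=0
    for f in "$1"/*.rman; do
        [ -e "$f" ] || continue          # directory contains no *.rman files
        if rman checksyntax cmdfile="$f" | rman_syntax_ok; then
            echo "OK   $f"
        else
            echo "FAIL $f"
            rc=1
        fi
    done
    return $rc
}
```

Usage would be something like `check_rman_dir .` from the directory that holds restore.rman.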

Saturday, March 3, 2012

lab pc

After long contemplation I finally ordered my Oracle VM lab PC.

This is the configuration:

Intel i7-3820 Quad Core Processor (2 threads per core)
Video card ATI Radeon HD5450
Motherboard Gigabyte GA-X79-UD5 (8 DIMM slots ==> MAX 64GB)
Antec Performance One P-280 case,
replaced by Fractal Design TTXHF3 (that was easier to get)
32 GB of RAM
Extra network card
4 * 1,5 TB Seagate drives
1 * 500 GB Samsung install drive for Oracle VM


Still wondering whether I should get a RAID adapter or not ;-)


UPDATE 18 MARCH 2012 with actual components


UPDATE 19 MARCH 2012 : I should probably get a RAID adapter, as it seems to be the only supported way to get RAID working under Oracle VM ... ;( no software RAID apparently

using para-virtualization with oracle

xen vs os

While doing my previous tests with _datafile_write_errors_crash_instance on a paravirtualized Linux (CentOS 5.7 64-bit) under Xen Server 5.5 I saw some unexpected behaviour.

Basically, I removed the datafile and Oracle, or rather the DB writer, didn't notice it at all. I could even continue to create tables and data in that datafile.


I went to /proc/PID_OF_THE_DBWR/fd
and there did ...

ls -al

lrwx------ 1 oracle oinstall 64 Mar  2 14:44 260 -> /u02/MYDB/undotbs01.dbf
snipped ...
lrwx------ 1 oracle oinstall 64 Mar  2 14:44 261 -> /u02/MYDB/users01.dbf (deleted)

And it was really deleted, but somehow this wasn't picked up. I was a bit astonished by this behaviour and decided to redo the exercise on a non-virtual Linux, CentOS 5.6.

There the same action was picked up by Oracle almost immediately, and the datafile was taken offline or the instance crashed, depending on how you set _datafile_write_errors_crash_instance.

Pretty bizarre.
An strace of the delete on both systems revealed the same OS calls.
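
The /proc check above can be turned into a small helper. This is a sketch for Linux only, relying on the " (deleted)" suffix that the /proc/<pid>/fd links show for unlinked files. `deleted_open_files` is a made-up name, and the demo uses the current shell rather than DBWR so it can be run anywhere. For the DB writer you would pass its pid and run it as the oracle user or root.

```shell
#!/bin/bash
# List files a process still holds open although they were unlinked.
# Linux only: relies on /proc/<pid>/fd symlinks ending in " (deleted)".
deleted_open_files() {
    local pid=$1 fd target
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *' (deleted)') echo "${fd##*/} -> $target" ;;
        esac
    done
}

# Self-contained demo on the current shell instead of DBWR:
tmp=$(mktemp)
exec 9<"$tmp"      # keep the file open on descriptor 9
rm -f "$tmp"       # unlink it; the open descriptor stays valid
deleted_open_files $$
exec 9<&-          # closing the descriptor finally frees the file
```

Note that on Unix in general a file that is unlinked while a process has it open stays readable and writable through that descriptor until it is closed.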


My colleagues and I discussed this, and our best guess is that Xen Server "optimizes" the data access and causes this behaviour.

We will test this next year with a purely virtualized Linux ....
keep you posted

Friday, March 2, 2012

new behaviour in 11gR2 (2)

I finally had the time to check a change in behaviour I blogged about some time ago.

There is a hidden parameter:
_datafile_write_errors_crash_instance
which is now TRUE by default, whereas before it was FALSE. As a consequence, when Oracle is unable to write to a datafile the instance crashes instead of the datafile being taken offline.

To simulate this I did the following:


echo 1 > /u02/MYDB/users01.dbf

When the checkpoint process wants to update the datafile header it can't, and you see the following appear in the alert log:


"
Errors in file /u01/app/oracle/diag/rdbms/mydb/MYDB/trace/MYDB_ckpt_16717.trc:
ORA-63999: data file suffered media failure
ORA-01115: IO error reading block from file 4 (block # 1)
ORA-01110: data file 4: '/u02/MYDB/users01.dbf'
ORA-27072: File I/O error
Additional information: 4
Additional information: 1
Errors in file /u01/app/oracle/diag/rdbms/mydb/MYDB/trace/MYDB_ckpt_16717.trc:
ORA-63999: data file suffered media failure
ORA-01115: IO error reading block from file 4 (block # 1)
ORA-01110: data file 4: '/u02/MYDB/users01.dbf'
ORA-27072: File I/O error
Additional information: 4
Additional information: 1
CKPT (ospid: 16717): terminating the instance due to error 63999
System state dump requested by (instance=1, osid=16717 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/mydb/MYDB/trace/MYDB_diag_16693.trc
Dumping diagnostic data in directory=[cdmp_20120301163246], requested by (instance=1, osid=16717 (CKPT)), summary=[abnormal instance termination].
2012-03-01 16:32:47.071000 +01:00
Instance terminated by CKPT, pid = 16717
2012-03-01 16:33:06.509000 +01:00
"


You can change this behaviour by issuing

alter system set "_datafile_write_errors_crash_instance"=FALSE;

and then you get this behaviour:



Thread 1 advanced to log sequence 71 (LGWR switch)
  Current log# 2 seq# 71 mem# 0: /u01/oradata/TEST/redo02.log
Archived Log entry 11 added for thread 1 sequence 70 ID 0x7bacbbe8 dest 1:
2012-03-02 10:43:57.302000 +01:00
Errors in file /u01/app/oracle/diag/rdbms/test/TEST/trace/TEST_ckpt_23366.trc:
ORA-01171: datafile 4 going offline due to error advancing checkpoint
ORA-01115: IO error reading block from file 4 (block # 1)
ORA-01110: data file 4: '/u01/oradata/TEST/users01.dbf'
ORA-27072: File I/O error
Additional information: 4
Additional information: 1
Checker run found 1 new persistent data failures
2012-03-02 10:45:21.345000 +01:00

Note:
just corrupting the datafile while leaving the header intact doesn't bring the instance down; instead you will see the following:

"
Hex dump of (file 4, block 2) in trace file /u01/app/oracle/diag/rdbms/test/TEST/trace/TEST_ora_2496.trc
Corrupt block relative dba: 0x01000002 (file 4, block 2)
Completely zero block found during buffer read
Reading datafile '/u01/oradata/TEST/users01.dbf' for corruption at rdba: 0x01000002 (file 4, block 2)
Reread (file 4, block 2) found same corrupt data (no logical check)
Corrupt Block Found
         TSN = 4, TSNAME = USERS
         RFN = 4, BLK = 2, RDBA = 16777218
         OBJN = 1, OBJD = -1, OBJECT = , SUBOBJECT =
         SEGMENT OWNER = , SEGMENT TYPE =
2012-03-02 10:47:08.087000 +01:00
Errors in file /u01/app/oracle/diag/rdbms/test/TEST/trace/TEST_ora_2496.trc  (incident=3781):
ORA-01578: ORACLE data block corrupted (file # 4, block # 2)
ORA-01110: data file 4: '/u01/oradata/TEST/users01.dbf'
Incident details in: /u01/app/oracle/diag/rdbms/test/TEST/incident/incdir_3781/TEST_ora_2496_i3781.trc
2012-03-02 10:47:11.750000 +01:00
Hex dump of (file 4, block 1) in trace file /u01/app/oracle/diag/rdbms/test/TEST/incident/incdir_3781/TEST_ora_2496_i3781.trc
Corrupt block relative dba: 0x00000001 (file 4, block 1)
Completely zero block found during validating datafile for block range
Reread of blocknum=1, file=/u01/oradata/TEST/users01.dbf. found same corrupt data
Reread of blocknum=1, file=/u01/oradata/TEST/users01.dbf. found same corrupt data
Reread of blocknum=1, file=/u01/oradata/TEST/users01.dbf. found same corrupt data
Reread of blocknum=1, file=/u01/oradata/TEST/users01.dbf. found same corrupt data
Reread of blocknum=1, file=/u01/oradata/TEST/users01.dbf. found same corrupt data
Errors in file /u01/app/oracle/diag/rdbms/test/TEST/incident/incdir_3781/TEST_ora_2496_i3781.trc:
ORA-19563: datafile header validation failed for file /u01/oradata/TEST/users01.dbf
ORA-01251: Unknown File Header Version read for file number 4
ORA-01578: ORACLE data block corrupted (file # 4, block # 2)
ORA-01110: data file 4: '/u01/oradata/TEST/users01.dbf'
Errors in file /u01/app/oracle/diag/rdbms/test/TEST/trace/TEST_ora_2496.trc  (incident=3782):
ORA-01578: ORACLE data block corrupted (file # 4, block # 2)
"