Saturday, February 14, 2015

Patching SuperCluster


Last weekend we patched our X4 cells to the latest and greatest version, 12.1.2.1.0.
We had some issues with a cell that was no longer reachable during patching; luckily, after an init 6 of that cell the patching continued.
Unfortunately, by then the default 3.6 h ASM disk repair timeout (disk_repair_time) had been reached and the disks were dropped from the disk group…
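
For anyone who wants more headroom than the default 3.6 h before a long cell patch, here is a minimal sketch that only prints the ALTER DISKGROUP statements for review. The function name, the disk group names DATA and RECO, and the 8h value are placeholders of mine, not something from this patching run, and whether a longer timeout is appropriate depends on your own setup.

```shell
#!/bin/sh
# Print (do not execute) statements that change the ASM disk_repair_time attribute.
# Usage: gen_repair_time_sql <new_value> <diskgroup> [<diskgroup> ...]
# The function name, disk group names and value are illustrative assumptions.
gen_repair_time_sql() {
    new_value=$1
    shift
    for dg in "$@"; do
        printf "ALTER DISKGROUP %s SET ATTRIBUTE 'disk_repair_time' = '%s';\n" "$dg" "$new_value"
    done
}

# The current value can be checked in ASM with:
#   SELECT name, value FROM v$asm_attribute WHERE name = 'disk_repair_time';
# Example call (review the output, then run it in sqlplus connected as SYSASM):
#   gen_repair_time_sql 8h DATA RECO
```

Printing the statements instead of running them keeps the script harmless; the actual change stays a deliberate manual step.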

The next day we needed to upgrade DEV: first the GI upgrade from 11.2.0.4 to 12.1.0.2, followed by the 12.1.0.2.4 patch of January… Then we upgraded the databases from 11.2.0.4.12 to BP15 (11.2.0.4.15).

All went OK, with one big caveat (it is in the patch notes): once you upgrade GI to 12c, opatch auto no longer works for database homes lower than 12c, so we had to apply those patches manually, without opatch auto.

So after a long day all databases were patched (a long day mostly because I could only start late; it takes a while to stop all applications)…

The next day, however, ASM had generated 64 GB of core dumps.

Support quickly found that this was related to:

  Bug 20313024 - Exadata Solaris: ORA-7445 [ossdisk_ioctl_compl] on XDMG startup with 12.1.0.2.4 DBBP ( Doc ID 20313024.8 ) 


But the worst was still to come.

Unlocking the GI home and applying the patch went fine; however,

 /u01/app/grid/product/12.1.0/grid/crs/install/roothas.pl -patch  didn’t complete…

Not knowing if I could stop this, I opened an SR, where they told me that it could be stopped. Re-executing didn’t change anything.

After a while I found info in the logs that pointed to EVM issues, and I passed this observation on to support.

I also told them that I had found older files in /var/tmp/.oracle and asked if I could remove them.
They said no several times. (However, apparently I already blogged about this years ago here ;)
In the meantime I had to raise the severity to 1 because of an upcoming DEV deadline; the system had already been down one day longer than expected...
I had some phone calls with support in the US; one guy didn't know that the logging had moved from $GRID_HOME/log to $ORACLE_BASE/diag...
I asked them again if I could remove the .oracle files, but again the answer was no, because I could break things.

The next morning I was contacted by support again. I pointed to the .oracle files once more, and this time he said that it was OK to remove them.
I did, and after rebooting with the ohas inittab entries disabled and removing the files in /var/tmp/.oracle, I could finally execute roothas.pl -patch.
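
As a reminder to myself for next time, here is a small sketch for spotting leftover entries in /var/tmp/.oracle before touching anything. The function name and the one-day threshold are my own choices for illustration. It only lists entries and deletes nothing; removing them is something to do only with the stack down and, ideally, with support's confirmation.

```shell
#!/bin/sh
# List non-directory entries (files, sockets, pipes) under a directory whose
# age in whole days is greater than the given number. Nothing is deleted.
# Usage: list_stale_entries <dir> [<days>]   (days defaults to 1)
# The function name and default threshold are illustrative assumptions.
list_stale_entries() {
    dir=$1
    days=${2:-1}
    # find counts age in full 24-hour periods; +N means "more than N"
    find "$dir" ! -type d -mtime +"$days" -print
}

# Example call on the directory from this post:
#   list_stale_entries /var/tmp/.oracle 1
```

Comparing the listed timestamps with the last clean startup of the stack gives a quick idea of which entries are leftovers from an older run.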


The point is that I mentioned this very early in the SR and nobody listened...

Anyway, I was happy that everything was running again, and I hope that blogging about this will help other people, and that I will remember it the next time I have this issue ;-)