Monday, February 11, 2019

opatch: check the patches included in a patch

I was aware of the opatch lsinventory -bugs_fixed option to check the patches applied on an Oracle home, but didn't know you could also do this on a downloaded patch(set):


opatch lspatches -bugs <patch location>


Note that if it is a bundle patch, such as the 12.1.0.2.190115 DBBP, you need to go into the main patchset directory and point to the individual sub-patches.


In our case:


opatch lspatches -bugs  ./unzipped/28833531/28729220
patch_id:28729220
unique_patch_id:22494611
date_of_patch:10 Oct 2018, 18:36:59 hrs PST8PDT
patch_description:ACFS PATCH SET UPDATE 12.1.0.2.190115 (28729220)
component:oracle.usm,12.1.0.2.0,optional
platform:226,Linux x86-64
instance_shutdown:true
online_rac_installable:true
patch_type:bundle_member
product_family:db
auto:false
bug:19452723, NEED TO FIX THE SUPPORTED VERSIONS FOR KA ON LINUX
bug:18900953, CONFUSING AFD MESSAGE IN THE GI ALERT LOG
bug:23625427, DLM_PANIC_MSG  <INVALID CHECKSUM>
bug:24308283, AFD FAILED TO SEND OUT UNMAP WHILE USING PARTITIONS IN 12.1.0.2.0 CODE LINE
bug:26882237, ODA  SBIN ACFSUTIL SNAP INFO FAILS WITH   ACFS-03044  FAILED TO OPEN MOUNT POINT
bug:26396215, FSCK CHANGES NEEDED TO IMPROVE PERFORMANCE ON MANY TB SIZED FILE SYSTEMS
bug:28142134, RETPOLINE SUPPORT FOR SLES - ACFS - USM - SPECTRE
bug:25381434, SLES12 SP2 SUPPORT FOR ACFS
bug:23639692, LNX64-112-CMT  HEAP CORRUPTION RELOCATING ACL VOLUME
bug:18951113, AFD FILTERING STATUS IS NOT PERISTENT ACROSS NODE REBOOT
bug:22810422, UEKR4 SUPPORT FOR ACFS
bug:21815339, OPNS PANIC AT OFSOBFUSCATEENCRPRIVCTXT WITH ACTIVE ENCR STRESS TEST
bug:20923224, AFD LINUX SHOULD ISSUE IO WITH 512 SECTOR ADDRESSING
bug:26275740, DIAGNOSIBILITY   AUTOMATICALLY DUMPSTATE AND DUMPSTATS ON FILE SYSTEM INCIDENT
bug:19517835, KA+EF:TEST HANG MIGHT BE RELATED TO LOST MESSAGES TO KA DURING MULTI-BLOCK READ
bug:21474561, LINUX DRIVER SIGNING SUPPORT
bug:18185024, T5 SSC: MACHINE PANIC IN KOBJ_LOAD_MODULE DURING GRID INSTALL
bug:28111958, ACFS-1022 DESPITE BUG FIX

......
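Since a bundle patch like this contains several sub-patches, you can loop over the unzipped bundle directory and dump them all in one go. A minimal sketch, assuming the same unzipped layout as above and opatch on the PATH:

# each sub-directory of the bundle is an individual sub-patch
for p in ./unzipped/28833531/*/ ; do
  opatch lspatches -bugs "$p"
done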



Out-of-place GI upgrade on Exadata OVM



The client I am currently working for wanted to patch their Exadatas to the latest and greatest patchset, which came out 1.5 weeks ago.


This QFSDP (January 2019) upgrades the GI from 12.2.0.1.180116 to 12.2.0.1.190115.

We followed the Oracle recommendation to patch out of place. However, we first tried to use the same method as last time, when we went from 12.1 to 12.2, as indicated in note:

12.2 Grid Infrastructure and Database Upgrade steps for Exadata Database Machine running 11.2.0.3 and later on Oracle Linux (Doc ID 2111010.1)


Unfortunately that didn't work, because we aren't doing an upgrade but just a patch.


My colleague tried to use opatchauto with -prepare-clone etc. but ran into issues.


After a while I found that there is a -switchGridHome option for gridSetup.sh.

So basically you execute that from your new home by specifying:


./gridSetup.sh -switchGridHome -silent



So these are the steps we followed:


  • Download golden image via MOS note: 888828.1 
  • Create a disk image and partition it on the Dom0. 
  • Create DomU specific RefLink. 
  • Mount the new device on the DomU. 
  • Install the patched software of the 12.2 GI (executed as GI-owner). 
  • Adapt template response file (generated via interactive mode on the first node of the first DomU). 
  • Set the environment correctly for the existing GI: 
  • unset ORACLE_HOME ORACLE_BASE ORACLE_SID 
  • cd /u01/app/12.2.0.1_190115/grid (which is the new GI HOME) 
  • ./gridSetup.sh -silent -responseFile /home/grid/grid_install_12.2.0.1.190115.rsp 
  • Execute root.sh script as indicated on the screen (as root) on the local node only. 
  • Repeat this procedure on the second node. 
  • Then the actual switch from the existing GI HOME to the new GI HOME (executed as GI owner): 
  • Check if an ASM rebalance is active (see the query after this list). If so, wait and retry later. 
  • unset ORACLE_HOME ORACLE_BASE ORACLE_SID 
  • cd /u01/app/12.2.0.1_190115/grid (which is the new GI HOME) 
  • ./gridSetup.sh -switchGridHome -silent 
  • Check that the new binaries are relinked with RDS (if not, relink; see the sketch after this list). 
  • Execute the root.sh script as indicated on the screen (as root), first on the local node and after that on the second node. ==> takes a while
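For the two check steps referenced in the list above, something along these lines does the job. A minimal sketch, assuming the new GI HOME path used in this post; adjust to your environment:

# check for a running ASM rebalance before switching (run as the GI owner)
sqlplus -s / as sysasm <<'EOF'
select group_number, operation, state, est_minutes from gv$asm_operation;
EOF

# verify the new binaries are linked with RDS over InfiniBand; this should print "rds"
/u01/app/12.2.0.1_190115/grid/bin/skgxpinfo

# if it prints "udp" instead, relink with RDS
export ORACLE_HOME=/u01/app/12.2.0.1_190115/grid
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk ipc_rds ioracle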

Friday, April 20, 2018

Setting up standby of PDB 12.2.0.1.170818

We have a single-tenant production database, and we wanted to create a standby database for it.

That worked perfectly fine:





SET DECRYPTION WALLET OPEN IDENTIFIED BY "password";
run
{
  allocate channel target01 type disk;
  allocate channel target02 type disk;
  allocate channel target03 type disk;
  allocate channel target04 type disk;
  allocate auxiliary channel aux01 type disk;
  allocate auxiliary channel aux02 type disk;
  allocate auxiliary channel aux03 type disk;
  allocate auxiliary channel aux04 type disk;
  duplicate target database for standby from active database password file section size 2000M;
}
However ..... my datafiles were not put in place correctly on the standby. On the primary it had this structure:
+DATA/db_unique_name/DATAFILE 

+DATA/db_unique_name/pdb_guid/DATAFILE

On the standby, on the other hand, everything, including the PDB datafiles, ended up in:
+DATA/db_unique_name/DATAFILE
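You can see the difference quickly from inside the database itself; a minimal check I would run on both primary and standby:

-- CON_ID 1 is the root; the PDB files should show up under their own GUID directory
select con_id, name from v$datafile order by con_id, name;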
After checking all my parameters I went to look on MOS; apparently this is a known issue:




Datafile on Standby Database is created under incorrect directory (Doc ID 2346623.1)
Patches are available.
Update:
Installed the patch and still the same issue! :-( Opening an SR.
Update 2: Twitter is fantastic. Piotr Wrzosek pointed me to this bug:

INCONSISTENT OMF BEHAVIOUR WITH ALTER DATABASE MOVE DATAFILE (bug 17613474)
https://twitter.com/pewu78/status/987335828958564352
It seems to have been present for quite some time already; however, there seems to be no patch available for our version ;(
Patch 24836489: DATAFILES ARE CREATED WRONG LOCATION IN OMF DEFINED PDB DATABASE



Update and solution: REMOVE the SECTION SIZE clause from the RMAN duplicate; then the files are created correctly. I need to roll back the patch to see whether just removing SECTION SIZE is enough, or whether the above-mentioned patch 25576813 is still needed.



Tested without the patch: it seems that SECTION SIZE is the culprit.
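So for reference, the duplicate that laid the files out correctly is simply the same run block as above without the SECTION SIZE clause:

SET DECRYPTION WALLET OPEN IDENTIFIED BY "password";
run
{
  allocate channel target01 type disk;
  allocate channel target02 type disk;
  allocate channel target03 type disk;
  allocate channel target04 type disk;
  allocate auxiliary channel aux01 type disk;
  allocate auxiliary channel aux02 type disk;
  allocate auxiliary channel aux03 type disk;
  allocate auxiliary channel aux04 type disk;
  duplicate target database for standby from active database password file;
}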

Thursday, April 12, 2018

QFSDP JANUARY 2018 on Exadata with OVM: with more than 8 VMs, watch out

Last month we installed the OS-related and firmware part of the QFSDP JANUARY 2018 (12.2.1.1.6 to be more specific) on our test and dev system at my customer.

GI and DB still need to be patched.

After our last patching experience (http://pfierens.blogspot.be/2017/06/applying-april-2017-qfsdp-12102-exadata.html) this could only get better.


Well, to put a very long story short: be cautious. We have now run into 4 different bugs, causing instability of the RAC clusters, GI that refused to start up, loss of InfiniBand connectivity ...


So the Monday after the patching we were hit by instability of our Exadata OVM infrastructure for test, dev and qualification: dom0s rebooting ...

There also seemed to be an issue with the IB interfaces in the domU; unfortunately we didn't have a crash dump, so Support couldn't really do anything.


The only way to get GI and the DBs up again was to reboot the VM; crsctl stop crs followed by crsctl start crs didn't really work, and the logs showed IB issues.
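For completeness, this is what we tried before resorting to a reboot (as root; plain restart commands, nothing exotic):

# stop and restart the full GI stack on the node
crsctl stop crs -f
crsctl start crs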


Last time (I forgot to blog about that) we ran into the gnttab_max_frames issue, which we had set to 512. After this patching it was back at 256, so we thought that might have been the reason, because in this release another parameter was introduced in grub.conf:



gnttab_max_maptrack_frames
gnttab_max_frames

The relation between the two was difficult to find, but in the end this seemed not to be the right diagnosis.

If you want some more information about gnttab_max_frames, please read this.
Shortly put: each virtual disk and each networking operation needs a number of granted frames to communicate; if this is not set correctly, you will have issues ...
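For illustration, both parameters go on the Xen command line in the dom0's grub.conf. A sketch only; the dom0_mem and maptrack values here are made-up examples, not recommendations:

# /boot/grub/grub.conf excerpt on the dom0 (illustrative values)
kernel /xen.gz dom0_mem=8G gnttab_max_frames=512 gnttab_max_maptrack_frames=1024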


Luckily, on the Friday of that same week we were in the same situation again; this time we decided to let the dom0 crash and that way obtain a crash dump.

After uploading that crash dump to Support, they were able to see that the issue was in the Mellanox HCA firmware layer. Between APR 2017 and January there were 4000 changes in that firmware; one of them, or a combination, caused our issue.



Bottom line: there seems to be an issue with the Mellanox HCA firmware (from 2.11.1280 to 2.35.5532) in this patch. You may encounter it if you have more than 8 VMs under one dom0; we had 9 ...
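To quickly check whether a dom0 is above that threshold, count the running domUs. A sketch (xm list on this OVM version, xl list on newer Xen stacks; the first two output lines are the header and Domain-0):

xm list | tail -n +3 | wc -l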



So basically we shut down one VM on each node and had stability again.

When it was confirmed in numerous conf calls that 8 was the magic number, we decided to move the Exadata monitoring VM functionality to another VM and shut down the monitoring VM, to be back at 8 VMs.


We had a stable situation until last Friday, when we had an issue with both IB switches being unresponsive and the second switch not taking the SM master role. This issue is still under investigation and is hopefully not related to the QFSDP JAN 2018 ...



If you have similar symptoms, point Support to these bugs:

  Bug 27724899 - Dom0 crashes with ib_umad_close with large no. of VMs 
  Bug 27691811 
  Bug 27267621 


UPDATE:

There seems to be a bug as well in IB switch version 2.2.7-1, solved in 2.2.10 (not released yet). Not everything is solved, though: only the logging issue, not the main root cause; apparently there is a separate ER for that.






Sunday, April 1, 2018

to_dog_year on Exadata

One of the new features which seems to have been overlooked in all the publications I saw about 18c is the TO_DOG_YEAR() function. It seems obvious that this was missed, because it is fairly undocumented, as Franck Pachot, Pieter Van Puymbroeck, Brendan Tierney, Martin Berger and Oyvind Isene pointed out.


I wanted to know how it behaved on Exadata, especially on my customer's OVM on Exadata. Tested version: Exadata 18.1.4.0.0. My first tries were not successful and still are not; I see quite some strange behaviour.
desc to_dog_year

FUNCTION TO_DOG_YEAR RETURNS NUMBER
 Argument Name        Type        In/Out Default?
 ------------------------------ ----------------------- ------ --------
 DOB                            DATE                    IN
 FEMALE                         BOOLEAN                 IN      DEFAULT
 NLS_BREAD                      VARCHAR2                IN      DEFAULT
 OWN_SMOKE                      BOOLEAN                 IN      DEFAULT  

I tried the following:
select to_dog_year(to_date('01-04-2008','DD-MM-YYYY'), 'LABRADOR') from dual;


and this raised the following:

ORA-07445: exception encountered: core dump [DOG_BREED_VIOLATION] [+34] [PC:0xDEADBEEF] [ADDR:0x14] [UNABLE_TO_BARK] [] 

When choosing another breed it worked, although it gave a pretty bizarre result:
select to_dog_year(to_date('28-03-2013','DD-MM-YYYY'), 'POODLE') a from dual;

a
-------------------------
vulnerability not found

While according to the pedigree it should be 50. And not only that: it should RETURN a NUMBER? WTH. OK, what happens when we try to run the function on cats?
select to_dog_year(to_date('01-04-2008','DD-MM-YYYY'), 'GARFIELD') a from dual;


a
------------------------
is this a dog ?

Oracle, you have some work to do: I would expect a number to be returned, not a string. Does anybody else with an Exadata see this behaviour, preferably Bare Metal? Cloud?


Update 2-APR-2018: Before you think that PIO is Pets In Oracle, a little update for those who didn't realize this was posted on the 1st of April. It was an April Fools' idea cooked up with some Oracle Community buddies on the post-UKOUG_TECH17 trip. And what remains true all year is how this community is full of awesome people. Special thanks to Connor, who added great ideas here :)

Thursday, October 19, 2017

OUD and 12.2: how to get your DB registered

We are slowly moving to 12.2; some new products will start off with this version, single tenant for the moment.

The customer uses OUD and global roles to centrally manage access to databases and database objects.

One of the first things that needs to be done is to register the database in OUD.

I tried to do it and then I got the message "There are no listeners associated with this database".

OK, we are a bit stuck here. The database needs to be in OUD before we can use global roles and global users ... and that message doesn't help here ;-)
I tried different things to work around it; I adapted the listener.ora file with static registration, but that didn't work.

Finally, after opening an SR, the engineer suggested putting a listener.ora like the one below in /tmp and pointing TNS_ADMIN to that location. And yes, that solved the issue: I was able to register the database in OUD.


LISTENER=
  (DESCRIPTION=
    (ADDRESS_LIST=
      (ADDRESS=(PROTOCOL=tcp)(HOST=exa*******)(PORT=1521))
      (ADDRESS=(PROTOCOL=ipc)(KEY=extproc))))
SID_LIST_LISTENER=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=c20*****.acme.com)
      (ORACLE_HOME=/u01/app/product/12.2.0.1/dbhome1)
      (SID_NAME=c20*****1)
    )
    (SID_DESC=
      (GLOBAL_DBNAME=pdb*****.acme.com)
      (ORACLE_HOME=/u01/app/product/12.2.0.1/dbhome1)
      (SID_NAME=c20*****1)
    )
)
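With that file in /tmp, pointing TNS_ADMIN at it for the session is all that is needed before running the registration (a sketch, run as the database software owner):

# make the registration pick up the temporary listener.ora from /tmp
export TNS_ADMIN=/tmp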


By the way, it is very easy to do this in a silent way:

dbca -silent -configureDatabase -sourceDB c20v01d01 -registerWithDirService true -dirServiceUserName "cn=Directory Manager" -dirServicePassword "" -walletPassword "" -sysDBAPassword  -sysDBAUserName sys





I must say that this time I had a very good experience with Oracle Support.


Thursday, June 22, 2017

To Data Guard First or Not to Data Guard First that is the question

In a previous post you could read about issues with IB switches and other problems with the APR QFSDP on 12.1.

We had some more surprises.

All BPs we installed so far are Data Guard Standby-First enabled, meaning you can install them on the standby, do a switchover, bring your new standby to the same level at your own pace, and run datapatch; roughly the flow sketched below.
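In practice that flow looks roughly like this. A sketch only; the placeholders are mine, and the datapatch step assumes the usual post-BP SQL apply:

# 1. apply the BP to the standby's homes first (standby-first patch apply)
# 2. switch over so the patched side becomes primary, e.g. via the broker
dgmgrl sys/"<password>" "switchover to '<standby_db_unique_name>'"
# 3. patch the former primary's homes at your own pace
# 4. load the SQL changes once, on the new primary
$ORACLE_HOME/OPatch/datapatch -verbose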


Well, we did that, as for all the other QFSDPs so far.

But after the switchover, all of a sudden our standbys didn't follow anymore and aborted recovery with an ORA-00600 ...

We clearly ran into this issue:

ORA-00600:[KTSLU_PUA_REMCHK-1] Could be generated after Applying April 2017 Database Bundle Patch (12.1.0.2.170418 DBBP) (Doc ID 2267842.1)


We fixed our issue by just applying the BP on the unpatched home; we didn't add the extra patch.

The key is to re-read the documentation several times. However, wouldn't it be nice if Oracle Support could send you a mail? They have records of everything you download anyway, judging by the sales people that call each time I download new stuff ;-)

This would be a great service!

Another great service would actually be to test patches ... and to test whether a patch really is DG first. Here the second patch was released a long time after the APR bundle, making you wonder whether they actually tested this patch upfront in a DG environment and did a switchover.