Wednesday, May 15, 2019

patching Exadata from version 18.1.12.0.0.190111 to 19.2.1.0.0.190419

As you know, the 19 release of Exadata is a big one: it upgrades the Linux distribution from Oracle Enterprise Linux 6 to 7. We have an OVM-based Exadata and are in the process of testing it. I will write some more this week about the process to upgrade to 19.2. So far we upgraded:
* our cells
* dom0
We are now in the process of upgrading the domU's. However, today we ran into a funny issue, for which we are waiting for the solution. 18.1.12 comes with the following Linux release; cat /etc/redhat-release on the upgraded system shows us:
Red Hat Enterprise Linux Server release 6.10 (Santiago) 
However, the pre-upgrade check tool complains about this during the upgrade. Make sure you read Andy Colvin's blog about this release: you might need to re-download the patch, as it was re-released. More to come about this patching later.

UPDATE: in the meantime, yesterday (15 May 2019) a new QFSDP was released.
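For reference, this is a quick way to check the current image version and the underlying OS release on a node before and after patching (a minimal sketch; imageinfo and imagehistory are standard tools on Exadata nodes):

# current Exadata image version on this node
imageinfo -ver

# image history, handy to review after an upgrade
imagehistory

# the OS release the image is based on
cat /etc/redhat-release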

Thursday, May 2, 2019

moving Exadata vm to a new node

The customer I am working for has OVM on Exadata for the dev, qualification, and testing Exadata; we recently added two compute nodes. The main driver was RAM: it was less expensive (or equally priced) to add two new nodes than to extend the RAM (removing the existing DIMMs and replacing them with higher-capacity ones).


We now have four X6 nodes and two brand new X7 nodes (we got them just as the X8 was released :-( ). There are quite a few databases running here, and since the new nodes use the same cells there was no need to duplicate the databases.

My colleague Freek D'hooge pointed me to this document: Moving a User Domain to a Different Database Server


OK, that procedure worked like a charm at first sight; only InfiniBand in the VM didn't come up, and lspci didn't show us the IB card. Luckily one VM had been created on this new node during the installation of the extra nodes, so we could start comparing configurations. In the vm.cfg of the VM we "copied" from the X6 we saw the following:


ib_pfs = ['03:00.0']
ib_pkeys = [{'pf':'03:00.0','port':'1','pkey':['0xffff',]},{'pf':'03:00.0','port':'2','pkey':['0xffff',]},]


In the vm.cfg of the VM that was working on the X7:
ib_pfs = ['3b:00.0']
ib_pkeys = [{'pf':'3b:00.0','port':'1','pkey':['0xffff',]},{'pf':'3b:00.0','port':'2','pkey':['0xffff',]},]


Once we put this in the copied VM, everything booted. Afterwards we understood why, when my colleague Freek pointed out the following on the source dom0 (X6):


lspci |grep -i infiniband
03:00.0 InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3]
03:00.1 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
03:00.2 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
03:00.3 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
03:00.4 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
03:00.5 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
03:00.6 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
03:00.7 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
03:01.0 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

....

On the target dom0 (X7):


lspci | grep -i 'infiniband'
3b:00.0 InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3]
3b:00.1 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
3b:00.2 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
3b:00.3 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
3b:00.4 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
3b:00.5 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
3b:00.6 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
3b:00.7 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
3b:01.0 InfiniBand: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]



On an X4 it is even different.
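Since the PCI address of the IB physical function differs per hardware generation, the correct vm.cfg lines can be generated on the target dom0 with something like this (a sketch; it assumes the physical function is the lspci line without "Virtual Function"):

# find the PCI address of the physical InfiniBand function on this dom0
IB_PF=$(lspci | grep -i infiniband | grep -v 'Virtual Function' | awk '{print $1}')

# print the matching vm.cfg entries
echo "ib_pfs = ['${IB_PF}']"
echo "ib_pkeys = [{'pf':'${IB_PF}','port':'1','pkey':['0xffff',]},{'pf':'${IB_PF}','port':'2','pkey':['0xffff',]},]"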


So basically I think this should be added as an amendment to the otherwise flawless document mentioned before.

Hope this helps

Monday, March 11, 2019

Issue / bug with TDE-encrypted PDBs mixed with non-encrypted PDBs in the same container



We are currently on 12.2 and have been using multitenant for a couple of months now, with more than one PDB in the CDB; before that we were using just one PDB.

From time to time the Test PDB's need to be refreshed with production data.

Options to refresh PDBs are limited in 12.2 (much better in 18c).


We currently use this:



create pluggable database kabouter_plop from kuifje@PRODUCTION.ACME.COM
SERVICE_NAME_CONVERT = (
xxx,xxx
...
)
LOGGING
PARALLEL 8
KEYSTORE IDENTIFIED BY *****;
Note that the last line in this command is necessary if you have TDE enabled on your PDB.
In production we have one CDB with two PDBs:

PDB1 has TDE-encrypted tablespaces and a key set (as outlined here)
PDB2 has no TDE-encrypted tablespaces and no key set (this PDB is called Kuifje on production and kabouter_plop on testing)
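To see which containers actually have a master key set, a quick check from the CDB root can help (a minimal sketch; in 12.2 v$encryption_wallet is per-container and shows OPEN_NO_MASTER_KEY for a PDB without a key):

sqlplus -s / as sysdba <<'EOF'
select con_id, wrl_type, status, wallet_type
from v$encryption_wallet
order by con_id;
EOF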

We tried to "refresh" PDB2 We received following errors ( with or without specifying the KEYSTORE IDENTIFIED clause )
ORA-28374
ORA-283

We didn't know why.

Here is an excerpt from the alert.log:

Endian type of dictionary set to little
2019-03-11 12:53:03.395000 +00:00
****************************************************************
Pluggable Database KABOUTER_PLOP with pdb id - 3 is created as UNUSABLE.
If any errors are encountered before the pdb is marked as NEW,
then the pdb must be dropped
local undo-1, localundoscn-0x00000000000000e0
****************************************************************
2019-03-11 12:53:15.973000 +00:00
Applying media recovery for pdb-4099 from SCN 91372363956 to SCN 91372887378
Remote log information: count-5
thr-2, seq-4950, logfile-+RECO4/L1/ARCHIVELOG/2019_03_11/thread_2_seq_4950.1360.1002631629, los-91372165870, nxs-91372721676
thr-1, seq-6955, logfile-+RECO4/L1/ARCHIVELOG/2019_03_11/thread_1_seq_6955.1359.1002631571, los-91372412147, nxs-91372715223
thr-1, seq-6954, logfile-+RECO4/L1/ARCHIVELOG/2019_03_11/thread_1_seq_6954.1358.1002630879, los-91372165747, nxs-91372412147
thr-2, seq-4951, logfile-+RECO4/L1/partial_archivelog/2019_03_11/thread_2_seq_4951.1361.1002631985, los-91372721676, nxs-18446744073709551615
thr-1, seq-6956, logfile-+RECO4/L1/partial_archivelog/2019_03_11/thread_1_seq_6956.1362.1002631993, los-91372715223, nxs-18446744073709551615
attach called for domid 3 (domuid: 0x8df2051, options: 0x0, pid: 78883)
queued attach broadcast request 0x82cec698
* allocate domain 3, valid ? 1
all enqueues go to domain 0
Media Recovery Start
Serial Media Recovery started
Media Recovery Log +RECO4/L1/ARCHIVELOG/2019_03_11/thread_2_seq_4950.1360.1002631629
2019-03-11 12:53:17.047000 +00:00
Media Recovery Log +RECO4/L1/ARCHIVELOG/2019_03_11/thread_1_seq_6954.1358.1002630879
2019-03-11 12:53:21.913000 +00:00
Errors with log +RECO4/L1/ARCHIVELOG/2019_03_11/thread_1_seq_6954.1358.1002630879
2019-03-11 12:54:43.280000 +00:00
Media Recovery failed with error 28374
detach called for domid 3 (domuid: 0x8df2051, options: 0x0, pid: 78883)
queued detach broadcast request 0x82cec640
freeing rdom 3

ORA-283 signalled during: create pluggable database kabouter_plop from kuifje@PRODUCTION.ACME.COM
SERVICE_NAME_CONVERT = (
xxxx
)
LOGGING
parallel 8
KEYSTORE IDENTIFIED BY *...

Note that this PDB has no encryption enabled.
We tried to reproduce this behaviour on another environment but didn't succeed. 

We tried again to refresh from production with the alert logs of both source and target open, and saw that at the time of the failure there was quite some activity on the other PDB (the one with encryption enabled) and quite some redo was generated.
Once we stopped the activity on the other live PDB, we were able to create the PDB correctly.
As a next test, we will check whether we see the same behaviour when a key is set in the non-TDE-enabled PDB. We will update this article to keep you posted.
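For that test, setting the key in the PDB would look something like this (a sketch, assuming the keystore is already open; the password is a placeholder):

sqlplus -s / as sysdba <<'EOF'
alter session set container = KABOUTER_PLOP;
-- set a master key for this PDB (placeholder password)
administer key management set key identified by "keystore_password" with backup;
EOF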
Update: We now get another weird error: ORA-01144: File size (31457280 blocks) exceeds maximum of 4194303 blocks. It seems bigfile tablespaces are being ignored? The version we are working on is 12.2.0.1.190115.
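To verify whether the source tablespaces are in fact bigfile, a check like this on the production PDB might help (a sketch; dba_tablespaces exposes a BIGFILE column):

sqlplus -s / as sysdba <<'EOF'
alter session set container = KUIFJE;
select tablespace_name, bigfile from dba_tablespaces;
EOF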

Monday, February 11, 2019

opatch: check the patches included in a patch

I was aware of the opatch lsinventory -bugs_fixed option to check the patches applied to an Oracle home, but didn't know you could also do this on a downloaded patch(set).
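For an installed home, that check looks like this (a sketch; run with the home's own OPatch):

# list all bugs fixed in an existing Oracle home
$ORACLE_HOME/OPatch/opatch lsinventory -bugs_fixed

For a downloaded patch, the lspatches variant below does the same job.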


opatch lspatches -bugs <path_to_patch_directory>


Note that if it is a bundle patch such as the 12.1.0.2.190115 DBBP, you need to go into the main patchset directory and point to the individual patches.


In our case:


opatch lspatches -bugs  ./unzipped/28833531/28729220
patch_id:28729220
unique_patch_id:22494611
date_of_patch:10 Oct 2018, 18:36:59 hrs PST8PDT
patch_description:ACFS PATCH SET UPDATE 12.1.0.2.190115 (28729220)
component:oracle.usm,12.1.0.2.0,optional
platform:226,Linux x86-64
instance_shutdown:true
online_rac_installable:true
patch_type:bundle_member
product_family:db
auto:false
bug:19452723, NEED TO FIX THE SUPPORTED VERSIONS FOR KA ON LINUX
bug:18900953, CONFUSING AFD MESSAGE IN THE GI ALERT LOG
bug:23625427, DLM_PANIC_MSG  <INVALID CHECKSUM>
bug:24308283, AFD FAILED TO SEND OUT UNMAP WHILE USING PARTITIONS IN 12.1.0.2.0 CODE LINE
bug:26882237, ODA  SBIN ACFSUTIL SNAP INFO FAILS WITH   ACFS-03044  FAILED TO OPEN MOUNT POINT
bug:26396215, FSCK CHANGES NEEDED TO IMPROVE PERFORMANCE ON MANY TB SIZED FILE SYSTEMS
bug:28142134, RETPOLINE SUPPORT FOR SLES - ACFS - USM - SPECTRE
bug:25381434, SLES12 SP2 SUPPORT FOR ACFS
bug:23639692, LNX64-112-CMT  HEAP CORRUPTION RELOCATING ACL VOLUME
bug:18951113, AFD FILTERING STATUS IS NOT PERISTENT ACROSS NODE REBOOT
bug:22810422, UEKR4 SUPPORT FOR ACFS
bug:21815339, OPNS PANIC AT OFSOBFUSCATEENCRPRIVCTXT WITH ACTIVE ENCR STRESS TEST
bug:20923224, AFD LINUX SHOULD ISSUE IO WITH 512 SECTOR ADDRESSING
bug:26275740, DIAGNOSIBILITY   AUTOMATICALLY DUMPSTATE AND DUMPSTATS ON FILE SYSTEM INCIDENT
bug:19517835, KA+EF:TEST HANG MIGHT BE RELATED TO LOST MESSAGES TO KA DURING MULTI-BLOCK READ
bug:21474561, LINUX DRIVER SIGNING SUPPORT
bug:18185024, T5 SSC: MACHINE PANIC IN KOBJ_LOAD_MODULE DURING GRID INSTALL
bug:28111958, ACFS-1022 DESPITE BUG FIX

......



Out-of-place GI upgrade on Exadata OVM



The client I am currently working for wanted to patch their Exadatas to the latest and greatest patchset, which came out 1.5 weeks ago.


This QFSDP (January 2019) upgrades the GI from 12.2.0.1.180116 to 12.2.0.1.190115.

We followed the Oracle recommendation to patch out of place. However, we first tried to use the same method as last time, when we went from 12.1 to 12.2, as indicated in the note:

12.2 Grid Infrastructure and Database Upgrade steps for Exadata Database Machine running 11.2.0.3 and later on Oracle Linux (Doc ID 2111010.1)


That unfortunately didn't work, because we aren't doing an upgrade but just a patch.


My colleague tried to use opatchauto -prepare-clone etc., but ran into issues.


After a while I found that there is a -switchGridHome option for gridSetup.sh.

So basically you execute that from your new home by specifying:


./gridSetup.sh -switchGridHome -silent



So these are the steps we followed:


  • Download golden image via MOS note: 888828.1 
  • Create a disk image and partition it on the Dom0. 
  • Create DomU specific RefLink. 
  • Mount the new device on the DomU. 
  • Install the patched software of the 12.2 GI (executed as GI-owner). 
  • Adapt template response file (generated via interactive mode on the first node of the first DomU). 
  • Set the environment correctly for the existing GI. 
  • unset ORACLE_HOME ORACLE_BASE ORACLE_SID 
  • cd /u01/app/12.2.0.1_190115/grid (which is the new GI HOME) 
  • ./gridSetup.sh -silent -responseFile /home/grid/grid_install_12.2.0.1.190115.rsp 
  • Execute root.sh script as indicated on the screen (as root) on the local node only. 
  • Repeat this procedure on the second node. 
  • The actual switch of the existing GI HOME towards the new GI HOME (executed as GI-owner). 
  • Check if ASM rebalance is active. If so wait… and retry later. 
  • unset ORACLE_HOME ORACLE_BASE ORACLE_SID 
  • cd /u01/app/12.2.0.1_190115/grid (which is the new GI HOME) 
  • ./gridSetup.sh -switchGridHome -silent 
  • Check that the new binaries are relinked with RDS; if not, relink (see the sketch after this list). 
  • Execute the root.sh script as indicated on the screen (as root), first on the local node and after that on the second node. ==> takes a while
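For the RDS check in the list above, something like this can be used (a sketch, assuming ORACLE_HOME points at the new GI home; skgxpinfo reports the IPC protocol the binaries were linked with):

# should print 'rds' on Exadata; 'udp' means the binaries still need relinking
$ORACLE_HOME/bin/skgxpinfo

# relink with RDS if needed (as the GI owner, with the stack down on this node)
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk ipc_rds ioracle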