Thursday, October 22, 2015

ORA-00600 [ktfbns_update_ilmstat2] during impdp

I was asked to do an datapump import from a 11.2.0.4 BP16 to 12.1.0.2.7 from a couple of schema's. everything went fine but all of a sudden during the import following happened
Processing object type SCHEMA_EXPORT/SEQUENCE/SEQUENCE
Processing object type SCHEMA_EXPORT/TABLE/TABLE
ORA-39014: One or more workers have prematurely exited.
ORA-39029: worker 1 with process name "DW00" prematurely terminated
ORA-31671: Worker process DW00 had an unhandled exception.
ORA-00600: internal error code, arguments: [ktfbns_update_ilmstat2], [262161], [], [], [], [], [], [], [], [], [], []
ORA-06512: at "SYS.DBMS_SQL", line 1185
ORA-06512: at "SYS.KUPW$WORKER", line 22791
ORA-06512: at "SYS.KUPW$WORKER", line 21996
ORA-06512: at "SYS.KUPW$WORKER", line 21553
ORA-06512: at "SYS.KUPW$WORKER", line 4516
ORA-06512: at "SYS.KUPW$WORKER", line 12063
ORA-06512: at "SYS.KUPW$WORKER", line 2081
ORA-06512: at line 2
 

ORA-0600 .... generally that is not a good sign, luckily the instance was still up. the ORA-0600 checker on metalink ORA-600/ORA-7445/ORA-700 Error Look-up Tool (Doc ID 153788.1) didn't know this argument. I opened an SR in parallel with all the possible info and I started to have a look at the dump files but that didn't help either. It occurred to me that ilm, Information Life Cycle Management or Heat Map is a new 12c feature. However this feature wasn't enabled on our Target.

heat_map                       OFF


So this was a surprise.


Still the ORA-0600 seemed to give an error while calculating these stats, I quickly read some docs about this feature.
select * from dba_ilmparameters;
 
NAME                                VALUE
------------------------------ ----------
ENABLED                                 1
RETENTION TIME                         30
JOB LIMIT                               2
EXECUTION MODE                          2
EXECUTION INTERVAL                     15
TBS PERCENT USED                       85
TBS PERCENT FREE                       25
POLICY TIME                             0

I used the procedure to disable
 
DBMS_ILM_ADMIN
 
 
sys@WDWSI> begin
  2  dbms_ilm_admin.DISABLE_ILM;
  3  end;
  4  /
 
PL/SQL procedure successfully completed.
 
sys@WDWSI> select * from dba_ilmparameters;
 
NAME                                VALUE
------------------------------ ----------
ENABLED                                 2
RETENTION TIME                         30
JOB LIMIT                               2
EXECUTION MODE                          2
EXECUTION INTERVAL                     15
TBS PERCENT USED                       85
TBS PERCENT FREE                       25
POLICY TIME                             0
 
8 rows selected.
Retried the import => NOK I checked which undocumented parameters where set with regards to ILM in the description
Parameter                     Instance  Description
----------------------------- --------- -------------------------------------------
_ILM_FILTER_TIME              0         Upper filter time for ILM block compression
_ILM_FILTER_TIME_LOWER        0         Lower filter time for ILM block compression
_ILM_POLICY_NAME              FALSE     User specified ILM policy name
_create_stat_segment          0         create ilm statistics segment
_disable_12cbigfile           FALSE     DIsable Storing ILM Statistics in 12cBigFiles
_drop_stat_segment            0         drop ilm statistics segment
_enable_ilm_flush_stats       TRUE      Enable ILM Stats Flush
_enable_ilm_testflush_stats   FALSE     Enable Test ILM Stats Flush
_flush_ilm_stats              0         flush ilm stats
_ilmflush_stat_limit          0         ILM flush statistics limit - Internal testing only
_ilmset_stat_limit            0         ILM set statistics limit - Internal testing only
_ilmstat_memlimit             10        Percentage of shared pool for use by ILM Statistics
_ktilmsc_exp                  600       expiration time of ktilm segment cache (in second)
_print_inmem_heatmap          0         print inmem ilm heatmap
_print_stat_segment           0         print ilm statistics segment
_trace_ktfs                   FALSE     Trace ILM Stats Tracking
heat_map                      OFF       ILM Heatmap Tracking

It occurred to me that BIGFILE TABLESPACES were used so because of the urgency of the IMPORT i decided to try to set the parameter _disable_12cbigfile to TRUE, since I didn't get any update from support and the customer was waiting to do his tests .... And Bingo the import Worked as expected.... Big disclaimer : putting _ parameters is not something you do without the blessing of support !!! I provided support with my workaround and they are checking for further potential issues and impacts

Saturday, September 19, 2015

evmd not starting in oracle restart

Monday two weeks ago I patched the DEV Super Cluster System to GI 12.1.0.2 BP10 together with DB 11.2.0.4 BP17. To win time I created a seperate oracle database home and upgraded that one to BP17. Then it would be a matter to stop the db's change their homes and do catbundle exa apply.... So far so good... Then came the time to upgrade GI everything went pretty smooth.
opatchauto -oh $GI_HOME -ocmrf /export/home/grid/ocm.rsp ....
and there we go ... however the last last step post patch took a long long time. looking at the traces evm didn’t want to start I remembered that last time i had the same issue and cleaning up /var/tmp/.oracle solved the issue …. so i interrupted this step,disabled automatic has start, cleaned up /var/tmp/.oracle and rebooted the zone. ok all perfect however relaunching the step still didn’t help, evmd still didn’t want to start
grid 3493 566 0 15:18:41 pts/17 0:00 grep d.bin 
grid 1380 27879 0 15:11:57 ? 0:07 /u01/app/grid/product/12.1.0/grid/bin/ohasd.bin reboot 
grid 1708 27879 0 15:12:10 ? 0:07 /u01/app/grid/product/12.1.0/grid/bin/oraagent.bin 
root 1355 1161 0 15:11:57 pts/19 0:02 /u01/app/grid/product/12.1.0/grid/bin/crsctl.bin start has 

I also saw plenty of errors regarding evmd in oohs_oraagent_grid.trc
"2015-09-01 15:19:59.298545 :GIPCXCPT:13:  gipcInternalConnectSync: failed sync request, ret gipcretConnectionRefused (29)

2015-09-01 15:19:59.298700 :GIPCXCPT:13:  gipcConnectSyncF [EvmConConnect : evmgipcio.c : 205]: EXCEPTION[ ret gipcretConnectionRefused (29) ]  failed sync connect endp 102d76690 [00000000000050c4] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=00000000-00000000-0))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef 0, ready 0, wobj 1030d1d40, sendp 102d76190 status 13flags 0xa008871a, flags-2 0x1, usrFlags 0x30020 }, addr 1033b1290 [00000000000050cb] { gipcAddress : name 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', objFlags 0x0, addrFlags 0x4 }, flags 0x8000000

2015-09-01 15:19:59.299375 : CLSCEVT:13: (:CLSCE0017:)clsce_subscribe 10212ee50 EvmConnCreate failed with status = 13

2015-09-01 15:19:59.299798 :  CRSEVT:13: {0:0:2} ClusterPubSub::subscribe clsce_subscribe failed [4]

2015-09-01 15:19:59.299917 : USRTHRD:13: {0:0:2} LsnrAgentSub-LISTENER_CLONE ClusterReconnectingSubscriber::subscribe Exception ClusterConnectException : CRS-10203: (:CLSCE0017:)  Could not connect to the Event Manager daemon

2015-09-01 15:19:59.300001 : CLSCEVT:13: (:CLSCE0028:)clsce_unsubscribe 10212ee50 successfully unsubscribed : 0

2015-09-01 15:20:00.301266 : CLSCEVT:13: clsce_subscribe 10226cad0 filter='^CRS_RESOURCE_PROFILE_CHANGE.*NAME='ora\.(scan|ssc02dbdat05z01\.vip).*RESOURCE_CLASS='(scan_vip|vip)'', flags=1, handler=100b26978, arg=102f928e0

2015-09-01 15:20:00.303161 :GIPCXCPT:13:  gipcInternalConnectSync: failed sync request, ret gipcretConnectionRefused (29)

2015-09-01 15:20:00.303300 :GIPCXCPT:13:  gipcConnectSyncF [EvmConConnect : evmgipcio.c : 205]: EXCEPTION[ ret gipcretConnectionRefused (29) ]  failed sync connect endp 102d76690 [00000000000050d5] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=00000000-00000000-0))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef 0, ready 0, wobj 1030d1d40, sendp 102d76190 status 13flags 0xa008871a, flags-2 0x1, usrFlags 0x30020 }, addr 1033b0990 [00000000000050dc] { gipcAddress : name 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', objFlags 0x0, addrFlags 0x4 }, flags 0x8000000

2015-09-01 15:20:00.303791 : CLSCEVT:13: (:CLSCE0017:)clsce_subscribe 10226cad0 EvmConnCreate failed with status = 13

2015-09-01 15:20:00.304118 :  CRSEVT:13: {0:0:2} ClusterPubSub::subscribe clsce_subscribe failed [4]

2015-09-01 15:20:00.304204 : USRTHRD:13: {0:0:2} LsnrAgentSub-LISTENER ClusterReconnectingSubscriber::subscribe Exception ClusterConnectException : CRS-10203: (:CLSCE0017:)  Could not connect to the Event Manager daemon

2015-09-01 15:20:00.304257 : CLSCEVT:13: (:CLSCE0028:)clsce_unsubscribe 10226cad0 successfully unsubscribed : 0

2015-09-01 15:20:00.304304 : CLSCEVT:13: clsce_subscribe 10212ee50 filter='^CRS_RESOURCE_PROFILE_CHANGE.*NAME='ora\.(scan|ssc02dbdat05z01\.vip).*RESOURCE_CLASS='(scan_vip|vip)'', flags=1, handler=100b26978, arg=1033406a0

2015-09-01 15:20:00.305574 :GIPCXCPT:13:  gipcInternalConnectSync: failed sync request, ret gipcretConnectionRefused (29)

2015-09-01 15:20:00.305675 :GIPCXCPT:13:  gipcConnectSyncF [EvmConConnect : evmgipcio.c : 205]: EXCEPTION[ ret gipcretConnectionRefused (29) ]  failed sync connect endp 102d76690 [00000000000050e6] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=00000000-00000000-0))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef 0, ready 0, wobj 1030d1d40, sendp 102d76190 status 13flags 0xa008871a, flags-2 0x1, usrFlags 0x30020 }, addr 1033b1290 [00000000000050ed] { gipcAddress : name 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', objFlags 0x0, addrFlags 0x4 }, flags 0x8000000

2015-09-01 15:20:00.306108 : CLSCEVT:13: (:CLSCE0017:)clsce_subscribe 10212ee50 EvmConnCreate failed with status = 13

2015-09-01 15:20:00.306376 :  CRSEVT:13: {0:0:2} ClusterPubSub::subscribe clsce_subscribe failed [4]

2015-09-01 15:20:00.306470 : USRTHRD:13: {0:0:2} LsnrAgentSub-LISTENER_CLONE ClusterReconnectingSubscriber::subscribe Exception ClusterConnectException : CRS-10203: (:CLSCE0017:)  Could not connect to the Event Manager daemon

2015-09-01 15:20:00.306752 : CLSCEVT:13: (:CLSCE0028:)clsce_unsubscribe 10212ee50 successfully unsubscribed : 0

2015-09-01 15:20:01.308000 : CLSCEVT:13: clsce_subscribe 10226cad0 filter='^CRS_RESOURCE_PROFILE_CHANGE.*NAME='ora\.(scan|ssc02dbdat05z01\.vip).*RESOURCE_CLASS='(scan_vip|vip)'', flags=1, handler=100b26978, arg=102f928e0

2015-09-01 15:20:01.309869 :GIPCXCPT:13:  gipcInternalConnectSync: failed sync request, ret gipcretConnectionRefused (29)

2015-09-01 15:20:01.309994 :GIPCXCPT:13:  gipcConnectSyncF [EvmConConnect : evmgipcio.c : 205]: EXCEPTION[ ret gipcretConnectionRefused (29) ]  failed sync connect endp 102d76690 [00000000000050f7] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=00000000-00000000-0))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef 0, ready 0, wobj 1030d1d40, sendp 102d76190 status 13flags 0xa008871a, flags-2 0x1, usrFlags 0x30020 }, addr 1033b0990 [00000000000050fe] { gipcAddress : name 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', objFlags 0x0, addrFlags 0x4 }, flags 0x8000000

2015-09-01 15:20:01.310445 : CLSCEVT:13: (:CLSCE0017:)clsce_subscribe 10226cad0 EvmConnCreate failed with status = 13

2015-09-01 15:20:01.310779 :  CRSEVT:13: {0:0:2} ClusterPubSub::subscribe clsce_subscribe failed [4]”

I opened an sr and after killing ohasd and oraagent.bin process the GI and evmd came up. Later after transferring the SR to my timezone support came back with a suspicion of a couple of bugs.
“

1. Unpublished  BUG 21484367 12.1.0.2 SIHA UPGRADE HANG INDEFINITELY IF MORE SERVICES REGISTERED 
2.  BUG 20620033 AIX ISSUES WITH GI 12.1.0.2 UPGRADE FINE, IF DON'T CONFIGURE > 34 OR 35 SERVICES 
-> not AIX specific 

“
on another system I could reproduce the issue the problem indeed appeared in my case when 35 services where created in total on the machine that is services you add with
“
srvctl add service 
then the GI didn't come up. one way circumvent is to put the services in MANUAL but that is not really a solution for us >80 services or put the db in MANUAL. Dev is working on a patch currently. Hope this help when you get these errors.

Sunday, July 19, 2015

E4 2015 it is a wrap

Long overdue but I had a very busy month of june and july sofar.

 This year I was invited at E4 as a speaker. This was my second time in the USA and
 I really enjoyed every bit of it, a great conference , great people nice interactions unsurpassed hospitality. Andy Colvin and his wife Kate came to pick me up at the airport, the belt where I had to get my luggage was very appropriate.





Once the luggage was there it was the turn of Martin Bach to get picked up at the airport he landed at about the same time at another terminal.
Andy together with the amazing Veronica arranged a great Mexican Restaurant for us that night it was delicious and we finished the evening in a great beer pub in Southlake .


The great thing about E4 is that it is completely focussed on engineered systems and how they work, the other great thing is that there is one track which makes you attend plenty of interesting session you might miss otherwise.

 I really enjoyed it and learned a lots of engineered systems I didn't even know before such as the VCA, Virtual Compute Appliance, James Anthony really nailed this presentation it was very interesting and well presented.


 Other highlights for me were :


  •  Tanel Poder & Kerry Osborne's keynote about the hybrid world, Conventional RDBMS and the NOSQL movement will continue to coexist together and are complimentary 



  • Tyler Muth about Super Cluster and in-memory very interesting since I didn't have not a lot of exposure of ORACLE 12c so far 



  •  Roger MacNicol What’s new in Exadata Smart Scan 12.1.0.1 and 12.1.0.2 which was the version we just upgraded to, so this was a great refresher



  •  Sue Lee "Tips and Tricks for Successful Consolidation on Exadata" I had a lot of eye openers there and got really good info about resource manager and IORM. Some interesting new things in Storage Cell version 12.1.2.1 such as Flash Cache Min and Flash Cache Limit are available now. 



  •  Martin Bach's talk about HCC Compression, I saw this one already a couple of times and it is really a presentation like wine, it gets better and and better, I you are able to attend this talk at some other conference please do so it is really very world while and Martin knows his stuff and has very good presentation skills.



  •  Frits Hoogland, in Frits' own style he did a superb presentation about memory SGA the differences in allocation between AMM and ASMM and manual, PGA how it allocated, etc very interesting but have to go through the presentation again to grasp everything 



  • Kodi's Umamageswaran session he pointed out what is in store on the technology level: Networking 40Gbps Ethernet, 100Gbps infiniband Persistent memory etc. 



  •  Mauro Pagano's session about health checks was really incredible what a horror story very entertaining, if ever a book should be written about worst practices this use case is a good candidate.


 My talk was about the Sparc Super Cluster "Super Cluster: the SWISS army knife of the Engineered Systems", I had the first slot on Tuesday and had about 100 people in my session,

I was bit nervous since a lot of my hero's were in the room and also the product manager of Super Cluster Michael Palmeter and Tyler Muth Super Cluster affiando.

The talk went ok and I did get good feedback and enjoyed it a lot.

 The conference is one thing but being able to talk offline with people like Karl Arao, Tanel Põder, Kerry Osborne, Mauro Pagana, Cary Millsap, Carlos Sierra, Andy Colvin, Tyler Muth, Michael Palmeter, Frits Hoogland, Martin Bach, Tarmo Alasoo, Radu Parvu, Jason Arneil, Hank   and all the passionate attendees just feels great and that is what attending this conference

I really enjoyed this conference a lot and got to know some nice people, thank you Veronica , Chanea and all the others of the organisation, you really made us speakers feel like kings.

A big thank you to Andy and Kate,  thanks for you warm welcome and even warmer hospitality really appreciated.

Next year there will be a European E4 as well in April, just a couple hours of flight for most of the Europeans, so you have no excuse to not attend this great engineered systems conference. check enkitecs website for more info


and yes it is true everything is bigger in Texas :-)



adding storage cells : appliance mode

Due to capacity issues 2 extra storage cells had to be added to the already 3 licensed high performance X3 cells on the Super Cluster T4. Some details about the installation Cellsrv 12.1.2.1.0 GI 12.1.0.2.4 DB 11.2.0.4 The process was quite straight forwards and is documented here in summary : create cell disks create grid disks on them exactly the same as on the existing cells with the same offsets for each disk group here an example of the first disk group on our cells
cellcli -e  "LIST GRIDDISK WHERE NAME LIKE 'DATA_DG.*' ATTRIBUTES NAME, SIZE ,STATUS, ASMMODESTATUS, ASMDEACTIVATIONOUTCOME,OFFSET"
         DATA_DG_CD_00_celadm01   423G    active  ONLINE  Yes     32M
         DATA_DG_CD_01_celadm01   423G    active  ONLINE  Yes     32M
         DATA_DG_CD_02_celadm01   423G    active  ONLINE  Yes     32M
         DATA_DG_CD_03_celadm01   423G    active  ONLINE  Yes     32M
         DATA_DG_CD_04_celadm01   423G    active  ONLINE  Yes     32M
         DATA_DG_CD_05_celadm01   423G    active  ONLINE  Yes     32M
         DATA_DG_CD_06_celadm01   423G    active  ONLINE  Yes     32M
         DATA_DG_CD_07_celadm01   423G    active  ONLINE  Yes     32M
         DATA_DG_CD_08_celadm01   423G    active  ONLINE  Yes     32M
         DATA_DG_CD_09_celadm01   423G    active  ONLINE  Yes     32M
         DATA_DG_CD_10_celadm01   423G    active  ONLINE  Yes     32M
         DATA_DG_CD_11_celadm01   423G    active  ONLINE  Yes     32M
ok we did that that way the configuration was exactly the same on the cells. on the database node we added the ip's of the newly added cells. I was a bit confused by the fact that nothing needed to be added on the storage cells because I recalled that there was a functionality introduced in one of the more recent storage cell versions that made them "talk" to each other on rebalance operations. But adding the ip's in cellip.ora on the db node was enough and then we were ready to rebalance the disk group
ALTER DISKGROUP DATA_DG ADD 
DISK 'o/192.168.15.104/DATA_DG*' 
DISK 'o/192.168.25.105/DATA_DG*' 
rebalance power 32 
; 
and kaboum
ERROR at line 1: 
ORA-00600: internal error code, arguments: [kfgpCreate_60], [12], [0], [65535], 
[65535], [65535], [65535], [], [], [], [], []
luckily asm was still running but being already pretty late after a day of migrations this was not funny at all. after chatting with support they suggested to disable appliance.mode on the disk group
 ALTER DISKGROUP DATA_DG SET ATTRIBUTE 'appliance.mode'='FALSE';
after we did that the addition of the disks worked
ALTER DISKGROUP DATA_DG ADD 
DISK 'o/192.168.15.104/DATA_DG*' 
DISK 'o/192.168.25.105/DATA_DG*' 
rebalance power 32 
; 

Diskgroup altered. 
When the rebalance was finished we put appliance.mode again to TRUE
ALTER DISKGROUP DATA_DG SET ATTRIBUTE 'appliance.mode'='TRUE';
more info can be found in note EXADATA : What Is Oracle ASM appliance.mode Attribute (Doc ID 1664261.1) So it looks like the appliance.mode is an optimisation I wonder if it is the fact that cells can communicate directly instead of using the host for rebalance operations, however that is not literally stated in the metalink note. hope this helps if you encounter this

Monday, May 4, 2015

upgrading to GI 12.1.0.2

Some weeks ago I patched the GI from 11.2.0.4 to 12.1.0.2 I implemented the fix for the listener poisoning issue you can read about it here here is the original listener.ora
#CVE-2012-1675
VALID_NODE_CHECKING_REGISTRATION_LISTENER=1
VALID_NODE_CHECKING_REGISTRATION_LISTENER_DG=1
VALID_NODE_CHECKING_REGISTRATION_LISTENER_SCAN1=1
REGISTRATION_INVITED_NODES_LISTENER_SCAN2=(x.y.z.61,x.y.z.64)
 
VALID_NODE_CHECKING_REGISTRATION_LISTENER_SCAN2=1
REGISTRATION_INVITED_NODES_LISTENER_SCAN1=(x.y.z.61,x.y.z.64)
 
VALID_NODE_CHECKING_REGISTRATION_LISTENER_SCAN3=1
REGISTRATION_INVITED_NODES_LISTENER_SCAN3=(x.y.z.61,x.y.z.64)
REGISTRATION_INVITED_NODES_LISTENER_DG=(172.20.20.72,172.20.20.73)

here the upgraded one
#CVE-2012-1675
VALID_NODE_CHECKING_REGISTRATION_LISTENER=1
VALID_NODE_CHECKING_REGISTRATION_LISTENER_DG=1
VALID_NODE_CHECKING_REGISTRATION_LISTENER_SCAN1=OFF             # line added by Agent
REGISTRATION_INVITED_NODES_LISTENER_SCAN2=()            # line added by Agent
 
VALID_NODE_CHECKING_REGISTRATION_LISTENER_SCAN2=OFF             # line added by Agent
REGISTRATION_INVITED_NODES_LISTENER_SCAN1=()            # line added by Agent
 
VALID_NODE_CHECKING_REGISTRATION_LISTENER_SCAN3=OFF             # line added by Agent
REGISTRATION_INVITED_NODES_LISTENER_SCAN3=()            # line added by Agent
REGISTRATION_INVITED_NODES_LISTENER_DG=(172.20.20.72,172.20.20.73)
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_CLONE=ON                # line added by Agent
VALID_NODE_CHECKING_REGISTRATION_LISTENER_CLONE=SUBNET          # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_MGMTLSNR=ON              # line added by Agent
VALID_NODE_CHECKING_REGISTRATION_MGMTLSNR=SUBNET                # line added by Agent

The installer / upgrade process removes these entries, I saw similar behaviour with the dbca when you have IFILES in the tnsnames.ora, most annoying these entries get removed .... NOTE : I anNonimized the ip addresses

Sunday, April 19, 2015

OLTP compression and Updates



I did some small tests (actually more troubleshooting ) with OLTP compression, a couple of weeks back.


Some execution times nearly doubled, I wanted to get a clear understanding of what happened so I asked to trace the procedure.

Which was easy since dbms_application.set_module could be used in the procedure to set the module and the action.

This made it much less cumbersome to enable the trace, the  dbms_monitor package has a procedure which allows you to start tracing whenever the combination of service_name and module is used very handy

so in the end I used this to trace


DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE('SERVICE_NAME','MODULE');


That made it easy to trace as soon as the procedure was launched and avoid to play around with logon triggers and trace more, then necessary.

I did the tests both with a warmed up DB cache as with a flushed db cache and will do them again next week the results were in the same league


But back to the problem, speed of the query was much slower.


I checked both



NONCOMPRESSED

45810 rows updated.
Elapsed: 00:00:01.59

COMPRESSED

45810 rows updated.
Elapsed: 00:00:04.38


Quite a difference ...



The query looks a bit like this I had to obfuscate because of privacy reasons :

UPDATE TAB1 SET

(COLUMNS ....)

= (SELECT COLUMNS ....

FROM TAB2

WHERE TAB2.ID = TAB1.TAB2_ID)

WHERE TAB2.DATE_FIELD = TO_DATE('31/03/2015' ,'DD/MM/YYYY')







This is how the trace file looks like after being processed with Method R profiler a tool i really can recommend, it save me lots of time, time you can use to solve you performance issue, instead of crawling through raw trace files ...


Here You see the uncompressed trace



here is the compress trace


Much more blocks in Current mode


In both cases the last step of the execution plan takes quite a bit

the last step of the compressed update took ; ~ 3,25 sec (4,37 -0,34 ....)
the last step of the non compressed update ;  ~ 0,77 sec

So unbelievable that I will redo some tests again next week and add them to the post.

Thursday, April 2, 2015

12c adr support

I noticed that with the upgrade of GI to 12c some new things are in the ADRCI
 the scan listeners are know under adrci control as well, 
In 12c,(12.1.0.2.4)

adrci

ADRCI: Release 12.1.0.2.0 - Production on Tue Mar 31 18:17:21 2015

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/u01/app/grid"
adrci> show control
DIA-48448: This command does not support multiple ADR homes

adrci> show homes
ADR Homes:
diag/tnslsnr/node2/listener_clone
diag/tnslsnr/node2/listener_scan2
diag/tnslsnr/node2/listener_scan1
diag/tnslsnr/node2/listener
diag/tnslsnr/node2/listener_scan3
diag/tnslsnr/node2/listener_dg
diag/tnslsnr/node2/mgmtlsnr
diag/rdbms/_mgmtdb/-MGMTDB
diag/crs/node2/crs
diag/asm/+asm/+ASM2
diag/asm/user_grid/host_4252752997_82
adrci> set home diag/tnslsnr/node2/listener


on 11g this was not the case


while on 11204 …

ADRCI: Release 11.2.0.4.0 - Production on Tue Mar 31 18:21:18 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/u01/app/grid"
adrci> show homes
ADR Homes:
diag/asm/+asm/+ASM2
diag/tnslsnr/other_node2/listener
diag/tnslsnr/other_node2/listener_dg
diag/tnslsnr/other_node2/listener_clone

Tuesday, March 31, 2015

assumption is the mother of ....

We had complaint about a batch job that took much longer to complete on environment A then on environment B.

I checked and indeed the plans where different

As we recently updated from 11.2.0.4 BP12 to BP15 on that environment (A) my first idea was that this caused the issue,  to exclude everything I imported the table stats from the working environment B to the environment A.
Still I got the same plan.


I checked the system stats they were the same.
I used Mauro Pagano's excellent SQLd360 tool get it here and couldn't see anything (that's ENTIRELY my fault I could really work on this and was multitasking a lot due to other constraints;-).


I ran a 10053 optimizer trace and saw that there were fix controls not used on env B and only FTS were used in SINGLE TABLE ACCESS PATH , .... bizar 

Environment A

SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for TABLE_X[TABLE_X
  Column (#2): 
    NewDensity:0.000210, OldDensity:0.000476 BktCnt:254, PopBktCnt:2, PopValCnt:1, NDV:4725
  Column (#2): COL2(
    AvgLen: 7 NDV: 4725 Nulls: 0 Density: 0.000210 Min: 113898 Max: 814396873
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 254
  Column (#3): 
    NewDensity:0.000019, OldDensity:0.001339 BktCnt:254, PopBktCnt:18, PopValCnt:9, NDV:48180
  Column (#3): COL3(
    AvgLen: 7 NDV: 48180 Nulls: 0 Density: 0.000019 Min: 113328 Max: 816880773
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 246

  Table: TABLE_X  Alias: TABLE_X
    Card: Original: 633107.000000  Rounded: 1  Computed: 0.00  Non Adjusted: 0.00
  Access Path: TableScan
    Cost:  1113.98  Resp: 1113.98  Degree: 0
      Cost_io: 1100.00  Cost_cpu: 180830851
      Resp_io: 1100.00  Resp_cpu: 180830851
  Best:: AccessPath: TableScan
         Cost: 1113.98  Degree: 1  Resp: 1113.98  Card: 0.00  Bytes: 0

While on the other env B

SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for TABLE_X[TABLE_X
  Column (#2): 
    NewDensity:0.000210, OldDensity:0.000476 BktCnt:254, PopBktCnt:2, PopValCnt:1, NDV:4725
  Column (#2): COL2(
    AvgLen: 7 NDV: 4725 Nulls: 0 Density: 0.000210 Min: 113898 Max: 814396873
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 254
  Column (#3): 
    NewDensity:0.000019, OldDensity:0.001339 BktCnt:254, PopBktCnt:18, PopValCnt:9, NDV:48180
  Column (#3): COL3(
    AvgLen: 7 NDV: 48180 Nulls: 0 Density: 0.000019 Min: 113328 Max: 816880773
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 246
  ColGroup (#1, Index) IX_4
    Col#: 4 5    CorStregth: -1.00
  ColGroup Usage:: PredCnt: 2  Matches Full:  Partial: 
  Table: TABLE_X  Alias: TABLE_X
    Card: Original: 633107.000000  Rounded: 1  Computed: 0.00  Non Adjusted: 0.00
  Access Path: TableScan
    Cost:  1113.55  Resp: 1113.55  Degree: 0
      Cost_io: 1100.00  Cost_cpu: 180830851
      Resp_io: 1100.00  Resp_cpu: 180830851
  Access Path: index (AllEqRange)
    Index: IX_2
    resc_io: 9.00  resc_cpu: 69774
    ix_sel: 0.000019  ix_sel_with_filters: 0.000019 
    Cost: 9.01  Resp: 9.01  Degree: 1
  Access Path: index (AllEqRange)
    Index: IX_3
    resc_io: 28.00  resc_cpu: 257919
    ix_sel: 0.000210  ix_sel_with_filters: 0.000210 
    Cost: 28.02  Resp: 28.02  Degree: 1
  ****** trying bitmap/domain indexes ******
  Access Path: index (AllEqRange)
    Index: IX_2
    resc_io: 3.00  resc_cpu: 23964
    ix_sel: 0.000019  ix_sel_with_filters: 0.000019 
    Cost: 3.00  Resp: 3.00  Degree: 0
  Access Path: index (AllEqRange)
    Index: IX_3
    resc_io: 3.00  resc_cpu: 47964
    ix_sel: 0.000210  ix_sel_with_filters: 0.000210 
    Cost: 3.00  Resp: 3.00  Degree: 0
  Bitmap nodes:
    Used IX_2
      Cost = 3.001919, sel = 0.000019
    Used IX_3j
      Cost = 3.004938, sel = 0.000210
  Access path: Bitmap index - accepted
    Cost: 6.007407 Cost_io: 6.000536 Cost_cpu: 91721.496173 Sel: 0.000000
    Not Believed to be index-only
  ****** finished trying bitmap/domain indexes ******
******** Begin index join costing ********
  Access Path: index (FullScan)
    Index: IX_1
    resc_io: 1439.00  resc_cpu: 136869152
    ix_sel: 1.000000  ix_sel_with_filters: 1.000000 
    Cost: 1449.25  Resp: 1449.25  Degree: 0



Being in a hurry because I was busy with plenty of stuff  I assumed  that fix_control was the problem

however setting fix_control didn't change a thing so I went back to check  the trace with a colleague  and then we saw following


Table Stats::
  Table: TABLE_X  Alias: TABLE_X
    #Rows: 633107  #Blks:  4056  AvgRowLen:  43.00  ChainCnt:  0.00
Index Stats::
  Index: IX_1  Col#: 1
    LVLS: 2  #LB: 1437  #DK: 633107  LB/K: 1.00  DB/K: 1.00  CLUF: 37633.00
  Index: IX_2  Col#: 3
    LVLS: 2  #LB: 1665  #DK: 48180  LB/K: 1.00  DB/K: 5.00  CLUF: 287723.00
    INVISIBLE
  Index: IX_3  Col#: 2
    LVLS: 2  #LB: 2103  #DK: 4725  LB/K: 1.00  DB/K: 25.00  CLUF: 118278.00
    INVISIBLE
  Index: IX_4  Col#: 4 5
    LVLS: 2  #LB: 2490  #DK: 14906  LB/K: 1.00  DB/K: 2.00  CLUF: 36217.00
    INVISIBLE



We checked the status of the indexes they were INVISIBLE, when they were put back to VISIBLE the correct plan was choosen again ...


Moral of the story : As one of my previous managers (probably the best I had so far but that is another story )said Assumption is the mother of all f*ck ups, don't think that everything is a complicated problem and check basic things, and take your time to check things don't do 50 things at the same time, man anyway can't multitask ;-------------)
luckily in the end it didn't take to much time to identify the issue...



Thanks @MartinDBA, @Mautro and everyone interacting in the discussion on twitter


Saturday, March 14, 2015

wow what a seminar

wow it’s behind us again a great OUGN Varseminar 2015,big shout out to the boa(r)(t)d  to have made this again a success. 


You’ve set the bar very high for coming years, I hope to be able to be part of other future editions  ! 
It was a very good idea to put the exhibition hall not on the same floor as the talks,giving more breathing room around the rooms

Apart from the fact that a seminar on a boat creates a special bond with the people aboard the quality of the talks was really very good, i learned plenty of things...

I really enjoyed being on a boat with people passionate about there job in fact for most of the people it more then just being a job it is a passion , vocation ,...

It was great to meetup with old friends and make new ones

my personal highlights not in any particular order

were the sessions of :

Neil Johnson (@neiljdba) about contentious small tables really a very good presentation I learned a lot about the hakan factor, I strongly recommend to go to follow this session on other conferences this year, beside that Neil is a great guy to hangout with. Unfortunately I missed his talk about jumbo frames...but the good news I found his slides already ;)

Kellyn Potvin-Gorman  also known as @DBAKevlar talking about AWR Warehouse thanks to her presentation I finally know that this is a “free” limited license if you have licensed tuning and diagnostic pack for your  databases, I will start pushing for it at my current customer,because it can really save you bacon and lots of discussio. Her talk about EM 12c was also great by the way despite that she was pretty i'll suffering from her sinusses, so really well done

Jan Karremans’ session ‘ok now my database crashed’ was a very good reminder on how important it is to take this part of your oracle installation not too lightly, btw Jan is pretty (hyper) active on twitter ;-) here (@johnnyq72)

James Morle’s (@JamesMorle) session about opitmizing table scanning  and how to tune all layers of the cake even the blue ones

Luis Marques (@drune) session about Rat was of a very high quality so was his talk about Resource Manager, this is a subject that needs bigger attention especially in the actual spirit of consolidation

I really enjoyed the ‘Instructor’ aka Uwe Hesse’s presentations about the broker and fra I envy his courage of doing a complete talk with live demo’s. (@UweHesse)

I had a blast doing the SE Round Table BE (boat edition) with SE lady Ann Sjokvist (@annsjokvist) and my good friend Jan Karremans, I liked the chemistry that was going on on stage.
SE edition can be a good solution for you, it depends on your business needs and it is an easy step up to EE when you later would need those features. Each of us three looked the product from a Different angle 
Ann really has lots of experience with it and is a really advocate, Jan Looked at it from the business side, and I for my part I am a spoiled EE dba who realises that you need to do correct size matching and see if the 

I had my own talk about a great underestimated product in the engineered systems line up the Sparc Super Cluster, I’ve been working with this product since beginning 2014 and it is for me very much the Swiss Army knife of the engineered exadata systems, providing an un-paralleled versatility and flexibility. 
I liked the questions and discussion with Jacco Landlust that came offline after my talk about Super Cluster.

I would like to explicitely thank Oyvind Isene (@OyvindIsene) and Kjell Tore for many of us international speakers you guys are the public face of OUGN. Speaking For myself if I wouldn’t have come to OUGN as a delegate in 2013 after talking to Oyvind at #OOW2012  I wouldn’t be speaking now, so Oyvind  I owe you  a lot. I hope the next board will continue your work and the investments of blood sweat and tears you two did.

I would also like to thank portrix Florian , Connie and Bjoern (@brost)for the great gin / buffet you organised in Kiel. It becomes clearly a tradition.

Too bad I missed several sessions I due to overlap ;-(


Nice meeting you all girls and guys

Thank you 

Jan Karremans
Heli Helskyaho ;-)
Debra Lilley
Brendan Tierney
Oyvind Isene
Kjell Tore
Frits Hoogland
Christian Antognini
Neil Johnson
Kellyn & Tim Gorman
Gurcan Orhan
Bjoern Roest Florian 
Andy Colvin & wife
James Morle
Luis Marquis
Kelly Potvin
Tim Gorman
Jacco Landlust
Alex Nuijten
Patrick Barrel
Magnus Fagertun
Mark Rittman
Eric Van Roon



Thank you all for making my OUGN 15 so great

If you haven't decided yet maybe this small video and some pictures (sorry I am not so good with an iphone DSLR user) can persuade you to attend or submit for OUGN16


video












Saturday, February 14, 2015

patching super cluster


we patched our X4 cells for to the latest and greatest version 12.1.2.1.0 last weekend 
we had some issues with a cell that wasn’t reachable anymore during patching luckily after an init 6 of that cell the patching  continued, 
unfortunately the default 3,6 h asm disk time out was reached and the disks where removed from the disk group…

the next day we needed to upgrade DEV with GI upgrade from 11.2.0.4 to 12.1.0.2 and afterwards the 12.1.0.2.4 upgrade of january… the we upgrade the dbs from 11.2.0.4.12 to BP15 11.2.0.4.15

all went ok just one big caveat (it is in the patch notes ), once you upgrade to GI 12c the opatch  auto doesn’t work anymore for the db lower the 12d, so we had to apply those patches manually without opatch auto

so after a long day all database where patched ( mostly because i could start late, it take a while to stop all applications)…

the next day however asm had generated 64GBytes for core dumps, 

support found quickly that this was related to 

  Bug 20313024 - Exadata Solaris: ORA-7445 [ossdisk_ioctl_compl] on XDMG startup with 12.1.0.2.4 DBBP ( Doc ID 20313024.8 ) 


but then the worst still needed to follow

the unlocking of the GI and patch apply of the patch went fine, however 

 /u01/app/grid/product/12.1.0/grid/crs/install/roothas.pl -patch  didn’t complete…….

not knowing if I could stop this i opened an sr where they told me that it could be done, reexectuting didn’t change anything.

After a while I found info in the logs that point to evm issues i gave this observation  to support 

and also told them that i found old-er files in /var/tmp/.oracle and asked if i could remove them
they said no several times. (however apparently i blogged already about this years ago here;)
In the mean time i had to increase severity to 1 because of coming dev deadline the system was already 1 day longer down than expected ...
Had some phone calls with support in US, one guy didn't know the logging changed from $GRID_HOME/log to $ORACLE_BASE/diag ....
asked them again if i could remove the .oracle files, but again no was the answer because i could break things.

next morning got contacted by support pointed again to .oracle files and there he said that it was ok to remove them 
I did and could execetute roothas.pl -patch after rebooting with inittab entries about ohas disabled  removing those entries in /var/tmp/.oracle 


the point was that i mentioned this very early in the SR and nobody listened....

anyway was happy that everything was running back again, and i hope that by blogging about this it will help other people and  that I will remember the next time i have this issue ;-)

Friday, January 30, 2015

Rac One Node and Data Guard Broker some thoughts on bugs, notes.....


Currently we are moving user acceptance testing db's from regular single instances with ZFS cross-site mirroring to Super Cluster with RAC One and Data Guard.

The initial setup of this was finished a couple of months ago

and apart from Martin Bach (@MartinDBA) suggestions here and here and in real life together with Marcin Przepiorowski (@pioro) in Ireland in march last year.

 I also used following metalink note



Bug 17781373 : ORAAGENT DOES NOT SET LOCAL_LISTENER WHEN LISTENER_NETWORKS IS SET IN SPFILE

This exactly looked like what happened I could easily reproduce this by removing listener_networks

The workaround suggested was to put the nodeVIP as local listener, but that wasn't what the dgbroker wanted because it's static connect identifier was set to the this VIP ...and I want to avoid at all cost to modify it after role changes or relocates of the Rac One instances.

so I finally changed the LISTENER_NETWORKS parameter from


((NAME=network1)(LOCAL_LISTENER=RACONE)(REMOTE_LISTENER=scan01-uat:1521)), ((NAME=network_dg)(LOCAL_LISTENER=DG_VIP)(REMOTE_LISTENER=REMOTE_NET2))

To this :


((NAME=network1)(LOCAL_LISTENER=NODE_VIP,RACONE)(REMOTE_LISTENER=scan01-ora:1521)), ((NAME=network_dg)(LOCAL_LISTENER=DG_VIP)(REMOTE_LISTENER=REMOTE_NET2))




Note that NODE_VIP , RACONE , DG_VIP and REMOTE_NET2 resolve differently on each node in the RAC One Cluster.

eg.
on node1 one this maps to :


DG_VIP, NODE01_LOCAL_NET2=
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node01dg-vip)(PORT = 1522))
  )


on node2 
DG_VIP, NODE02_LOCAL_NET2=
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node02dg-vip)(PORT = 1522))
  )
...

also if you use alter system to change this don't forget the quotes

eg

alter system set listener_networks='(NAME=network1)(LOCAL_LISTENER=NODE_VIP,RACONE)(REMOTE_LISTENER=scan01-ora:1521))', '((NAME=network_dg)(LOCAL_LISTENER=DG_VIP)(REMOTE_LISTENER=REMOTE_NET2))';


While discussing some other things with my ex colleague Freek D'Hooge you can follow him on twitter and on his blog he pointed me to a note

How to Configure A Second Listener on a Separate Network in 11.2 Grid Infrastructure (Doc ID 1063571.1)

which says following

" Listeners specified by the LISTENER_NETWORKS parameter should not be used in the LOCAL_LISTENER and REMOTE_LISTENER parameters. Otherwise, cross registration will happen and connections will be redirected cross networks."


however this is not explicitly mentioned in the first note mentioned in this post.
For the time being I left the remote_listener as is and will test on newly created databases in order to remove it from already migrated uat databases...
Thanks Freek , Martin and Marcin for your insights ;-)  you guys rock !



PS I the host names in this blogpost are made up for privacy reasons .