Saturday, September 19, 2015

evmd not starting in Oracle Restart

On Monday two weeks ago I patched the DEV SuperCluster system to GI 12.1.0.2 BP10 together with DB 11.2.0.4 BP17. To save time I created a separate Oracle database home and upgraded that one to BP17. After that it would just be a matter of stopping the databases, switching them to the new home and running catbundle exa apply. So far so good. Then came the time to upgrade GI, and everything went pretty smoothly:
opatchauto -oh $GI_HOME -ocmrf /export/home/grid/ocm.rsp ....
And off we went... However, the last step (post-patch) took a very long time. Looking at the traces, evmd didn't want to start. I remembered having the same issue last time, when cleaning up /var/tmp/.oracle solved it, so I interrupted this step, disabled automatic HAS start, cleaned up /var/tmp/.oracle and rebooted the zone. All fine so far; however, relaunching the step still didn't help, and evmd still didn't want to start. No evmd.bin process was running:
grid 3493 566 0 15:18:41 pts/17 0:00 grep d.bin 
grid 1380 27879 0 15:11:57 ? 0:07 /u01/app/grid/product/12.1.0/grid/bin/ohasd.bin reboot 
grid 1708 27879 0 15:12:10 ? 0:07 /u01/app/grid/product/12.1.0/grid/bin/oraagent.bin 
root 1355 1161 0 15:11:57 pts/19 0:02 /u01/app/grid/product/12.1.0/grid/bin/crsctl.bin start has 

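The recovery attempt above (disable automatic HAS start, clean up /var/tmp/.oracle, restart) and the follow-up check for a running evmd.bin can be sketched roughly as below. This is a minimal sketch, not a supported procedure: the GI home default is taken from the ps output above, while the explicit `crsctl stop has -f` before the cleanup, the `DRY_RUN` switch and the helper names `run`, `reset_has` and `stack_up` are my own assumptions.

```shell
#!/bin/sh
# Sketch only; run as root. With DRY_RUN=1 (the default) commands are only printed.
# Assumptions: GI_HOME default from the ps listing, "stop has -f" before cleanup,
# and the hypothetical helper names run/reset_has/stack_up.
GI_HOME=${GI_HOME:-/u01/app/grid/product/12.1.0/grid}
SOCKET_DIR=${SOCKET_DIR:-/var/tmp/.oracle}

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reset_has() {
  run "$GI_HOME/bin/crsctl" disable has   # no automatic HAS start at boot
  run "$GI_HOME/bin/crsctl" stop has -f   # assumption: stop the stack before removing sockets
  run rm -rf "$SOCKET_DIR"/*              # remove stale IPC socket files (not dot-files)
  # then reboot the zone and relaunch the opatchauto step
}

# Reads "ps -ef" output on stdin; prints the first missing daemon and fails,
# or succeeds when ohasd.bin, oraagent.bin and evmd.bin are all listed.
stack_up() {
  ps_out=$(cat)
  for d in ohasd.bin oraagent.bin evmd.bin; do
    printf '%s\n' "$ps_out" | grep -q "/$d" || { echo "missing: $d"; return 1; }
  done
  return 0
}
```

For example, `ps -ef | stack_up` against the listing above would report `missing: evmd.bin`.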
I also saw plenty of errors regarding evmd in ohasd_oraagent_grid.trc:
"2015-09-01 15:19:59.298545 :GIPCXCPT:13:  gipcInternalConnectSync: failed sync request, ret gipcretConnectionRefused (29)

2015-09-01 15:19:59.298700 :GIPCXCPT:13:  gipcConnectSyncF [EvmConConnect : evmgipcio.c : 205]: EXCEPTION[ ret gipcretConnectionRefused (29) ]  failed sync connect endp 102d76690 [00000000000050c4] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=00000000-00000000-0))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef 0, ready 0, wobj 1030d1d40, sendp 102d76190 status 13flags 0xa008871a, flags-2 0x1, usrFlags 0x30020 }, addr 1033b1290 [00000000000050cb] { gipcAddress : name 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', objFlags 0x0, addrFlags 0x4 }, flags 0x8000000

2015-09-01 15:19:59.299375 : CLSCEVT:13: (:CLSCE0017:)clsce_subscribe 10212ee50 EvmConnCreate failed with status = 13

2015-09-01 15:19:59.299798 :  CRSEVT:13: {0:0:2} ClusterPubSub::subscribe clsce_subscribe failed [4]

2015-09-01 15:19:59.299917 : USRTHRD:13: {0:0:2} LsnrAgentSub-LISTENER_CLONE ClusterReconnectingSubscriber::subscribe Exception ClusterConnectException : CRS-10203: (:CLSCE0017:)  Could not connect to the Event Manager daemon

2015-09-01 15:19:59.300001 : CLSCEVT:13: (:CLSCE0028:)clsce_unsubscribe 10212ee50 successfully unsubscribed : 0

2015-09-01 15:20:00.301266 : CLSCEVT:13: clsce_subscribe 10226cad0 filter='^CRS_RESOURCE_PROFILE_CHANGE.*NAME='ora\.(scan|ssc02dbdat05z01\.vip).*RESOURCE_CLASS='(scan_vip|vip)'', flags=1, handler=100b26978, arg=102f928e0

2015-09-01 15:20:00.303161 :GIPCXCPT:13:  gipcInternalConnectSync: failed sync request, ret gipcretConnectionRefused (29)

2015-09-01 15:20:00.303300 :GIPCXCPT:13:  gipcConnectSyncF [EvmConConnect : evmgipcio.c : 205]: EXCEPTION[ ret gipcretConnectionRefused (29) ]  failed sync connect endp 102d76690 [00000000000050d5] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=00000000-00000000-0))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef 0, ready 0, wobj 1030d1d40, sendp 102d76190 status 13flags 0xa008871a, flags-2 0x1, usrFlags 0x30020 }, addr 1033b0990 [00000000000050dc] { gipcAddress : name 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', objFlags 0x0, addrFlags 0x4 }, flags 0x8000000

2015-09-01 15:20:00.303791 : CLSCEVT:13: (:CLSCE0017:)clsce_subscribe 10226cad0 EvmConnCreate failed with status = 13

2015-09-01 15:20:00.304118 :  CRSEVT:13: {0:0:2} ClusterPubSub::subscribe clsce_subscribe failed [4]

2015-09-01 15:20:00.304204 : USRTHRD:13: {0:0:2} LsnrAgentSub-LISTENER ClusterReconnectingSubscriber::subscribe Exception ClusterConnectException : CRS-10203: (:CLSCE0017:)  Could not connect to the Event Manager daemon

2015-09-01 15:20:00.304257 : CLSCEVT:13: (:CLSCE0028:)clsce_unsubscribe 10226cad0 successfully unsubscribed : 0

2015-09-01 15:20:00.304304 : CLSCEVT:13: clsce_subscribe 10212ee50 filter='^CRS_RESOURCE_PROFILE_CHANGE.*NAME='ora\.(scan|ssc02dbdat05z01\.vip).*RESOURCE_CLASS='(scan_vip|vip)'', flags=1, handler=100b26978, arg=1033406a0

2015-09-01 15:20:00.305574 :GIPCXCPT:13:  gipcInternalConnectSync: failed sync request, ret gipcretConnectionRefused (29)

2015-09-01 15:20:00.305675 :GIPCXCPT:13:  gipcConnectSyncF [EvmConConnect : evmgipcio.c : 205]: EXCEPTION[ ret gipcretConnectionRefused (29) ]  failed sync connect endp 102d76690 [00000000000050e6] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=00000000-00000000-0))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef 0, ready 0, wobj 1030d1d40, sendp 102d76190 status 13flags 0xa008871a, flags-2 0x1, usrFlags 0x30020 }, addr 1033b1290 [00000000000050ed] { gipcAddress : name 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', objFlags 0x0, addrFlags 0x4 }, flags 0x8000000

2015-09-01 15:20:00.306108 : CLSCEVT:13: (:CLSCE0017:)clsce_subscribe 10212ee50 EvmConnCreate failed with status = 13

2015-09-01 15:20:00.306376 :  CRSEVT:13: {0:0:2} ClusterPubSub::subscribe clsce_subscribe failed [4]

2015-09-01 15:20:00.306470 : USRTHRD:13: {0:0:2} LsnrAgentSub-LISTENER_CLONE ClusterReconnectingSubscriber::subscribe Exception ClusterConnectException : CRS-10203: (:CLSCE0017:)  Could not connect to the Event Manager daemon

2015-09-01 15:20:00.306752 : CLSCEVT:13: (:CLSCE0028:)clsce_unsubscribe 10212ee50 successfully unsubscribed : 0

2015-09-01 15:20:01.308000 : CLSCEVT:13: clsce_subscribe 10226cad0 filter='^CRS_RESOURCE_PROFILE_CHANGE.*NAME='ora\.(scan|ssc02dbdat05z01\.vip).*RESOURCE_CLASS='(scan_vip|vip)'', flags=1, handler=100b26978, arg=102f928e0

2015-09-01 15:20:01.309869 :GIPCXCPT:13:  gipcInternalConnectSync: failed sync request, ret gipcretConnectionRefused (29)

2015-09-01 15:20:01.309994 :GIPCXCPT:13:  gipcConnectSyncF [EvmConConnect : evmgipcio.c : 205]: EXCEPTION[ ret gipcretConnectionRefused (29) ]  failed sync connect endp 102d76690 [00000000000050f7] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=00000000-00000000-0))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef 0, ready 0, wobj 1030d1d40, sendp 102d76190 status 13flags 0xa008871a, flags-2 0x1, usrFlags 0x30020 }, addr 1033b0990 [00000000000050fe] { gipcAddress : name 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=00000000-00000000-0))', objFlags 0x0, addrFlags 0x4 }, flags 0x8000000

2015-09-01 15:20:01.310445 : CLSCEVT:13: (:CLSCE0017:)clsce_subscribe 10226cad0 EvmConnCreate failed with status = 13

2015-09-01 15:20:01.310779 :  CRSEVT:13: {0:0:2} ClusterPubSub::subscribe clsce_subscribe failed [4]”
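To get a quick overview of such a trace rather than scrolling through it, a small summary of the CRS-10203 messages per listener subscriber can help. This is a minimal sketch; the function name `evm_failures` is hypothetical, and the trace location mentioned in the usage note (12.1 ADR layout) is an assumption, so pass the real path of your trace file.

```shell
#!/bin/sh
# Summarise EVM connection failures in an agent trace file given as $1.
evm_failures() {
  # CRS-10203 = "Could not connect to the Event Manager daemon"
  n=$(grep -c 'CRS-10203' "$1")
  echo "CRS-10203 count: $n"
  # Count failures per listener subscriber (LsnrAgentSub-<name>)
  grep 'CRS-10203' "$1" \
    | sed -n 's/.*LsnrAgentSub-\([A-Za-z0-9_]*\).*/\1/p' \
    | sort | uniq -c
}
```

Usage, assuming the 12.1 ADR location: `evm_failures "$ORACLE_BASE/diag/crs/$(hostname)/crs/trace/ohasd_oraagent_grid.trc"`. On the excerpt above it would count LISTENER_CLONE twice and LISTENER once.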

I opened an SR, and after killing the ohasd.bin and oraagent.bin processes, GI and evmd came up. Later, after the SR was transferred to my time zone, support came back suspecting a couple of bugs:
“

1. Unpublished BUG 21484367 12.1.0.2 SIHA UPGRADE HANG INDEFINITELY IF MORE SERVICES REGISTERED
2. BUG 20620033 AIX ISSUES WITH GI 12.1.0.2 UPGRADE FINE, IF DON'T CONFIGURE > 34 OR 35 SERVICES
-> not AIX specific

”
On another system I could reproduce the issue: in my case the problem indeed appeared once 35 services in total had been created on the machine, that is, services you add with

srvctl add service

With that many services, GI didn't come up. One way to circumvent it is to set the services to MANUAL, but with more than 80 services that is not really a solution for us; the other is to set the database to MANUAL. Development is currently working on a patch. Hope this helps when you run into these errors.
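Since the trigger here was the total number of services on the machine, a simple check before patching may be worthwhile. This is a minimal sketch under stated assumptions: that services appear as Oracle Restart resources of type `ora.service.type`, that `crsctl stat res -w` prints one `NAME=` line per resource in its default output, and that 35 (the count at which it failed in my case) is the relevant limit. The function name `check_service_count` is hypothetical.

```shell
#!/bin/sh
# Warn when the total number of registered services reaches the level at which
# GI failed to come up in this case. Reads crsctl output on stdin.
# Assumption: one "NAME=" line per service resource (default crsctl output).
SERVICE_LIMIT=${SERVICE_LIMIT:-35}

check_service_count() {
  n=$(grep -c '^NAME=')
  if [ "$n" -ge "$SERVICE_LIMIT" ]; then
    echo "WARNING: $n services registered (limit seen here: $SERVICE_LIMIT)"
    return 1
  fi
  echo "OK: $n services registered"
}
```

Usage: `"$GI_HOME/bin/crsctl" stat res -w "TYPE = ora.service.type" | check_service_count`. Because the GI home manages all databases on the machine, this counts services across every database, which matches the "in total" observation above.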
