Thursday, April 12, 2012

HAIP expected behaviour?

We have two nodes, node1 and node2, with 11.2.0.3 GI and DB in Solaris containers.

They are connected redundantly to two switches.

The HAIP functionality creates 169.254.x.x link-local addresses on the physical NICs:


bge3: flags=1000843 mtu 1500 index 3
        inet 192.168.1.101 netmask ffffff00 broadcast 192.168.1.255
bge3:1: flags=1001000843 mtu 1500 index 3
        inet 169.254.114.125 netmask ffff8000 broadcast 169.254.127.255

e1000g3: flags=1000843 mtu 1500 index 5
        inet 192.168.2.105 netmask ffffff00 broadcast 192.168.2.255
e1000g3:1: flags=1001000843 mtu 1500 index 5
        inet 169.254.182.181 netmask ffff8000 broadcast 169.254.255.255
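
As a cross-check from the database side, the gv$cluster_interconnects view (standard in 11.2) shows which interconnect interface and address each instance is actually using; the query below is just a sketch of that check:

-- list the interconnect interface(s) and address(es) per instance
SQL> select inst_id, name, ip_address, source from gv$cluster_interconnects order by inst_id;

With HAIP enabled you should see entries like bge3:1 / e1000g3:1 with their 169.254.x.x addresses here.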


When we disable the switch port for bge3 on the first node, the corresponding 169.x.x.x address moves to NIC e1000g3, so far so good. But, and it is a big one, the same happens on node 2:

also there the 169.x.x.x address moves to the e1000g3 NIC.

Is this expected behaviour?

It doesn't make sense to me, because if NIC e1000g3 on the other node now fails, the cluster goes belly up; I mean at least one node will get evicted.

With this in mind we actually tested it, and indeed it went belly up: node 2 got evicted...


According to http://blog.trivadis.com/b/robertbialek/archive/2011/02/03/do-i-still-need-bonding-for-the-cluster-interconnect.aspx the fact that the 169.x.x.x address travels to another NIC on all the nodes is expected behaviour. For me this is a big NO NO; going back to IPMP...


UPDATE:

Just after posting this article I created an SR at MOS to ask whether this was normal. Finally, a month later, a bug was created and it is now in triage state...

UPDATE 15th May:

OK, 1.5 months after my initial issue I now finally have confirmation from MOS that this IS actually expected behaviour:

"Find the latest update on the Bug . 
@ From the oracle stack perspective the behavior here is correct - I do not@ know how the switch maintenance could be carried out so the work is only@ visible on one node: the customer needs to discuss this with their network@ team and the switch provider - thanks and sorry this is not more helpful but@ this is really outside our stack
Let me know If there is any concern on this"
and after re-asking the question:

"
I just reread the comments of DEV"@ From the oracle stack perspective the behavior here is correct - I do not@ know how the switch maintenance could be carried out so the work is only@ visible on one node: the customer needs to discuss this with their network@ team and the switch provider - thanks and sorry this is not more helpful but@ this is really outside our stack"
so basically what is written there is that in fact it is expected behaviour , when the nic fails the 169 address fails over the the other available nic on ALL nodes.
is this correct could you please ask because it starts to be VERY confusing...
Philippe"

MOS answer

"so basically what is written there is that in fact it is expected behavior , when the nic fails the 169 address fails over the the other available nic on ALL nodes.
++++ Yes It is specified as expected behavior. the 169 addresses that were on nic1 on every node will failover to nic2 and back once connectivity is restored.
Let me know if you have any more queries "

What is funny, or sad, is that the engineer at first assured me this was totally not the behaviour to expect, and only after I insisted on filing a bug was it confirmed to be normal. Really not happy here.

I don't understand why they didn't implement it like link-based IPMP; this really makes no sense.


I have now asked the question how to disable this in a supported way:

I disabled it on a test system myself by doing the following:

"# $GRID_HOME/bin/crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=0" -init
# $GRID_HOME/bin/crsctl stop crs
# $GRID_HOME/bin/crsctl start crs"
oifcfg setif -global e1000g1/192.168.3.0:cluster_interconnect
but I want a statement on how to do this officially.
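
In the meantime, a quick sanity check on the test system after the restart would be something like the sketch below: with HAIP out of the picture, no instance should report a 169.254.x.x link-local address any more, so this should return no rows.

-- any remaining HAIP (link-local) addresses still in use by the instances?
SQL> select inst_id, name, ip_address from gv$cluster_interconnects where ip_address like '169.254.%';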

UPDATE 21st May:

The above-mentioned way to disable HAIP is not supported; MOS says to use the cluster_interconnects parameter to disable HAIP. I will do so later.
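
For reference, a minimal sketch of what that would look like here. The instance names SWINGBE1/SWINGBE2 and node 2's private address 192.168.1.102 are assumptions (only node 1's 192.168.1.101 is shown in the ifconfig output above), so treat the values as placeholders:

-- pin each instance to a static private address instead of the HAIP ones (assumed names/addresses)
SQL> alter system set cluster_interconnects='192.168.1.101' sid='SWINGBE1' scope=spfile;
SQL> alter system set cluster_interconnects='192.168.1.102' sid='SWINGBE2' scope=spfile;
-- a restart of the instances is needed for this to take effect

Of course this pins every instance to a single address, so the redundancy then has to come from underneath (IPMP), which is where I was heading anyway.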

Wednesday, April 11, 2012

SID='*'

I was testing some memory settings and ISM and DISM on an 11.2.0.3 2-node RAC today.

I changed sga_max_size on the first node to a value larger than the available swap:

SQL node1> alter system set sga_max_size=16G sid='SWINGBE1' scope=spfile;

then

SQL node1> shutdown abort

As expected, at startup I get:

SQL node1> startup nomount
ORA-27102: out of memory
SVR4 Error: 12: Not enough space 


On the 2nd node I execute the following:

SQL node2> alter system set sga_max_size=1632M sid='*' scope=spfile;


Still on the first node:


SQL node1> startup nomount
ORA-27102: out of memory
SVR4 Error: 12: Not enough space

Nothing yet.


Maybe it thinks it is still started:

SQL node1> shutdown abort
ORACLE instance shut down.

OK, try again:

SQL node1> startup nomount
ORA-27102: out of memory
SVR4 Error: 12: Not enough space 


No joy.



On the other instance, this time with the explicit SID:



SQL node2> alter system set sga_max_size=1632M sid='SWINGBE1' scope=spfile;




On the original node:





SQL> startup
ORACLE instance started.

Total System Global Area 1704132608 bytes
Fixed Size                  2159976 bytes
Variable Size             956304024 bytes
Database Buffers          738197504 bytes
Redo Buffers                7471104 bytes
Database mounted.
Database opened.


Apparently Oracle needs to receive the explicit SID in this case, and SID='*' doesn't work...
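
My reading of why this happens (and a sketch of how I would check/fix it next time): the spfile already contained a SID-specific entry for SWINGBE1, and a SID-specific entry takes precedence over the '*' entry, so the later sid='*' setting never reached that instance. v$spparameter shows both entries, and resetting the SID-specific one should make the '*' value apply again at the next startup:

-- show both the '*' and the SID-specific spfile entries for sga_max_size
SQL> select sid, name, display_value from v$spparameter where name = 'sga_max_size';

-- remove the SID-specific entry so the '*' value takes effect
SQL> alter system reset sga_max_size sid='SWINGBE1' scope=spfile;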