Thursday, April 12, 2012

HAIP: expected behaviour?

We have two nodes, node1 and node2, running 11.2.0.3 GI and DB in Solaris containers.

They are connected redundantly to two switches.

The HAIP functionality creates 169.254.x.x addresses on the physical NICs:


bge3: flags=1000843 mtu 1500 index 3
        inet 192.168.1.101 netmask ffffff00 broadcast 192.168.1.255
bge3:1: flags=1001000843 mtu 1500 index 3
        inet 169.254.114.125 netmask ffff8000 broadcast 169.254.127.255

e1000g3: flags=1000843 mtu 1500 index 5
        inet 192.168.2.105 netmask ffffff00 broadcast 192.168.2.255
e1000g3:1: flags=1001000843 mtu 1500 index 5
        inet 169.254.182.181 netmask ffff8000 broadcast 169.254.255.255
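
A side note on the output above: the netmask ffff8000 on the two 169.254 addresses is a /17, so each HAIP address sits in its own half of the 169.254.0.0/16 link-local range. Below is a minimal Python sketch that checks this from the values shown. The helper name network_of is made up for illustration and is not part of any Oracle or Solaris tooling.

```python
import ipaddress

def network_of(addr, hex_mask):
    """Return the network containing addr, given a Solaris-style hex netmask."""
    mask = ipaddress.IPv4Address(int(hex_mask, 16))  # ffff8000 -> 255.255.128.0
    return ipaddress.ip_network(f"{addr}/{mask}", strict=False)

# Addresses and netmasks copied from the ifconfig output in the post.
haip_bge3 = network_of("169.254.114.125", "ffff8000")
haip_e1000g3 = network_of("169.254.182.181", "ffff8000")

print(haip_bge3, haip_bge3.broadcast_address)        # 169.254.0.0/17 169.254.127.255
print(haip_e1000g3, haip_e1000g3.broadcast_address)  # 169.254.128.0/17 169.254.255.255
print(haip_bge3.overlaps(haip_e1000g3))              # False
```

The computed broadcast addresses match the ones ifconfig reports, which confirms the two HAIP addresses are on separate, non-overlapping subnets.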


When we disable port bge3 on the first node, the corresponding 169.x.x.x address moves to NIC e1000g3. So far so good. But, and it is a big but, the same thing happens on node 2: there too the 169.x.x.x address moves to the e1000g3 NIC.

Is this expected behaviour?

It doesn't make sense to me, because if NIC e1000g3 on the other node now fails, the cluster goes belly up; I mean, at least one node will get evicted.

With this in mind, we tested it, and indeed the server went belly up: node 2 got evicted.


According to http://blog.trivadis.com/b/robertbialek/archive/2011/02/03/do-i-still-need-bonding-for-the-cluster-interconnect.aspx, the fact that the 169.x.x.x address travels to the other NICs on all the nodes is expected behaviour. For me this is a big no-no; going back to IPMP...


UPDATE:

Just after posting the article I created an SR at MOS to ask if this was normal. Finally, a month later, a bug was created, and it is now in triage state.

UPDATE 15th May:

OK, 1.5 months after my initial issues I finally have confirmation from MOS that this IS actually expected behaviour:

"Find the latest update on the Bug.
From the oracle stack perspective the behavior here is correct - I do not know how the switch maintenance could be carried out so the work is only visible on one node: the customer needs to discuss this with their network team and the switch provider - thanks and sorry this is not more helpful but this is really outside our stack
Let me know If there is any concern on this"

And after re-asking the question:

"I just reread the comments of DEV: "From the oracle stack perspective the behavior here is correct - I do not know how the switch maintenance could be carried out so the work is only visible on one node: the customer needs to discuss this with their network team and the switch provider - thanks and sorry this is not more helpful but this is really outside our stack"
So basically what is written there is that in fact it is expected behaviour: when the NIC fails, the 169 address fails over to the other available NIC on ALL nodes.
Is this correct? Could you please ask, because it starts to be VERY confusing...
Philippe"

MOS answer:

"so basically what is written there is that in fact it is expected behavior , when the nic fails the 169 address fails over to the other available nic on ALL nodes.
++++ Yes It is specified as expected behavior. the 169 addresses that were on nic1 on every node will failover to nic2 and back once connectivity is restored.
Let me know if you have any more queries "

What is funny, or sad, is that the engineer at first assured me that this was totally not the behaviour to expect, and only after I insisted on filing a bug was it confirmed to be normal. Really not happy here.

I don't understand why they didn't implement it like link-based IPMP; this really makes no sense.
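
To make the difference concrete, here is a toy Python model of the double-failure case. It is only a sketch of my reasoning, not Oracle's HAIP algorithm or the Solaris IPMP implementation. It assumes that under HAIP each NIC is on its own subnet (as in our setup), so two nodes can only talk over a NIC that is up on both of them, and that under link-based IPMP both NICs share one subnet and the two switches are interlinked, so any surviving NIC on each node is enough. The function names are made up for illustration.

```python
def haip_like_can_talk(nic_up):
    """Each NIC name is its own subnet: traffic needs a NIC that is up on every node."""
    common = set.intersection(*(set(nics) for nics in nic_up.values()))
    return any(all(nic_up[node][nic] for node in nic_up) for nic in common)

def ipmp_like_can_talk(nic_up):
    """ASSUMPTION: one shared subnet, interlinked switches: any surviving NIC per node will do."""
    return all(any(nics.values()) for nics in nic_up.values())

# Single failure: bge3 down on node1 only.
single = {"node1": {"bge3": False, "e1000g3": True},
          "node2": {"bge3": True, "e1000g3": True}}
# Double failure: bge3 down on node1 AND e1000g3 down on node2.
double = {"node1": {"bge3": False, "e1000g3": True},
          "node2": {"bge3": True, "e1000g3": False}}

for name, state in (("single", single), ("double", double)):
    print(name, haip_like_can_talk(state), ipmp_like_can_talk(state))
# single True True
# double False True
```

In this model both schemes survive the single failure, but only the IPMP-like scheme keeps the interconnect alive in the double-failure case, which is the scenario where node 2 got evicted in our test.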


I have now asked how to disable this in a supported way.

I disabled it on a test system myself by doing the following:

# $GRID_HOME/bin/crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=0" -init
# $GRID_HOME/bin/crsctl stop crs
# $GRID_HOME/bin/crsctl start crs
oifcfg setif -global e1000g1/192.168.3.0:cluster_interconnect

but I want a statement on how to do this officially.

UPDATE 21st May:

The above-mentioned way to disable HAIP is not supported. MOS says to use the cluster_interconnects parameter to disable HAIP; I will do so later.

4 comments:

Chris Milner said...

Hi Philippe

I read your post with interest as I am assessing the benefits (or not) of HAIP over 3rd party solutions.

I have witnessed the same behaviour, i.e. all NICs are effectively disabled in a cluster if one goes down. However, why is this an issue?

If the other NIC on the other node goes down, then yes, the cluster will start an eviction, but that is a failure of 2 points, and an Oracle high-availability architecture should at minimum handle just one failure, which HAIP achieves in this case. If you want to protect against 2 NIC failures, then I guess 3 network adapters per node would be required...?

How does IPMP perform better in this scenario?

Philippe said...

Hi Chris,


Thanks for your comment. In the end we came to the same conclusion as you: if a double failure happens then it is indeed an issue, but the chances should be low.

As far as I am told by Oracle, you cannot disable HAIP; that is, you can, but it is neither recommended nor supported.

In this particular case link-based IPMP would work better. Why? Because there is at all times just one active path for the NIC, and the assigned IP just "floats" (sorry, probably not the correct lingo) to the other surviving NIC. So you can survive a double NIC failure.

My biggest problem / frustration is that at the time I was installing the GI, not a lot of info was available on the supposed behaviour of this feature. In the beginning the support agent really said there was an issue with our setup ;-( which was not the case. After I explained it to him for the fifth time, with drawings etc...

Important comment:

Also make sure that your NICs are on different subnets, e.g.

eth0 : 192.168.1.x
eth1 : 192.168.2.x

I saw really weird behaviour when this was not the case.
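
A quick way to sanity-check this before installing is sketched below in Python. It only assumes you know the subnet of each private interface; the /24 prefix length and the function name overlapping_pairs are my own assumptions for illustration, so adjust them to your network.

```python
import ipaddress
from itertools import combinations

def overlapping_pairs(nic_subnets):
    """Return the pairs of NICs whose subnets overlap (empty list means all distinct)."""
    nets = {nic: ipaddress.ip_network(cidr) for nic, cidr in nic_subnets.items()}
    return [(a, b) for a, b in combinations(sorted(nets), 2)
            if nets[a].overlaps(nets[b])]

# Layout recommended above (prefix length /24 is an assumption).
print(overlapping_pairs({"eth0": "192.168.1.0/24", "eth1": "192.168.2.0/24"}))
# []

# Both interconnect NICs on the same subnet: the case to avoid.
print(overlapping_pairs({"eth0": "192.168.1.0/24", "eth1": "192.168.1.0/24"}))
# [('eth0', 'eth1')]
```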


Best regards,

Philippe

Anonymous said...

Hi Philippe,

I encountered an issue using HAIP with IPMP on my 11.2.0.3 RAC database. The workaround that I applied is to avoid using HAIP by setting the cluster_interconnects parameter to the actual virtual IP. Is this what you did for the final setup? Are you saying that this method of disabling HAIP is supported by Oracle?

Thanks.
Mr Lee

Philippe said...

Hi Lee,

I disabled IPMP altogether ... and lived with the consequences. You could use aggregates or bonding, and just present one bonded interface to HAIP. The way to disable HAIP indicated in the post is not supported at all by Oracle.