Wireless Access

Occasional Contributor II

Backup MM lost network connection, but not to Master MM

Hi

 

I have two Mobility Masters running version 8.3.7.

 

All of a sudden, the backup has lost contact with the rest of the network. However, its tunnel to the master is up, VRRP is up and the database is synced as it should be.


From the backup I can reach the master's IP, the VRRP IP and the backup's own IP; nothing else is reachable, not even devices in the same layer 2 domain.


Nothing has changed on the hypervisor; both MMs are on the same host, which allows promiscuous mode.


All IP configuration is correct: gateway, routing table, subnet mask.

 

My theory is that all traffic is somehow going through the tunnel, but I cannot verify this.

 

Is there anyone who recognizes the problem and has a solution?

 

BR

Frequent Contributor II

Re: Backup MM lost network connection, but not to Master MM

Wow, I've run into the same issue maybe twice, but I just figured it must be my network. In my case it was actually the Primary MM that was affected: it lost all communication with the outside world, but VRRP kept working and it kept talking to the Standby, so VRRP never failed over. Of course this took my MM offline because the primary server kept the VIP. I could not ping the default gateway from the MM console, but I could ping the peer. The only thing that brought it back online was a reboot. Since my MM was offline I didn't have time to troubleshoot with TAC.

 

If you have time to troubleshoot with TAC while the issue is happening that might help us all figure out what's going on. 

 

I can't say for sure what version I was on; most likely either 8.3.0.7 or 8.5.0.1. I've been on 8.5.0.2 for a few weeks now and haven't had the issue again.

Guru Elite

Re: Backup MM lost network connection, but not to Master MM

The #1 reason we see for controllers losing connectivity in the field is too much traffic (broadcast/multicast, for example) on their subnet.  Controllers have a firewall that will attempt to drop traffic if it exceeds certain limits.  "show firewall | include Rate" will show the rates above which traffic is limited:

(Babarella) #show firewall | include Rate
Policy                                       Action                                          Rate        Port
Rate limit CP untrusted ucast traffic        Enabled                                         9765 pps     
Rate limit CP untrusted mcast traffic        Enabled                                         3906 pps     
Rate limit CP trusted ucast traffic          Enabled                                         65535 pps    
Rate limit CP trusted mcast traffic          Enabled                                         3906 pps     
Rate limit CP route traffic                  Enabled                                         976 pps      
Rate limit CP session mirror traffic         Enabled                                         976 pps      
Rate limit CP auth process traffic           Enabled                                         976 pps      
Rate limit CP vrrp traffic                   Enabled                                         512 pps      
Rate limit CP ARP traffic                    Enabled                                         3906 pps     
Rate limit CP L2 protocol/other traffic      Enabled                                         1953 pps     
Rate limit CP IKE traffic                    Disabled

"show datapath bwm" will tell you if any of those rates has ever been exceeded (the "Policed" column).

 

(Babarella) #show datapath bwm 

Datapath Bandwidth Management Table Entries
-------------------------------------------
Contract Types : 
   0 - CP Dos 1 - Configured contracts 2 - Internal contracts
------------------------------------------------
Flags: Q - No drop, P - No shape(Only Policed), 
       T - Auto tuned 
--------------------------------------------------------------------
Rate: pps - Packets-per-second (256 byte packets), bps - Bits-per-second
--------------------------------------------------------------------
      Cont                          Avail     Queued/Pkts 
Type  Id    Rate       Policed     Credits  Bytes         Flags    CPU      Status
----  ----  ---------  ----------  -------  ------------  -------  -------  ----------
0     1     9792 pps   0           305            0/0              4        ALLOCATED
0     2     3936 pps   0           123            0/0              4        ALLOCATED
0     3     65536 pps  0           2048           0/0              4        ALLOCATED
0     4     3936 pps   0           123            0/0              4        ALLOCATED
0     5     992 pps    0           31             0/0              4        ALLOCATED
0     6     992 pps    0           31             0/0              4        ALLOCATED
0     7     992 pps    0           31             0/0              4        ALLOCATED
0     8     512 pps    0           16             0/0              4        ALLOCATED
0     9     3936 pps   0           123            0/0              4        ALLOCATED
0     10    1984 pps   0           62             0/0              4        ALLOCATED
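
As a rough way to apply this check to a saved copy of the output, here is a minimal Python sketch that lists the contracts whose Policed counter is non-zero. The regular expression is an assumption based only on the row layout shown in this thread, and the second sample row (1234 policed packets on contract 8) is made up for illustration.

```python
import re

# Assumed row layout: Type, Contract Id, Rate, unit, Policed, then the rest.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(pps|bps)\s+(\d+)\s+")

def policed_contracts(bwm_output):
    """Return (contract_id, rate, unit, policed) for rows with Policed > 0."""
    hits = []
    for line in bwm_output.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, separator or legend line
        _type, cid, rate, unit, policed = m.groups()
        if int(policed) > 0:
            hits.append((int(cid), int(rate), unit, int(policed)))
    return hits

# First row is copied from the output above; the second row is hypothetical.
SAMPLE = """\
Type  Id    Rate       Policed     Credits  Bytes         Flags    CPU      Status
----  ----  ---------  ----------  -------  ------------  -------  -------  ----------
0     1     9792 pps   0           305            0/0              4        ALLOCATED
0     8     512 pps    1234        16             0/0              4        ALLOCATED
"""

print(policed_contracts(SAMPLE))
```

An empty list means no contract has policed any traffic, which is what the real output above shows.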

Long story short:

- Make sure the management subnet of your controllers is not a large broadcast domain.  In addition, avoid putting APs directly on your MD or MM management subnet; when broadcast and ARP traffic spikes, the MD or MM will protect itself by dropping traffic, including useful traffic like the VRRP and ARP needed to communicate with outside components.

- If you can, make the management subnet of your MM different from the management subnet of your MDs, so that their VRRP/broadcast traffic is on separate subnets, avoiding the same issue as above.

- Enable bcmc-optimization on all VLANs to drop unnecessary traffic so that the controller does not consume cycles attempting to process it.

- Make sure that you are only trunking VLANs from your switch to your controller that the controller will actually be using.  Traffic on unnecessary VLANs forces the controller to process it, and it will be policed if it goes over a threshold.  Conversely, don't enable VLANs on an MD that will not be used by that MD, for the same reason: the controller will have to process traffic that the MD will never use.

- Do not force any unnecessary physical redundancy.  Don't feel the need to dual-connect controllers to different switches.  You could introduce an inadvertent loop, and that will either completely or partially blackhole your controller.

- Do not span MDs in a cluster physically far from each other.  If the latency between two MDs in a cluster increases due to the distance between them, it can generate a "split brain" situation where you will not know which MD has control over which APs.  This can easily happen, again, when a lot of traffic is being generated and the MDs in a cluster are far away from each other.  Try to avoid this design so that AP redundancy is predictable.

 

**None of these tips will eliminate the possibility of your MDs and MMs losing connectivity, but they will (1) decrease the likelihood and (2) make it easier to understand an issue when you encounter one.**

 


*Answers and views expressed by me on this forum are my own and not necessarily the position of Aruba Networks or Hewlett Packard Enterprise.*
ArubaOS 8.5 User Guide
InstantOS 8.5 User Guide
Airheads Knowledgebase
Airheads Learning Videos
Aruba Central Documentation
ArubaOS Consolidated Release Notes
Aruba VIA ASE Solution - Configure VIA VPN
Occasional Contributor II

Re: Backup MM lost network connection, but not to Master MM

Thanks for the elaborate answer.

I didn't know this, so it was useful information.

 

However, the MM is not in a large L2 domain and only the backup has problems. The master shows the same output but is not affected.

The symptoms also do not seem to point to firewall rate limiting:
the output below shows 0 in the Policed column for every contract.
I have restarted the backup MM without results.

Any idea what to do next?

 

show firewall | include Rate
Policy                                       Action                                            Rate       Port
Rate limit CP untrusted ucast traffic        Enabled                                           9765 pps
Rate limit CP untrusted mcast traffic        Enabled                                           1953 pps
Rate limit CP trusted ucast traffic          Enabled                                           98304 pps
Rate limit CP trusted mcast traffic          Enabled                                           1953 pps
Rate limit CP route traffic                  Enabled                                           976 pps
Rate limit CP session mirror traffic         Enabled                                           976 pps
Rate limit CP auth process traffic           Enabled                                           976 pps
Rate limit CP vrrp traffic                   Enabled                                           512 pps
Rate limit CP ARP traffic                    Enabled                                           976 pps
Rate limit CP L2 protocol/other traffic      Enabled                                           976 pps
Rate limit CP IKE traffic                    Enabled                                           1953 pps
show datapath bwm

Datapath Bandwidth Management Table Entries
-------------------------------------------
Contract Types :
   0 - CP Dos 1 - Configured contracts 2 - Internal contracts
------------------------------------------------
Flags: Q - No drop, P - No shape(Only Policed),
       T - Auto tuned
--------------------------------------------------------------------
Rate: pps - Packets-per-second (256 byte packets), bps - Bits-per-second
--------------------------------------------------------------------
      Cont                          Avail     Queued/Pkts
Type  Id    Rate       Policed     Credits  Bytes         Flags    CPU      Status
----  ----  ---------  ----------  -------  ------------  -------  -------  ----------
0     1     9792 pps   0           306            0/0              1        ALLOCATED
0     2     1984 pps   0           62             0/0              1        ALLOCATED
0     3     98304 pps  0           3072           0/0              1        ALLOCATED
0     4     1984 pps   0           62             0/0              1        ALLOCATED
0     5     992 pps    0           31             0/0              1        ALLOCATED
0     6     992 pps    0           31             0/0              1        ALLOCATED
0     7     992 pps    0           31             0/0              1        ALLOCATED
0     8     512 pps    0           16             0/0              1        ALLOCATED
0     9     992 pps    0           31             0/0              1        ALLOCATED
0     10    992 pps    0           31             0/0              1        ALLOCATED
0     11    1984 pps   0           62             0/0              1        ALLOCATED
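
A note for anyone matching the two outputs line by line: the rates in "show firewall" and in "show datapath bwm" differ slightly (9765 pps versus 9792 pps, for example). In every pair posted in this thread, the contract rate equals the configured rate rounded up to the next multiple of 32, and a full Avail Credits value equals the contract rate divided by 32. The Python sketch below only checks that observation against the numbers above; the step of 32 is inferred from these outputs and is an assumption, not documented behaviour.

```python
STEP = 32  # assumption: inferred from the outputs in this thread only

def contract_rate(configured_pps, step=STEP):
    """Round a configured pps limit up to the next multiple of `step`."""
    return -(-configured_pps // step) * step

def full_credits(contract_pps, step=STEP):
    """Credits that correspond to a contract rate under the same assumption."""
    return contract_pps // step

# (rate from "show firewall", rate from "show datapath bwm") as posted above
OBSERVED = [
    (9765, 9792), (3906, 3936), (65535, 65536), (976, 992),
    (512, 512), (1953, 1984), (98304, 98304),
]

mismatches = [(c, r) for c, r in OBSERVED if contract_rate(c) != r]
print("mismatches:", mismatches)
```

If the pattern holds, the rows of "show firewall" map onto contract IDs 1 through 11 in order, which makes it easier to tell which limit a non-zero Policed counter belongs to.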
Guru Elite

Re: Backup MM lost network connection, but not to Master MM

Please open a TAC case, so that they can explore what is wrong in your specific situation.

 

It could be a bug, or configuration issue or a combination.  Please report back to us here when you get any insight on that.


Frequent Contributor II

Re: Backup MM lost network connection, but not to Master MM

I just ran into this issue again. It seems to manifest after the MM VM is moved due to host maintenance in our environment. The Primary MM lost outside network connectivity, but not connectivity to the standby, so MM VRRP did not fail over. Unlike in the past, this actually caused a service disruption. It also took down one of our MDs, which lost outside connectivity at the exact same time. That MD is in a completely different datacenter, so I'm guessing the only reason it lost connectivity is something to do with the IPsec tunnel to the MM VRRP. Same symptoms: the MD lost outside connectivity but could still see its cluster peer. Even weirder, we lost connectivity to all the APs registered to that MD, but the APs did not lose contact with the controller, so they did not fail over to the backup! So half of all our APs were effectively black-holed. I rebooted the Primary MM; as soon as that happened the Standby took over VRRP, the MD came back online, and all of its APs came back online! Very strange issue. It's got to have something to do with internal routing between APs, MDs and MM via the IPsec tunnel.

 

This is all on 8.5.0.4 btw. Opening a TAC case.. 

Guru Elite

Re: Backup MM lost network connection, but not to Master MM

The MD does not have any dependency on the MM to pass user traffic.  I am sitting here with popcorn to understand what could be making that happen.


Frequent Contributor II

Re: Backup MM lost network connection, but not to Master MM

I'm not exactly sure how much *user* traffic was affected, but management traffic to the MD and all APs connected to it definitely was.

 

Here's an interesting piece: while the MM was 'offline', or not reachable, the MD pulled its default route out of the route table.

 

MD during outage

show ip route 

Codes: C - connected, O - OSPF, R - RIP, S - static, B - Bgw peer uplink
       M - mgmt, U - route usable, * - candidate default, V - RAPNG VPN/Branch
       I - Ike-overlay, N - not redistributed

Gateway of last resort is Imported from DHCP to network 0.0.0.0 at cost 10
Gateway of last resort is Imported from CELL to network 0.0.0.0 at cost 10
Gateway of last resort is Imported from PPPOE to network 0.0.0.0 at cost 10
C    10.20.0.0/16 is directly connected, VLAN20
C    10.210.0.0/16 is directly connected, VLAN810
C    10.212.0.0/16 is directly connected, VLAN812
C    10.20.40.51/32 is an ipsec map default-ha-ipsecmap10.20.40.51
C    10.150.20.31/32 is an ipsec map default-local-master-ipsecmap

MD normal operation

(adc-mod-awlc01) [MDC] #show ip route

Codes: C - connected, O - OSPF, R - RIP, S - static, B - Bgw peer uplink
       M - mgmt, U - route usable, * - candidate default, V - RAPNG VPN/Branch
       I - Ike-overlay, N - not redistributed

Gateway of last resort is Imported from DHCP to network 0.0.0.0 at cost 10
Gateway of last resort is Imported from CELL to network 0.0.0.0 at cost 10
Gateway of last resort is Imported from PPPOE to network 0.0.0.0 at cost 10
Gateway of last resort is 10.20.0.1 to network 0.0.0.0 at cost 1
S*    0.0.0.0/0  [0/1] via 10.20.0.1*
C    10.20.0.0/16 is directly connected, VLAN20
C    10.210.0.0/16 is directly connected, VLAN810
C    10.212.0.0/16 is directly connected, VLAN812
C    10.20.40.51/32 is an ipsec map default-ha-ipsecmap10.20.40.51
C    10.150.20.31/32 is an ipsec map default-local-master-ipsecmap
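
To make the difference between the two tables explicit, here is a small Python sketch that compares two saved "show ip route" outputs and reports prefixes present in the baseline but missing from the current one. The parsing pattern is an assumption based only on the line layout shown above, and the samples are shortened copies of the two outputs.

```python
import re

# Assumed route line layout: code (e.g. "C" or "S*"), prefix/len, free text.
ROUTE = re.compile(r"^\s*([A-Z]\*?)\s+(\d+\.\d+\.\d+\.\d+/\d+)\s+(.*)$")

def routes(show_ip_route):
    """Map prefix -> (code, description) for each route line found."""
    table = {}
    for line in show_ip_route.splitlines():
        m = ROUTE.match(line)
        if m:
            code, prefix, rest = m.groups()
            table[prefix] = (code, rest.strip())
    return table

def missing_routes(baseline, current):
    """Routes present in `baseline` but absent from `current`."""
    base, cur = routes(baseline), routes(current)
    return {p: base[p] for p in base if p not in cur}

NORMAL = """\
Gateway of last resort is 10.20.0.1 to network 0.0.0.0 at cost 1
S*    0.0.0.0/0  [0/1] via 10.20.0.1*
C    10.20.0.0/16 is directly connected, VLAN20
C    10.150.20.31/32 is an ipsec map default-local-master-ipsecmap
"""

OUTAGE = """\
C    10.20.0.0/16 is directly connected, VLAN20
C    10.150.20.31/32 is an ipsec map default-local-master-ipsecmap
"""

print(missing_routes(NORMAL, OUTAGE))
```

With these samples the only missing entry is the static default route via 10.20.0.1.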
Guru Elite

Re: Backup MM lost network connection, but not to Master MM

Sounds to me like 10.20.0.1 is unreachable for some reason.

 

EDIT:  I would do a "show datapath route-cache" and see if you see the default gateway in there.

