Wireless Access


Access network design for branch, remote, outdoor and campus locations with Aruba access points, and mobility controllers.

IPSec tunnel between MM and MC failing

  • 1.  IPSec tunnel between MM and MC failing

    Posted 20 days ago
    Hello,

    We have a live system which is working fine; this question relates to our dev system (running 8.6.0.6 custom code).

    The dev system consists of an MM and currently 2 clusters. Cluster A has a single 7010 and is running fine. Cluster B, however, has issues: we recently replaced the single 7010 in this cluster with a 7220, and the 7220 seems unable to form a tunnel with the MM. I ran "write erase all" and went through the full setup as normal, but when it comes up it cannot ping the MM, although it can successfully ping the cluster A member and other devices (including 8.8.8.8). Weirdly, the MM _can_ ping it, though it's a bit hit and miss.

    In the MM logs there are lots of these relating to the new MC:

    Mar 31 23:06:32 isakmpd[31225]: <103103> <31225> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:x.x.x.x:4500 id:2962618954 errcode:ERR_IKESA_EXPIRED saflags:0x51 arflags:0x0
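
    When there are lots of these lines, it can help to pull out the peer, port and error code so they can be counted or grouped. The following is a minimal sketch: the regular expression is inferred from the single sample line above rather than from any Aruba log format documentation, and `parse_ike_deletion` is a hypothetical helper name.

```python
import re

# Field layout inferred from the one sample line in this post (an assumption,
# not a documented format).
IKE_DEL_RE = re.compile(
    r"IKE SA Deletion: \S+ "
    r"peer:(?P<peer>\S+):(?P<port>\d+) "
    r"id:(?P<sa_id>\d+) "
    r"errcode:(?P<errcode>\S+)"
)


def parse_ike_deletion(line):
    """Return the peer, port, SA id and error code from one log line, or None."""
    m = IKE_DEL_RE.search(line)
    if m is None:
        return None
    return {
        "peer": m["peer"],
        "port": int(m["port"]),
        "sa_id": int(m["sa_id"]),
        "errcode": m["errcode"],
    }


sample = (
    "Mar 31 23:06:32 isakmpd[31225]: <103103> <31225> <WARN> |ike| "
    "IKE SA Deletion: IKE2_delSa peer:x.x.x.x:4500 id:2962618954 "
    "errcode:ERR_IKESA_EXPIRED saflags:0x51 arflags:0x0"
)
print(parse_ike_deletion(sample))
```

    Port 4500 in the peer field is the UDP port used for IPsec NAT traversal.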

    There is a complication(!), which is that we have also just installed a Fortigate FW. At the moment only the cluster A management VLAN and the MM management VLAN have been moved to the FW; the cluster B management VLAN SVIs are still on the routers (there are 4 routers, just to make it entertaining). I don't really understand how a FW would be involved in breaking the tunnel: the chap who set it up says he can see traffic on port 4500 getting through, and there shouldn't be anything blocking ping etc. between the VLANs. I did Google it and found a few similar-sounding issues involving Cisco ASA.

    I can ping from the router directly to the MM when I use the cluster B management VLAN as the source interface.

    I should mention we actually installed 2 7220s in cluster B - I have the same issue with both of them.

    I'm wondering if this rings any bells with anyone?

    Thanks
    Guy

    ------------------------------
    Guy Goodrick
    ------------------------------


  • 2.  RE: IPSec tunnel between MM and MC failing

    Posted 20 days ago
    Firewalls between those devices can lead to a decent amount of management overhead.  Please see here:  https://www.arubanetworks.com/techdocs/ArubaOS_81_Web_Help/Web_Help_Index.htm#ArubaFrameStyles/Firewall_Port_Info/Communication_Between__D.htm

    ------------------------------
    Any opinions expressed here are solely my own and not necessarily that of Hewlett Packard Enterprise or Aruba Networks.
    ------------------------------



  • 3.  RE: IPSec tunnel between MM and MC failing

    Posted 20 days ago
    Hi,

    Was this working before adding the firewall? Are you seeing any other traffic that is initiated from the controller being dropped on the firewall?
    Are you sure all the needed ports (UDP 500 and 4500) and ESP (IP protocol 50) are allowed?
    https://www.arubanetworks.com/techdocs/ArubaOS_86_Web_Help/Content/arubaos-solutions/external-firewallconf/fire-port-conf-arub.htm?Highlight=ports
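
    As a rough illustration of that checklist, the sketch below compares a set of permitted flows against the three named in this reply. The `(protocol, port)` tuple representation and the `missing_flows` helper are hypothetical; this is not FortiGate rule syntax, and the linked Aruba page lists further ports beyond these three.

```python
# Flows named in the reply above: IKE on UDP 500, NAT-T on UDP 4500,
# and ESP, which is IP protocol 50 and has no port number.
REQUIRED_FLOWS = {
    ("udp", 500): "IKE",
    ("udp", 4500): "IPsec NAT-T",
    ("esp", None): "ESP (IP protocol 50)",
}


def missing_flows(allowed):
    """Return the names of required flows absent from `allowed`.

    `allowed` is an iterable of (protocol, port) tuples; use None as the
    port for protocols without ports, such as ESP.
    """
    normalised = {(proto.lower(), port) for proto, port in allowed}
    return [name for key, name in REQUIRED_FLOWS.items() if key not in normalised]


# Hypothetical rule set in which only the UDP ports have been opened.
example_rules = {("UDP", 4500), ("udp", 500)}
print(missing_flows(example_rules))
```

    A rule set like the example would let the IKE negotiation through while still dropping ESP traffic that is not NAT-T encapsulated.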

    ------------------------------
    Ayman Mukaddam
    ------------------------------



  • 4.  RE: IPSec tunnel between MM and MC failing

    Posted 20 days ago
    I have passed the ports on to our FW guru; he is happy that everything that needs to be allowed is allowed.

    The history of the 'new' MCs is that they were taken out of our existing live B cluster and repurposed into the dev B cluster (the two do not use the same MM). This all happened at roughly the same time as the FW was introduced (note to self: don't do that in future!). Before the changes, the dev B 'cluster' was running fine with a single 7010. The dev A cluster is still running fine; we haven't touched it.

    I ran "write erase all" on the 7220s before running full setup as usual. After a reload there are (perhaps predictably) a whole bunch of errors in the 7220 logs, a few snippets below:

    Deleting temporary database upgrademgrdb_tmp and related /mswitch/conf/upgrademgr_psql_tmp.sql file
    Deleting temporary database wms_tmp and related /mswitch/conf/wms_pg_schema_tmp.sql file
    ERROR: constraint "cpsec_whitelist_pkey" of relation "cpsec_whitelist" does not exist
    ERROR: constraint "rap_whitelist_pkey" of relation "rap_whitelist" does not exist
    ERROR: constraint "userinfo_pkey" of relation "userinfo" does not exist
    Extracted schema data for postgresql://root@127.0.0.1:5432/upgrademgrdb
    Extracted schema data for postgresql://root@127.0.0.1:5432/upgrademgrdb_tmp


    Apr 1 10:06:29 ctrlmgmt: PAPI_Send: To: 7f000001:8226 Type:0x4 Timed out.
    Apr 1 10:06:29 mobileip[3781]: PAPI_Send: To: 7f000001:8226 Type:0x4 Timed out.
    Apr 1 10:06:30 ble_relay[4227]: PAPI_Send: To: 7f000001:8226 Type:0x4 Timed out.
    Apr 1 10:06:30 bocmgr[4281]: PAPI_Send: To: 7f000001:8405 Type:0x4 Timed out.
    Apr 1 10:06:30 off-loader[4334]: PAPI_Send: To: 7f000001:8407 Type:0x4 Timed out.
    Apr 1 10:06:30 phonehome[3784]: PAPI_Send: To: 7f000001:8226 Type:0x4 Timed out.
    Apr 1 10:06:31 nanny[3446]: PAPI_Send: To: 7f000001:8407 Type:0x4 Timed out.
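
    The destination in these PAPI_Send lines looks like an IPv4 address written as eight hex digits followed by a port. A minimal sketch for decoding it, assuming that interpretation (it is inferred from the log text, not from Aruba documentation; `papi_destination` is a hypothetical helper name):

```python
import ipaddress
import re

# Assumption: "To: 7f000001:8226" is <IPv4 address as 8 hex digits>:<port>.
PAPI_RE = re.compile(r"PAPI_Send: To: (?P<addr>[0-9a-fA-F]{8}):(?P<port>\d+)")


def papi_destination(line):
    """Return (IPv4Address, port) for a PAPI_Send log line, or None."""
    m = PAPI_RE.search(line)
    if m is None:
        return None
    # Interpret the hex digits as a 32-bit big-endian IPv4 address.
    return ipaddress.IPv4Address(int(m["addr"], 16)), int(m["port"])


line = "Apr 1 10:06:29 ctrlmgmt: PAPI_Send: To: 7f000001:8226 Type:0x4 Timed out."
ip, port = papi_destination(line)
print(ip, port, ip.is_loopback)  # 127.0.0.1 8226 True
```

    Under that reading, every timeout listed above is addressed to 127.0.0.1, i.e. to another process on the same controller rather than to a remote device.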



    Apr 1 10:06:54 profmgr[3644]: <334200> <ERROR> |profmgr| Node /mm already has elements in it
    Apr 1 10:06:54 profmgr[3644]: <399816> <3644> <ERRS> |profmgr| cfgparser_parse_cfgfile: Failed to parse " any any any permit ". Continue
    Apr 1 10:06:54 profmgr[3644]: <399816> <3644> <ERRS> |profmgr| cfgparser_parse_cfgfile: Failed to parse " any any sys-svc-dhcp permit ". Continue
    Apr 1 10:06:54 profmgr[3644]: <399816> <3644> <ERRS> |profmgr| cfgparser_parse_cfgfile: Failed to parse " any any sys-svc-esp permit ". Continue
    Apr 1 10:06:54 profmgr[3644]: <399816> <3644> <ERRS> |profmgr| cfgparser_parse_cfgfile: Failed to parse " any any sys-svc-icmp permit ". Continue
    Apr 1 10:06:54 profmgr[3644]: <399816> <3644> <ERRS> |profmgr| cfgparser_parse_cfgfile: Failed to parse " any any sys-svc-icmp6 permit ". Continue
    Apr 1 10:06:54 profmgr[3644]: <399816> <3644> <ERRS> |profmgr| cfgparser_parse_cfgfile: Failed to parse " any any sys-svc-ike permit ". Continue
    Apr 1 10:06:54 profmgr[3644]: <399816> <3644> <ERRS> |profmgr| cfgparser_parse_cfgfile: Failed to parse " any any sys-svc-natt permit ". Continue
    Apr 1 10:06:54 profmgr[3644]: <399816> <3644> <ERRS> |profmgr| cfgparser_parse_cfgfile: Failed to parse " any any sys-svc-v6-dhcp permit ". Continue
    Apr 1 10:06:54 profmgr[3644]: <399816> <3644> <ERRS> |profmgr| cfgparser_parse_cfgfile: Failed to parse " disallow-vlan type servers service "" ". Continue
    Apr 1 10:06:54 profmgr[3644]: <399816> <3644> <ERRS> |profmgr| cfgparser_parse_cfgfile: Failed to parse " host 255.255.255.255 any any deny ". Continue
    Apr 1 10:06:54 profmgr[3644]: <399816> <3644> <ERRS> |profmgr| cfgparser_parse_cfgfile: Failed to parse " invert ". Continue
    ....(lots more like this)



    Apr 1 10:07:01 profmgr[3644]: <399803> <3644> <ERRS> |profmgr| An internal system error has occurred at file profmgr_ncfg.c function profmgr_domain_validate line 754 error Priv is NULL for ASLEAP ids_signature_prof. Inst: 0x2b87de4 .
    Apr 1 10:07:01 profmgr[3644]: <399803> <3644> <ERRS> |profmgr| An internal system error has occurred at file profmgr_ncfg.c function profmgr_domain_validate line 754 error Priv is NULL for AirJack ids_signature_prof. Inst: 0x288636c .
    Apr 1 10:07:01 profmgr[3644]: <399803> <3644> <ERRS> |profmgr| An internal system error has occurred at file profmgr_ncfg.c function profmgr_domain_validate line 754 error Priv is NULL for Deauth-Broadcast-From-Valid-AP ids_signature_prof. Inst: 0x2be3484 .
    Apr 1 10:07:01 profmgr[3644]: <399803> <3644> <ERRS> |profmgr| An internal system error has occurred at file profmgr_ncfg.c function profmgr_domain_validate line 754 error Priv is NULL for Disassoc-Broadcast-From-Valid-AP ids_signature_prof. Inst: 0x2bd5e8c .
    Apr 1 10:07:01 profmgr[3644]: <399803> <3644> <ERRS> |profmgr| An internal system error has occurred at file profmgr_ncfg.c function profmgr_domain_validate line 754 error Priv is NULL for Netstumbler Generic ids_signature_prof. Inst: 0x2861934 .
    Apr 1 10:07:01 profmgr[3644]: <399803> <3644> <ERRS> |profmgr| An internal system error has occurred at file profmgr_ncfg.c function profmgr_domain_validate line 754 error Priv is NULL for Netstumbler Version 3.3.0x ids_signature_prof. Inst: 0x2b24d84 .
    Apr 1 10:07:01 profmgr[3644]: <399803> <3644> <ERRS> |profmgr| An internal system error has occurred at file profmgr_ncfg.c function profmgr_domain_validate line 754 error Priv is NULL for NoAuthApGroup ap_group. Inst: 0x2a05214 .


    I guess some of this (timeouts etc.) can be explained by the fact that it hasn't been able to contact the master. But do the cfgparser messages seem unusual to you? It's not impossible that they have had minor config changes made on them in disaster-recovery mode in the past. Could that be an issue?

    Is there a more extreme version of "write erase all" that we can run on the two 7220s just to make sure there's nothing funky going on with them (before I lay all the blame on the FW!)?

    If push comes to shove we can move them out from behind the FW again to see if that 'fixes' things, but ultimately we want them behind there so we'll need to work out what is going on.

    Guy

    ------------------------------
    Guy Goodrick
    ------------------------------



  • 5.  RE: IPSec tunnel between MM and MC failing

    Posted 20 days ago
    Ok red-face time - I hadn't whitelisted the controllers on the dev MM! Apologies for wasted time.

    I'm sure it will be the FW's fault next time ;)

    Guy

    ------------------------------
    Guy Goodrick
    ------------------------------