Wireless Access


Access network design for branch, remote, outdoor, and campus locations with HPE Aruba Networking access points and mobility controllers.

Windows10 802.1x unable to reconnect after controller failover

  • 1.  Windows10 802.1x unable to reconnect after controller failover

    Posted May 10, 2019 11:43 AM

    Hi,

     

    I've set up a pair of 7005 controllers with v8.5.0.0 in a (standalone) master/backup deployment for testing purposes. Everything works fine so far, but there is one serious problem left:

     

    Windows10 machines configured for EAP-TLS or EAP-PEAP machine auth cannot reconnect to the 802.1x SSID after the failover to the standby controller has occurred. Windows just says "Can't connect to this network".

     

    A Macintosh (EAP-TLS or EAP-PEAP 802.1x user auth) doesn't have this problem. It perfectly survives the failover. The Macintosh is connected to the same SSID as the Win10 machine.

     

    I see this log message on the standby controller when the Win10 machine tries to connect during the failover:

     

    May 10 17:07:53 2019 demo-mc2 dot1x-proc:2[4039]: <138094> <4039> <WARN>
    <demo-mc2 172.18.13.38> MIC failed in WPA2 Key Message 2 from Station
    34:13:e8:36:a8:6c 90:4c:81:55:bb:90 AP03
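
    For anyone sifting through many of these entries, here is a minimal Python sketch that pulls the fields out of such a line. It is only an illustration: the function name `parse_mic_failure` is made up, and the field order after "Station" (client MAC, then BSSID, then AP name) is an assumption read off this one sample, not a documented log format.

```python
import re

# Sample taken from the post above, joined onto a single line.
LOG_LINE = (
    'May 10 17:07:53 2019 demo-mc2 dot1x-proc:2[4039]: <138094> <4039> <WARN> '
    '<demo-mc2 172.18.13.38> MIC failed in WPA2 Key Message 2 from Station '
    '34:13:e8:36:a8:6c 90:4c:81:55:bb:90 AP03'
)

# Colon-separated MAC address, e.g. 34:13:e8:36:a8:6c
MAC = r'(?:[0-9a-f]{2}:){5}[0-9a-f]{2}'

# Assumption: the first MAC is the station, the second is the BSSID,
# and the last token is the AP name.
PATTERN = re.compile(
    r'<(?P<msg_id>\d+)>\s+<\d+>\s+<(?P<level>\w+)>\s+'
    r'<(?P<host>\S+)\s+(?P<ip>[\d.]+)>\s+'
    r'MIC failed in WPA2 Key Message (?P<key_msg>\d) from Station\s+'
    rf'(?P<station>{MAC})\s+(?P<bssid>{MAC})\s+(?P<ap>\S+)',
    re.IGNORECASE,
)


def parse_mic_failure(line):
    """Return a dict of fields for a MIC-failure line, or None if it does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


event = parse_mic_failure(LOG_LINE)
print(event)
```

    Grouping the parsed events by station or AP makes it easier to see which clients fail during a failover and which do not.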

     

    When the primary controller comes back again, the Win10 client is able to reestablish the Wi-Fi connection.

     

    Here are some facts about my test environment:

     

    • 2 7005 controllers, v8.5.0.0
    • External Windows NPS Radius Server
    • No EAP termination on controller
    • I tried both, EAP-TLS and EAP-PEAP. No difference.
    • I also tried "no validate-pmkid" and "no opp-key-caching" in my aaa authentication dot1x profile ... doesn't help.

    I have no clue what this "MIC failed in WPA2 Key Message 2 from Station" error means ... any help is highly appreciated!

     

    Thanks,

     -Andreas



  • 2.  RE: Windows10 802.1x unable to reconnect after controller failover

    Posted May 10, 2019 12:02 PM
    Any particular reason you decided to use AOS 8.5 instead of 8.3.0.6?



    Thank you

    Victor Fabian

    Pardon typos sent from Mobile


  • 3.  RE: Windows10 802.1x unable to reconnect after controller failover

    EMPLOYEE
    Posted May 10, 2019 07:47 PM

    What do the RADIUS server logs say?

     

    You should type "show auth-tracebuf" on the command line of the controller to see the back-and-forth RADIUS messages for those clients.



  • 4.  RE: Windows10 802.1x unable to reconnect after controller failover
    Best Answer

    Posted May 11, 2019 12:21 PM

    Hi,

     

    I think I just found the cause of my problem. It had nothing to do with 802.1x or RADIUS. My mistake was configuring a "ha group-profile" with state synchronization alongside the master-redundancy configuration. I thought this would help minimize failover times.

     

    Obviously, the downside of using the "ha group-profile" is that certain clients run into a "mic failure" (shown in the auth-tracebuf output) during the failover. This issue even occurs with iPhones/iPads connected to a WPA2-PSK SSID! The iOS devices just say "wrong password" ... I tested this with v8.3.0.6 as well.

     

    I have now removed the "ha group-profile" and pointed the APs to the VRRP VIP. The failover takes more time (~30 seconds instead of ~10 seconds with the ha config), but all clients are now able to connect to the second controller once the failover has occurred.

     

    Unfortunately, I couldn't find good documentation describing this standalone master/backup scenario in the ArubaOS 8.x docs ... I think Aruba should improve this, because there are still some customers without MM / MC clusters out there.

     

    Thank you for your answers, anyway.

     -Andreas



  • 5.  RE: Windows10 802.1x unable to reconnect after controller failover

    EMPLOYEE
    Posted May 11, 2019 01:40 PM

    "State Sync" should synchronize the multicast encryption keys in a master/local situation.  How do you have those controllers configured?  I am not sure HA state sync is supported in master/backup master scenarios.

     

     



  • 6.  RE: Windows10 802.1x unable to reconnect after controller failover

    Posted May 11, 2019 06:50 PM

    I'm not sure either .. it was some kind of (not really) educated guess. I did configure this "ha group-profile" at the /mm node-hierarchy level of the primary controller:

     

    ha group-profile LAB-HA
     preemption
     state-sync
     pre-shared-key <psk>
     controller <mc1-ip> role dual
     controller <mc2-ip> role dual
     exit
    ap system-profile default
     lms-ip <mc1-ip>
     bkup-lms-ip <mc2-ip>
     exit
    ha group-membership LAB-HA

    Then I sync'd the config to the secondary controller ("database-synchronize"). This seems to be the only way to configure HA in a master/backup deployment.

     

    I also tried omitting the "state-sync" statement, but either way some clients run into the "mic failed" issue when the primary controller is offline. Others do not run into this and work quite well.

     

    I learned that the only way to make the controller failover work for all clients is to purge that "ha group-profile" and to point the APs to the VRRP VIP instead.

     

    I did some more tests with 802.1x and Guest (captive portal) SSIDs. Failover times are even worse with those SSIDs. It takes at least 2 minutes until the clients can re-connect, though the APs are NOT bootstrapping. Of course, this is still better than the issue where they can't connect to the secondary controller at all.

     

    I did my recent tests with v8.5.0.0 - there might be room for improvement in future software versions :)

     

    -Andreas