Wireless Access

Access network design for branch, remote, outdoor and campus locations with Aruba access points and mobility controllers.

AOS 8.4.0.3 Roaming disconnects

  • 1.  AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 07:57 AM

    We recently set up two controllers running AOS 8.4.0.3 in a cluster, and our users are reporting intermittent disconnects when they roam around our office. In our FreeRADIUS logs we see the following message, which coincides with the issues:

     

    Login incorrect (eap: EAP requires the State attribute to work, but no State exists in the Access-Request packet.): [USERNAME] (from client MC port 0 cli MACADDR)

     

    I have searched online but have not found anything helpful so far.

     

    When the issue occurs, users are prompted to enter their credentials again, and the login is rejected even if the credentials are correct. It seems they either have to forget the network entirely to reconnect, or wait a while until the device eventually reconnects by itself. This happens on all devices regardless of make or OS.

     

    The only thing we have noticed so far is that Mobile IP is enabled on our VAPs (I think this is the default setting) and VLAN Mobility is not enabled, even though we have L2 connectivity between the controllers in the cluster and the same VLANs are present on each (they are in two different data centres). Could this be causing these issues?

     

    Thanks.
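
    Because the reports tend to arrive without precise times, it can help to pull every occurrence of that message out of the FreeRADIUS log and group it by client MAC. The following is only a minimal sketch: the pattern is built from the message quoted above, whatever precedes "Login incorrect" on the line (timestamp, request number) is kept as an opaque prefix, and the sample lines, user names and MAC addresses are made up for illustration.

```python
import re
from collections import Counter

# Built from the message quoted in the post. Anything before "Login incorrect"
# (timestamp, request id) is not described in the thread, so it is not parsed.
STATE_ERROR = re.compile(
    r"Login incorrect \(eap: EAP requires the State attribute to work, "
    r"but no State exists in the Access-Request packet\.\): "
    r"\[(?P<user>[^\]]*)\] \(from client (?P<client>\S+) port (?P<port>\d+) "
    r"cli (?P<mac>[^)\s]+)\)"
)


def find_state_errors(lines):
    """Return (prefix, user, mac) for every line carrying the State error."""
    hits = []
    for line in lines:
        match = STATE_ERROR.search(line)
        if match:
            prefix = line[:match.start()].strip()
            hits.append((prefix, match.group("user"), match.group("mac")))
    return hits


def count_by_mac(hits):
    """Count State errors per client MAC to spot repeat offenders."""
    return Counter(mac for _, _, mac in hits)


# Hypothetical sample lines; the prefix format is an assumption.
SAMPLE = [
    "Thu Oct  3 07:41:12 2019 : Auth: (42) Login incorrect (eap: EAP requires "
    "the State attribute to work, but no State exists in the Access-Request "
    "packet.): [alice] (from client MC port 0 cli AA-BB-CC-11-22-33)",
    "Thu Oct  3 07:41:15 2019 : Auth: (43) Login OK: [bob] "
    "(from client MC port 0 cli AA-BB-CC-44-55-66)",
]

if __name__ == "__main__":
    for prefix, user, mac in find_state_errors(SAMPLE):
        print(prefix, user, mac)
```

    Running `find_state_errors` over the real log file (for example with `open(path)` as the `lines` argument) gives a list of times and MACs that can be matched against user reports after the fact.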



  • 2.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 08:33 AM

    On either MD command line, type "show lc-cluster vlan-probe status" to see whether all your user VLANs can see each other or whether probes are failing. If probes are failing, the cluster will be considered layer 3 connected and your clients will be deauthenticated when they roam between APs that are on different controllers.

     

    EDIT: Rereading your post above, I have no clue why you are having issues. Try to see whether it operates correctly with a single controller. Mobile IP and VLAN mobility have nothing to do with clustering.

     

     

     



  • 3.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 08:48 AM

    Thanks for the reply. They are showing as L2 connected when I check on the CLI.

     

    Some Aruba documentation I have been reading says you should not have L2 and L3 enabled on the VAP at the same time. I presume these settings are Mobile IP and VLAN mobility. Currently we have Mobile IP ticked and VLAN mobility unticked. This seems to be the default, but is it correct for L2?

     

    Thanks.

     




  • 4.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 08:51 AM

    Those knobs are not used for clustering.  Have you excluded any VLANs from your cluster?



  • 5.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 09:17 AM

    3 VLANs are excluded for some reason, even though there is layer 2 connectivity for them between the two controllers (the config was done by someone else). 2 of these VLANs might get used by clients, but not by those who are currently reporting the issue.



  • 6.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 09:20 AM

    Ok.

     

    Just to ask:

     

    - How frequently does the issue you mentioned with the radius server happen?

    - Does it happen to a specific client more than another?

     

    I am asking these questions because we need to zero in on why this is happening. If you have the person who configured/designed this network handy, that would be helpful.



  • 7.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 09:25 AM

    I have logged a ticket with the company that configured it and am awaiting a reply. Most of the config is default settings, and the rest replicates our old 6.5 environment, which does not experience the same issue.

     

    Currently it seems to be affecting a random set of users with nothing that links them together: different devices and OSes, at different times of day, in different locations around our office. The only thing that seems consistent is that it is triggered by them moving through the office and roaming from one AP to another. A very technical colleague from another team experienced the issue recently, so we know it is not user error or something like that.

     

    We have tested roaming with a variety of devices and could not reproduce the issue after many hours of walking around our office, which is rather frustrating, but reports keep coming in with the same symptoms.



  • 8.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 09:11 AM

    What do you see in the authentication buffer for the specific user?

    (When you know you have entered the correct credentials but the user is unable to login)

     

    Command : show auth-tracebuf mac <user mac>

     

    Note: Use this command as the client tries to login

     

    At which point is the authentication process failing?

     

    What happens when you run an AAA test server from the diagnostics with credentials you know are correct?

     

     

    --Give Kudos: found something helpful, important, or cool? Click Kudos Star in a post.
    --Problem Solved? Click "Accepted Solution" in a post.



  • 9.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 09:19 AM

    We can't reproduce the issue currently, so it is very difficult to debug. It seems to happen maybe once every few days, but users aren't reporting it to us when it happens and we only hear about it days later.

     

    Interestingly, when we run an AAA test against our FreeRADIUS server from any of our controllers, the mobility master, or our old controllers on AOS 6.5, we get an authentication failed response, even though users can successfully authenticate on the live wireless network. Not sure why this is the case. We also have some MS NPS servers in use, and the AAA test comes back fine for those.

     

    Thanks.



  • 10.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 09:24 AM

    If it is something that happens rarely, it will be difficult to figure out, because you would have to wait until it happens to capture the state of the user.

     

    The AAA test server sends a raw authentication request that might not carry all the attributes your FreeRADIUS server is looking for, so it might fail. The NPS server might not be looking for any additional attributes, so maybe you should switch back to it to see if your issue continues. The fact that the AAA test is being rejected at least means that the request is reaching your RADIUS server(s).

     

     



  • 11.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 09:27 AM

    This is what we suspected, as the server was responding immediately with a reject and the requests from the controllers look a bit different to those from a user device. It has made it rather annoying to troubleshoot any auth issues, because anyone looking at it immediately says auth is failing, but in our production wireless it works fine (besides these random disconnects).



  • 12.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 09:34 AM

    You can create something on FreeRADIUS that will respond to and authenticate the specific attributes sent by the controller in the AAA test server test, so that you will know whether that basic authentication works. AAA test server was created to get a general response from a RADIUS server; you will just have to tailor your RADIUS server so it can respond to the test. Every environment can require different attributes, so the AAA test is not expected to work perfectly everywhere. You can make changes so that it works in yours. At a minimum, it will test connectivity to your RADIUS server.
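
    To make "a basic request with few attributes" concrete, here is a rough sketch of a bare-bones PAP Access-Request assembled by hand according to RFC 2865. Which attributes the controller's AAA test actually sends is not stated in this thread, so this only illustrates how little a minimal request contains; the user name, password, shared secret and NAS-Identifier value are hypothetical.

```python
import hashlib
import os
import struct

# Codes and attribute types from RFC 2865.
ACCESS_REQUEST = 1
USER_NAME = 1
USER_PASSWORD = 2
NAS_IDENTIFIER = 32


def _attr(attr_type, value):
    """Encode one attribute as type, length (including the 2-byte header), value."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def hide_password(password, secret, authenticator):
    """Hide a PAP password as described in RFC 2865 section 5.2."""
    padded = password + b"\x00" * (-len(password) % 16)
    if not padded:
        padded = b"\x00" * 16
    hidden = b""
    previous = authenticator
    for i in range(0, len(padded), 16):
        digest = hashlib.md5(secret + previous).digest()
        block = bytes(p ^ d for p, d in zip(padded[i:i + 16], digest))
        hidden += block
        previous = block
    return hidden


def build_pap_request(user, password, secret, identifier=0,
                      authenticator=None, nas_id=b"aaa-test-sketch"):
    """Build a minimal Access-Request: User-Name, User-Password, NAS-Identifier."""
    if authenticator is None:
        authenticator = os.urandom(16)
    attrs = (
        _attr(USER_NAME, user)
        + _attr(USER_PASSWORD, hide_password(password, secret, authenticator))
        + _attr(NAS_IDENTIFIER, nas_id)
    )
    header = struct.pack("!BBH", ACCESS_REQUEST, identifier, 20 + len(attrs))
    return header + authenticator + attrs


if __name__ == "__main__":
    # Hypothetical credentials and shared secret.
    packet = build_pap_request(b"testuser", b"testpass", b"sharedsecret")
    print(len(packet), packet.hex())
```

    A request like this carries no EAP-Message at all, so a server policy that only accepts EAP for these users would be expected to reject it even when the credentials are valid.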



  • 13.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 09:29 AM

    What happens when you create a set of new test credentials and run the AAA test with those credentials?

     

    Have you tried creating a test SSID and using the internal database on the controller to authenticate your users? Is the behaviour the same?

     

    Also, if I am not wrong, FreeRADIUS is available as a Windows .exe file?

    What happens when you configure another FreeRADIUS server and add the IP of the new server so you can test the test SSID with the test credentials?

     

     

     




  • 14.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 09:37 AM

    @kuairhead already wrote that it is not working because the AAA test server is not sending all the attributes required by his RADIUS server.



  • 15.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 09:45 AM

    As I said, I would expect it to respond with a reject no matter which account is used. We have tried loads of different accounts that can all access the wifi fine; they all get the reject response in the AAA test, but they authenticate fine and get assigned the correct role in production on an actual wireless device.

     

    With regard to a test SSID, I have not tried that. As I have not experienced this issue myself in the 3 or so months the controllers have been live, I wouldn't expect to see it on a test SSID, and I can't ask the hundred or so users in the production environment to use the test SSID, so we are pretty limited there as well, unfortunately.

     

    We initially suspected the issue was with FreeRADIUS, as we didn't have any reports of issues when using NPS. However, we pointed auth back to NPS and still got reports from users, so we can only assume the issue is the configuration of the controllers or AOS 8 behaviour.

     

    Thanks.



  • 16.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 09:47 AM

     

    What was the corresponding message on the NPS server?  Did that make you think you had the same issue?



  • 17.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 10:18 AM

    It was decided to use only FreeRADIUS on our AOS 8 environment going forward (not my decision), so I can't point it back to NPS to test and look for logs when users are affected. While we were on NPS, the issue was usually reported after the fact and without precise times, so I was unable to find anything in the logs.

     

    Our AOS 6.5 environment does not have the same issue when using the same FreeRADIUS and NPS servers, which makes me believe the AOS 8 config is the problem.

     

    Thanks.



  • 18.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 09:54 AM

    What do you see in the security logs on the controller for any of the users experiencing this issue? 

     

    Command : show log security all | include <mac of user>

     

    Note: This command does not need a live client

     

    Do these logs indicate that the issue could be the same in both scenarios (when using FreeRADIUS and when using NPS)?
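
    Since both this command and the `show auth-tracebuf` command suggested earlier in the thread take the client MAC, a small helper can turn whatever form a user reports (dashes, dots, upper case) into the colon-separated lowercase form that the controller log lines posted in this thread use. This is just a convenience sketch; the function names are made up, and it only builds the command strings rather than running them.

```python
import re


def normalize_mac(raw):
    """Return a MAC as lowercase, colon-separated hex (aa:bb:cc:dd:ee:ff)."""
    digits = re.sub(r"[:\-.\s]", "", raw).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a MAC address: {raw!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def troubleshooting_commands(raw_mac):
    """Build the two controller commands mentioned in this thread for one client."""
    mac = normalize_mac(raw_mac)
    return [
        f"show auth-tracebuf mac {mac}",
        f"show log security all | include {mac}",
    ]


if __name__ == "__main__":
    # Hypothetical client MAC as a user might report it.
    for command in troubleshooting_commands("B8-8A-60-12-34-56"):
        print(command)
```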

     

     

     




  • 19.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 10:32 AM

    Checking the logs for the MAC address of the user device affected yesterday, I see the following, none of which relates to yesterday. Not sure if it is any help.

     

    Aug 19 17:36:05 dot1x-proc:1[4592]: <138057> <4592> <ERRS> |dot1x-proc:1| Failed to send the radius request for Station b8:8a:60:xx:xx:xx b4:5d:50:xx:xx:xx
    Aug 29 14:46:11 dot1x-proc:2[4595]: <138094> <4595> <WARN> |dot1x-proc:2| MIC failed in WPA2 Key Message 2 from Station b8:8a:60:xx:xx:xx b4:5d:50:xx:xx:xx AP-b4:5d:50:xx:xx:xx
    Aug 30 16:01:20 dot1x-proc:1[4592]: <138094> <4592> <WARN> |dot1x-proc:1| MIC failed in WPA2 Key Message 2 from Station b8:8a:60:xx:xx:xx b4:5d:50:7c:5c:d0 AP-b4:5d:50:xx:xx:xx
    Sep 25 15:37:50 authmgr[3926]: <199802> <3926> <ERRS> |authmgr| dot1x.c, auth_handle_dot1x_key_handshake_data:4957: Key handshake data received for unknown user b8:8a:60:xx:xx:xx
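
    For anyone who wants to filter output like this by date rather than by eye, here is a minimal parsing sketch. The field layout is inferred only from the lines above (date, process[pid], message id, a second numeric field, severity, process tag, message), the logs carry no year so 2019 is assumed, and the sample reuses the masked lines from this post.

```python
import re
from datetime import date, datetime

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Layout inferred from the lines posted above; other message types may differ.
LOG_LINE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2}) +(?P<day>\d{1,2}) "
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2}) "
    r"(?P<process>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"<(?P<msg_id>\d+)> <\d+> <(?P<severity>[A-Z]+)> "
    r"\|[^|]*\| (?P<message>.*)$"
)


def parse_line(line, year):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if not match or match["month"] not in MONTHS:
        return None
    when = datetime(year, MONTHS[match["month"]], int(match["day"]),
                    int(match["hour"]), int(match["minute"]),
                    int(match["second"]))
    return {
        "when": when,
        "process": match["process"],
        "msg_id": match["msg_id"],
        "severity": match["severity"],
        "message": match["message"],
    }


def entries_on(lines, day, year):
    """Return the parsed entries whose timestamp falls on the given day."""
    parsed = (parse_line(line, year) for line in lines)
    return [entry for entry in parsed if entry and entry["when"].date() == day]


SAMPLE = [
    "Aug 19 17:36:05 dot1x-proc:1[4592]: <138057> <4592> <ERRS> |dot1x-proc:1| "
    "Failed to send the radius request for Station b8:8a:60:xx:xx:xx b4:5d:50:xx:xx:xx",
    "Aug 29 14:46:11 dot1x-proc:2[4595]: <138094> <4595> <WARN> |dot1x-proc:2| "
    "MIC failed in WPA2 Key Message 2 from Station b8:8a:60:xx:xx:xx "
    "b4:5d:50:xx:xx:xx AP-b4:5d:50:xx:xx:xx",
    "Sep 25 15:37:50 authmgr[3926]: <199802> <3926> <ERRS> |authmgr| dot1x.c, "
    "auth_handle_dot1x_key_handshake_data:4957: Key handshake data received "
    "for unknown user b8:8a:60:xx:xx:xx",
]

if __name__ == "__main__":
    # "Yesterday" relative to this post would be 2 October 2019.
    print(entries_on(SAMPLE, date(2019, 10, 2), 2019))
```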

     

    The user who experienced the issue yesterday was challenged for their credentials when roaming, and the credentials were then rejected even though they were correct. They left their device alone for around 7-8 minutes, after which it reconnected by itself. Could a timer be at play somewhere? The RADIUS requests from when the issue occurred seem to show that the user went from an AP on one controller to an AP on the other controller in the cluster and got the RADIUS login failure at that point. This does not seem to be the case for other users, though, who experienced the issue while roaming between APs on the same controller.

     

    Thanks.
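
    For context on the FreeRADIUS message quoted at the start of the thread: per RFC 2865, a server can attach a State attribute to an Access-Challenge, and the NAS must return it unchanged in the next Access-Request so the server can find the conversation in progress. The toy model below only illustrates that bookkeeping. It is not FreeRADIUS code, the class and field names are invented, and it says nothing about what in this particular network causes a request to arrive without State.

```python
import secrets


class ToyEapServer:
    """Tracks in-progress EAP conversations by the State value it handed out."""

    def __init__(self):
        self._sessions = {}

    def handle(self, request):
        """Process one simplified Access-Request given as a dict."""
        if request["eap"] == "identity":
            # A new conversation: no State is expected yet, so issue one.
            state = secrets.token_hex(8)
            self._sessions[state] = request["user"]
            return {"code": "Access-Challenge", "state": state}

        # Any later EAP message must echo the State from the challenge.
        state = request.get("state")
        if state is None:
            return {"code": "Access-Reject",
                    "reason": "no State in Access-Request"}
        if self._sessions.pop(state, None) is None:
            return {"code": "Access-Reject", "reason": "unknown State"}
        return {"code": "Access-Accept"}


if __name__ == "__main__":
    server = ToyEapServer()
    challenge = server.handle({"user": "alice", "eap": "identity"})

    # Normal case: the follow-up request echoes the State it was given.
    print(server.handle({"user": "alice", "eap": "response",
                         "state": challenge["state"]}))

    # Failure case: a mid-conversation EAP message arrives without State.
    print(server.handle({"user": "alice", "eap": "response"}))
```

    In this model the second follow-up is rejected regardless of whether the credentials are correct, which matches the symptom of valid credentials being refused, although the thread does not establish why the State would be missing.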



  • 20.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 10:53 AM

    The logs show stale entries and are from a week back.

     

    Did you happen to enable user-debug for this user yesterday?



  • 21.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 11:09 AM

    Not yet. It appears to happen randomly, so it is unlikely to happen again for this same user anytime soon. I could enable debug for them and it might never happen to them again.



  • 22.  RE: AOS 8.4.0.3 Roaming disconnects

    Posted Oct 03, 2019 09:49 AM

    @cjoseph wrote:

    @kuairhead already wrote that it is not working because the AAA test server is not sending all the attributes required by his radius server.


    Ah, I missed that while typing my reply. I was wondering what happens when the internal database on the controller is used to authenticate a test user in this case. (If this works fine, you could then move on to attribute configuration on the FreeRADIUS server, narrowing it down to the attributes usually sent by the client, as you have mentioned.)

     

    Also, the reason I asked to test with a new FreeRADIUS server was to see whether a default base FreeRADIUS server exhibits this behaviour or whether it is caused by something that was configured on it.

     

    I have seen base FreeRADIUS servers work fine for some time and then go berserk for a while. In one scenario I had to remove the server completely and reinstall it to make it work properly. That was just a quick fix, though.