on 11-07-2014 03:16 AM
At the start of this past semester we ran into the following issue, and I thought it might be useful for others to know about.
Users were complaining that wireless wasn't working, would stop working, would disconnect, or would take forever to connect.
We have 2 SSIDs:
- captive portal
Observations (when I finally tracked it down)
- users associated to the access points just fine
- users were delivered an IP address via DHCP no issue
- users were NOT able to ping their gateway
- looking up the user name on the controller did not show them in the user table (for that specific MAC address)
- looking up the MAC address showed it to be associated
- if you left the client alone, it would eventually start working after a varying amount of time; you did not need to disconnect/reconnect it
- the problem always happened midday (when we have the most students on campus)
- the problem was NOT happening on every controller
- The httpd process was running very high
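The symptom pattern above can be written down as a quick triage check. This is only a sketch: the `ClientState` fields and the function name are made up for illustration, and you would fill them in by hand from what the controller and the client show you.

```python
from dataclasses import dataclass


@dataclass
class ClientState:
    # All field names are hypothetical; fill them in by hand while troubleshooting.
    associated: bool        # MAC shows as associated on the controller
    has_dhcp_lease: bool    # client received an IP address via DHCP
    can_ping_gateway: bool  # client can reach its default gateway
    in_user_table: bool     # MAC has an entry in the controller user table


def matches_stuck_pattern(c: ClientState) -> bool:
    """True when the client is associated and addressed, but cannot reach
    its gateway and has no user table entry."""
    return (
        c.associated
        and c.has_dhcp_lease
        and not c.can_ping_gateway
        and not c.in_user_table
    )


stuck = ClientState(associated=True, has_dhcp_lease=True,
                    can_ping_gateway=False, in_user_table=False)
healthy = ClientState(associated=True, has_dhcp_lease=True,
                      can_ping_gateway=True, in_user_table=True)
print(matches_stuck_pattern(stuck))
print(matches_stuck_pattern(healthy))
```

A client that fails this check has some other problem (no association, no lease, and so on) and is probably not hitting the issue described here.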
The cause of the problem:
- The captive portal SSID was the source of the problem (Aruba TAC told me)
- During problem events, the count of users in the pre-auth role was higher than 2000 on the controllers
- Additionally, most of those users are 'smart devices' with apps that think they are online, so they are constantly getting destination NATed to the captive portal (on the controller)
- This ate up resources, causing one CRITICAL task (for all SSIDs) to get delayed.
- The process of putting the device into the user table was getting delayed (if you are debugging, that is the 'user miss' part). When I was debugging, I could observe a client getting stuck JUST before that message, and then 5 to 15 minutes later it would get into the user table and be OK.
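If you want to keep an eye on this, a rough sketch of the check is below. It assumes you already have the pre-auth user count per controller from somewhere (how you collect it is up to you). The controller names and sample numbers are invented, and the 2000 threshold is simply the level we saw during problem events, not a documented limit.

```python
PREAUTH_WARN_THRESHOLD = 2000  # level we saw during problem events, not an official limit


def overloaded_controllers(preauth_counts, threshold=PREAUTH_WARN_THRESHOLD):
    """Return controller names whose pre-auth user count exceeds the threshold, worst first."""
    over = [(name, n) for name, n in preauth_counts.items() if n > threshold]
    over.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in over]


# Hypothetical counts, e.g. noted down from each controller during a busy period.
sample = {"controller-a": 2350, "controller-b": 640, "controller-c": 2100}
print(overloaded_controllers(sample))
```

Controllers that show up in this list at midday are the ones to look at first when deciding where to move AP groups.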
What you can do about it:
- Rebalance your AP groups across your controllers (we are fortunate enough to be able to do this)
- If you have money, you could buy ClearPass; its MAC address caching will help (since devices on the portal would no longer need to log in every time)
- We are going to get rid of captive portal so people don't put their own devices on it, but not entirely (since we need a way for game consoles and other non-dot1x-supporting devices to get online)
We are going to have a captive portal SSID on an external CP site without login, since the Aruba controllers allow you to do RADIUS-server-based MAC auth: Aruba talks to your RADIUS server, your RADIUS server checks a database of MAC addresses, and if the MAC is in there, the device gets put in the auth role.
Will that work? I will find out next year.
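For anyone wondering what the RADIUS-side MAC check in that plan amounts to, here is a minimal sketch of the decision logic only. It is not Aruba or RADIUS server code: the role names, function names, and the in-memory set standing in for the MAC database are all assumptions for illustration, and falling back to the pre-auth role for unknown MACs is my assumption as well.

```python
import re

AUTH_ROLE = "authenticated"  # hypothetical role name
PREAUTH_ROLE = "pre-auth"    # assumed fallback for MACs that are not registered


def normalize_mac(mac: str) -> str:
    """Reduce a MAC address to 12 lowercase hex digits, whatever separators were used."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def role_for(mac: str, registered: set) -> str:
    """Stand-in for the RADIUS-side lookup: registered MACs get the auth role."""
    return AUTH_ROLE if normalize_mac(mac) in registered else PREAUTH_ROLE


# The set stands in for the database of registered MAC addresses.
registered_macs = {normalize_mac("00:11:22:AA:BB:CC")}
print(role_for("00-11-22-aa-bb-cc", registered_macs))
print(role_for("00:11:22:aa:bb:dd", registered_macs))
```

Normalizing first matters because clients, controllers, and databases rarely agree on MAC formatting (colons, dashes, upper or lower case).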