07-15-2009 02:02 PM
Here is what we are going to monitor with each program:
Users on specific SSIDs – self-explanatory
APs on your backup controller – Monitoring this statistic gives you a jump on network issues if APs lose connectivity to their local controller and fail over
CPU usage per controller – If something goes wrong with a controller, you can see when the problem started
Wireless syslog – keeps track of rogue APs and interference
User syslog – keeps track of all user connects/disconnects and mobility trails
System syslog – keeps track of network, controller, and AP errors
Search - I chose Splunk not only as a syslog server, but as a search and alerting tool. For example, let’s say someone calls into the helpdesk and says they are having issues connecting to the network. They may be able to tell you roughly where they were and when they were having the issues, but they don’t know for sure. With Splunk, we can look up their MAC address to see where they were, which APs they were connecting to, whether there was an authentication type mismatch, and so on. You can also search for where and when rogue APs popped up, whether stationary users are connecting and disconnecting too frequently, or errors over a given period of time.
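As a rough sketch in the same search syntax used later in this thread (the MAC address is a placeholder, not a real client), a helpdesk lookup might start with:

```
00:1a:2b:3c:4d:5e startdaysago=7
```

From the matching user and wireless syslog events you can read off the APs the client roamed between and any authentication errors around the reported time.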
Reporting - I also use Splunk to report how many unique users were on the network during the week. This information helps for future growth planning. MRTG can tell you how many people were connected at a given time, but not unique users during a given period of time. I break down the reports per SSID.
Alerts - One thing that you may want to monitor is an increase in firewall activity on a controller. You can set up an alert to email you when a certain number of firewall logs are generated within a specific amount of time – say, 1500 messages within 5 minutes. In addition to receiving an email from an alert, you may want to set up a shell script to perform an action on the controller, such as blacklisting a client for high firewall activity – this takes some simple programming. Splunk can also monitor your core wireless switches. I use Splunk to alert me when there are any kind of link failures among the controllers and spanning tree kicks in.
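As a rough sketch of the blacklisting idea (not the poster's actual script): the ArubaOS command `stm add-blacklist-client`, the `admin` login, and the controller IP below are assumptions to verify against your own setup. The sketch only prints the command so it can be dry-run safely; swap the final `echo` for a real SSH call (or an expect wrapper that handles the password prompt) to act on the controller.

```shell
#!/bin/sh
# Hypothetical sketch: build the blacklist command for a client MAC that a
# Splunk alert flagged for high firewall activity.

CONTROLLER=10.0.0.2   # placeholder controller IP - replace with your own

blacklist_cmd() {
    mac="$1"
    # Refuse anything that does not look like aa:bb:cc:dd:ee:ff before
    # going anywhere near the controller
    case "$mac" in
        [0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F])
            ;;
        *)
            echo "invalid MAC: $mac" >&2
            return 1
            ;;
    esac
    # Print instead of executing so the sketch can be tested offline
    echo "ssh admin@$CONTROLLER stm add-blacklist-client $mac"
}

CMD=$(blacklist_cmd 00:1a:2b:3c:4d:5e)
echo "$CMD"
```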
Whether you have a small deployment and just need to monitor the system for problems, or have a large deployment and are concerned about security, you need some type of all-in-one solution. You can’t spend your time logging into the GUI checking for rogues, nor can you spend it logging into the CLI searching for firewall error logs. Have all this information at your fingertips and let the software do the legwork for you.
MRTG is free and can be found at http://oss.oetiker.ch/mrtg/. Splunk can be found at www.splunk.com and has a free version that works if you index under 500MB/day of syslogs; beyond that you have to pay for a license. 500MB/day is a lot of data from the Aruba controllers. You can usually expect around 2,000,000 syslog messages a day from the controllers, which comes to roughly 500MB. If you have “drop and log” policies on your CP, you may exceed 500MB/day since it will log all firewall violations. The Splunk database compresses the data well: 500MB of syslogs compresses to about 200MB.
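For reference, a minimal MRTG target for graphing a user count as a gauge might look like the fragment below. The OID, community string, and controller IP are placeholders to replace with your own; MRTG expects two OIDs per target, so the same one is given twice and `noo` hides the second value.

```
WorkDir: /var/www/mrtg
Target[wlan_users]: <user-count-OID>&<user-count-OID>:public@10.0.0.2
Options[wlan_users]: gauge,growright,nopercent,noo
MaxBytes[wlan_users]: 5000
Title[wlan_users]: Wireless users on controller
PageTop[wlan_users]: <h1>Wireless users on controller</h1>
YLegend[wlan_users]: users
```

Run `mrtg` against the config a few times from cron (every 5 minutes is the usual interval) and it will build the daily/weekly/monthly graphs on its own.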
07-16-2009 07:59 AM
07-16-2009 08:12 AM
I actually tried to use Cacti at first for the WLAN stats instead of MRTG. I tried for about a week to get it up and running, without success. We use Cacti for a lot of other things, but it just wouldn't work with Aruba. I know MRTG can be confusing to setup at first, but in the long run it's been great.
I'll post the MIBs I use in another post.
07-16-2009 08:36 AM
Authenticated 802.1x users per controller: 220.127.116.11.4.1.14818.104.22.168.22.214.171.124.0
Total number of users on a controller: 126.96.36.199.4.1.148188.8.131.52.184.108.40.206 - this can be deceiving since not all users may be authenticated into CP.
Number of APs on a particular controller (useful for finding out if any APs lost connectivity to their local switch and failed over to your backup controller. Run this only on the backup controller): 220.127.116.11.4.1.14818.104.22.168.22.214.171.124
CPU usage: 126.96.36.199.4.1.148188.8.131.52.184.108.40.206.3.1
09-22-2009 02:51 PM
The Defaults file contains the Cricket definitions; the controllers file contains the list of controllers to monitor.
The aruba_ap_long file is a definition and list of APs to monitor in great detail. You really only want that level of monitoring if you're debugging a problem. Too much SNMP polling runs up the CPU on the controllers.
11-04-2009 01:22 PM
I was hoping to graph a couple more things (e.g. the current number of 802.11n users on the network, or the number of users on a specific VLAN). Do things like this have specific MIBs, or would I need something like Cricket to graph them? I've been going through the aruba-mib PDF, but I'm still learning the whole SNMP thing.
11-05-2009 07:34 AM
220.127.116.11.4.1.14818.104.22.168.22.214.171.124.0 is authenticated CP users
126.96.36.199.4.1.148188.8.131.52.184.108.40.206.0 is 802.1x users
I haven't been able to find a MIB for specific VLANs, but we already have ours broken up into specific VLANs based on CP/802.1x, so the above MIBs work.
06-09-2010 12:07 PM
Do you have any good search tips that will help find / alert on authentication errors, blacklists, and problematic APs?
06-10-2010 07:21 AM
1. Weekly alerts for total user count. This is something MRTG can't give you an accurate count on.
"user authenticated" "-WPA-radius" startdaysago=7 | stats distinct_count(Name)
2. Spanning tree events
"corecisco1" "state Standby -> Active" startminutesago=5
3. Alerting when someone creates a guest account. This actually triggers a shell script on the server to email me the details.
4. Guest account access. We like to keep all authentication logs in one place, so when an Aruba-DB guest logs into wireless we want that information pushed to our IAS server. The shell script reformats the text from the "local user-db-guest" syslog and places it on the IAS server through a file share. The IAS server then integrates the log into its own logs, so it's all in one place.
Since you now have Splunk set up, you will probably notice stuff you don't need. I recommend also creating a script on the Splunk server that cleans out certain events daily. Here is what my script looks like:
/opt/splunk/bin/splunk search ' | oldsearch delete::501101' -auth admin:xxxxxx
/opt/splunk/bin/splunk search ' | oldsearch delete::Mismatch' -auth admin:xxxxxx
/opt/splunk/bin/splunk search ' | oldsearch delete::handle_sapcp' -auth admin:xxxxxx
/opt/splunk/bin/splunk search ' | oldsearch delete::dst-nat' -auth admin:xxxxxx
/opt/splunk/bin/splunk search ' | oldsearch delete::cloned' -auth admin:changeme
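To run a cleanup script like the one above nightly, a crontab entry along these lines works (the script path is an assumption; put the commands wherever suits your layout):

```
# Run the Splunk cleanup script at 2:00 AM every day
0 2 * * * /opt/splunk/scripts/splunk_cleanup.sh >/dev/null 2>&1
```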