02-09-2017 11:00 AM
I've set up SNMP traps to send to our SolarWinds server and found that the trap wlsxAPNumDown seems to indicate an AP failure, reboot, or port-down issue. Most of the time it works well, and we can confirm on the controller that an AP did in fact reboot or temporarily lose connectivity. Other times we receive these traps for no apparent reason. In our most recent event, the AP had been up for 200+ days.
The closest thing I can find that may be the problem is that our secondary controller may be trying to take over as the Master in a VRRP configuration, causing the AP to "go down" on the primary controller. Unfortunately I cannot find any good evidence that there is a switching issue which would cause a loss of communication between the two. Does anyone see a misconfiguration in the VRRP config below?
peer-ip-address 10.1.1.3 ipsec <passphrase>
ip address 10.1.1.1
preempt delay 0
peer-ip-address 10.1.1.2 ipsec <passphrase>
ip address 10.1.1.1
02-13-2017 09:47 AM
We received more of these SNMP traps. After investigating the site from Layer 1 up, we found:
No port errors
No port discards
No high bandwidth usage
No spanning-tree events
No packet loss or latency during the noted time period
No reports of network issues during this time
No routing table changes (all static)
APs are reporting uptime of several weeks, so they did not reboot or lose connectivity
Are there any suggestions as to why this may be occurring? I also have a ticket open with HPE support trying to identify what could be causing this.
02-16-2017 12:27 PM
OK, so it turns out I've been using the wrong SNMP trap for what we wanted: an up/down alert for when APs go offline.
Don't use wlsxAPNumDown for this, as it only tracks a counter that increments when an alarm is added. You may still want to look into the alarms when it fires, but for basic up/down alerting you should use wlsxNAccessPointIsDown and wlsxNAccessPointIsUp instead.
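For anyone handling these traps in their own script rather than in SolarWinds alert rules, here is a rough sketch of the idea in Python. It is not anything from Aruba or SolarWinds: the function name, the state dictionary, and the assumption that your trap receiver hands you the trap name and the AP name as plain strings are all hypothetical. It only illustrates treating the two wlsxNAccessPointIs* traps as the up/down signal and ignoring wlsxAPNumDown for that purpose.

```python
from typing import Dict, Optional

# Trap names as given in this thread. Matching on names rather than OIDs,
# and receiving the AP name as a string, are assumptions for illustration.
AP_DOWN_TRAP = "wlsxNAccessPointIsDown"
AP_UP_TRAP = "wlsxNAccessPointIsUp"


def update_ap_state(state: Dict[str, str], trap_name: str, ap_name: str) -> Optional[str]:
    """Update the per-AP state and return an alert message, or None if no alert is needed."""
    if trap_name == AP_DOWN_TRAP:
        # Alert only on the transition to "down" so repeated traps do not re-alert.
        if state.get(ap_name) != "down":
            state[ap_name] = "down"
            return f"ALERT: {ap_name} is down"
        return None
    if trap_name == AP_UP_TRAP:
        # Send a clear message only if we previously alerted on this AP.
        was_down = state.get(ap_name) == "down"
        state[ap_name] = "up"
        return f"CLEAR: {ap_name} is back up" if was_down else None
    # wlsxAPNumDown (and any other trap) is not an up/down signal: no state change.
    return None
```

Called with wlsxAPNumDown, the function returns None and leaves the state untouched, which matches the point above: that trap may be worth logging for alarm review, but it should not drive an AP-down alert.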