Hello fellow airheads,
Long time listener, first time poster. I've been playing around with ArubaOS-CX on a 6300M in our testbed for the past couple of weeks, and was pleasantly surprised to see that everything I cared about came through in LibreNMS. I can see all the port stats, environment details, and CPU/memory usage. It's exactly what I was hoping would happen!
However, after adding a few of our 6410 ArubaOS-CX switches, it seems like after a few days all of the ports disappear and I need to rediscover the device. Still digging into the logs on my side; oddly enough I have no issues with that 6300M or any of the Cisco gear also being monitored.
Just curious if anyone else has had any difficulties with the 6410 in LibreNMS?
Hello! I'm particularly interested in your setup.
I use LibreNMS too, but I'm only monitoring a few ArubaOS-Switch based switches, and I've left our Aruba 8320 VSX under the HPE IMC umbrella. So I'm quite interested to hear about your experience with LibreNMS and ArubaOS-CX (say 10.4 or 10.5) and how you configured SNMP on the ArubaOS-CX side (with HPE IMC I used a strict SNMPv3 approach).
Can you share any relevant config with us?
Great to hear from you, happy to share more details!
I'm using a dirt simple SNMPv3 authNoPriv config for LibreNMS polling. Here's a sanitized version of what's applied within AOS-CX:
access-list ip management-access
    10 comment Permit established traffic
    20 permit tcp any any established
    30 comment Permit LibreNMS
    40 permit udp (librenms_ip) any range 161 162
    50 comment Permit SSH from mgmt server
    60 permit tcp (ssh_ip) any eq 22
snmp-server vrf default
snmp-server system-location (location)
snmp-server system-contact (noc email)
snmp-server community (string)
snmpv3 security-level auth
snmpv3 user (username) auth md5 auth-pass ciphertext (password)
interface vlan (mgmt_vlan)
    apply access-list ip management-access routed-in
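For anyone who wants to take LibreNMS out of the picture, the same authNoPriv credentials can be tried by hand from the polling server with net-snmp's snmpwalk. Below is a minimal Python sketch that only builds the command line, so nothing is sent until you run the result yourself. It assumes net-snmp is installed on the poller and that IF-MIB is loadable; the function name, hostname, username and passphrase are made-up placeholders, not anything from my actual setup.

```python
import shlex


def build_snmpwalk_cmd(host, user, auth_pass, oid="IF-MIB::ifHCInOctets"):
    """Build a net-snmp snmpwalk argv for SNMPv3 authNoPriv with MD5 auth.

    Mirrors the 'auth md5' user in the switch config above. All argument
    values are supplied by the caller; nothing here talks to the network.
    """
    return [
        "snmpwalk",
        "-v3",               # SNMP version 3
        "-l", "authNoPriv",  # authentication only, no encryption
        "-u", user,          # SNMPv3 username
        "-a", "MD5",         # auth protocol, matching 'auth md5'
        "-A", auth_pass,     # auth passphrase (plaintext, not the ciphertext)
        host,
        oid,                 # 64-bit inbound octet counters by default
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    cmd = build_snmpwalk_cmd("switch.example.net", "librenms", "changeme")
    print(shlex.join(cmd))
```

If the walk returns counters for every port on the 6410, that would point away from the switch-side SNMP config and towards the LibreNMS side.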
What's really crazy is that both the 6300M and the 6410 are running 10.05.0011 code with very similar configurations. I've attached a screenshot of what I see when I mouse over the switches in the devices list: the CPU/Memory graphs are fantastic on both, but there's no traffic for the 6410.
I'm also including a somewhat redacted screenshot of the device list. LibreNMS knows about the 312 ports on the two 6410 switches, so I don't think it's a discovery problem, even though the problem seems to briefly go away when I re-discover.
At first I figured I'm just doing something stupid with permissions for the snmpv3 user, but the fact that everything is working on the 6300M is throwing me off. We have a few Cisco switches with over 1000 ports being monitored in this LibreNMS instance without any problems, so I don't think it's a server resources issue either.
I'm tempted to just delete the switch in LibreNMS and retry adding just with SNMP v2c temporarily to take v3 issues out of the picture. Would really like to stay with v3 though.
Maybe it's an SNMP bug on the 6410... have you looked at the latest release?
I'm running 10.05.0011 on both the 6300M and one of the two 6410 switches (the other is running 10.05.0001 since there's never a good time to reboot it). Took a quick look at the 10.05.0020 release notes, there's an SNMP bug (SR89339) for the 6300M, but nothing regarding the 6400 series. Which is especially odd considering the 6300M is my only AOS-CX switch actually working 100% properly in LibreNMS!
Maybe tomorrow I'll try to get a pcap of the SNMP traffic for both the 6300M and one of the 6410 chassis, perhaps there will be some clues there.
Especially curious if anyone else with a 6410 chassis has run into this in LibreNMS; I'm running a fairly vanilla install with the daily update script enabled, and this is the first time I've seen a switch behave like this.
Well, I don't have a solution for this, but I think I found the problem and a workaround: open the ArubaOS-CX device in LibreNMS, wait a good ~5 min for the LibreNMS polling cron job to run, click the ports tab, and the port + total traffic graphs reappear like clockwork.
Pcaps of both the 6410 & 6300M traffic to LibreNMS looked sensible; I'm no SNMP expert, but I could see all the SNMPv3 GetBulkRequest & get-response traffic in wireshark, so this is making me lean towards something inside of LibreNMS as the culprit.
After clicking around the GUI for a bit, I noticed an error message ("ERROR: string ends after the = sign") after clicking "show RRD command" on the total traffic graph, which seems to be a really good clue.
When I go to the 6300M total traffic graphs that are actually working, I see hundreds of lines of the following in the "show RRD command" section (would post a screenshot, but I'd rather not share my switch FQDNs with the whole internet):
DEF:inB0=/opt/librenms/rrd/(my switch's FQDN)/port-id2943.rrd:INOCTETS:AVERAGE
DEF:outB0=/opt/librenms/rrd/(my switch's FQDN)/port-id2943.rrd:OUTOCTETS:AVERAGE
CDEF:octets0=inB0,outB0,+
CDEF:inbits0=inB0,8,*
CDEF:outbits0=outB0,8,*
CDEF:outbits0_neg=outbits0,-1,*
CDEF:bits0=inbits0,outbits0,+
VDEF:totinB0=inB0,TOTAL
VDEF:totoutB0=outB0,TOTAL
VDEF:tot0=octets0,TOTAL
AREA:inbits0#CAE853:'1/1/47 In'
GPRINT:inbits0:LAST:%6.2lf%sbps
GPRINT:inbits0:AVERAGE:%6.2lf%sbps
GPRINT:inbits0:MAX:%6.2lf%sbps
GPRINT:totinB0:%6.2lf%sB
COMMENT:''
HRULE:999999999999999#CC7CCC:' Out'
GPRINT:outbits0:LAST:%6.2lf%sbps
GPRINT:outbits0:AVERAGE:%6.2lf%sbps
GPRINT:outbits0:MAX:%6.2lf%sbps
GPRINT:totoutB0:%6.2lf%sB
COMMENT:''
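To compare the working and broken devices without eyeballing hundreds of lines, the DEF entries can be pulled out of a pasted "show RRD command" string. This is just a rough Python sketch: the regex and function name are my own, the sample hostname is a placeholder, and it assumes the RRD paths contain no spaces or colons.

```python
import re

# Matches plain DEF entries only; the word boundary keeps CDEF/VDEF out.
DEF_PATTERN = re.compile(r"\bDEF:(\w+)=([^:\s]+):(\w+):(\w+)")


def extract_defs(rrd_command):
    """Return (vname, rrd_path, ds_name, cf) tuples for each DEF entry."""
    return DEF_PATTERN.findall(rrd_command)


if __name__ == "__main__":
    # Hypothetical sample modelled on the output above.
    sample = (
        "DEF:inB0=/opt/librenms/rrd/sw1.example.net/port-id2943.rrd:INOCTETS:AVERAGE "
        "DEF:outB0=/opt/librenms/rrd/sw1.example.net/port-id2943.rrd:OUTOCTETS:AVERAGE "
        "CDEF:octets0=inB0,outB0,+"
    )
    for vname, path, ds_name, cf in extract_defs(sample):
        print(vname, path, ds_name, cf)
```

Running it against the command text from each device shows at a glance which port RRD files each graph is actually trying to read.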
I googled "ERROR: string ends after the = sign librenms" and got a very large number of ideas for troubleshooting this further within my LibreNMS install.
After spending a good 10-15 min on google, I went back to my 6410 total traffic graph which produced the RRD error, and all the data magically appeared! I'm not 100% sure if this is due to my LibreNMS machine being underpowered, or if I need to leave the page open long enough for the LibreNMS poller to update the data, but this feels like a decent repeatable workaround.
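One way to tell "the poller hasn't written yet" apart from "the graph can't find its data" would be to check when the port RRD files were last touched. Here's a small Python sketch along those lines. The directory layout is taken from the RRD command above; the function name and the 600 second threshold (two of the ~5 min polling cycles) are my own guesses, and if rrdcached is in use the file timestamps can lag behind the real updates.

```python
import time
from pathlib import Path


def stale_port_rrds(rrd_dir, max_age_seconds=600, now=None):
    """List port-id*.rrd files in rrd_dir not modified within max_age_seconds.

    Returns (filename, age_in_seconds) tuples, sorted by filename.
    """
    now = time.time() if now is None else now
    stale = []
    for rrd in sorted(Path(rrd_dir).glob("port-id*.rrd")):
        age = now - rrd.stat().st_mtime
        if age > max_age_seconds:
            stale.append((rrd.name, int(age)))
    return stale


if __name__ == "__main__":
    # Hypothetical device directory; substitute the switch's own hostname.
    for name, age in stale_port_rrds("/opt/librenms/rrd/sw1.example.net"):
        print(f"{name}: last written {age}s ago")
```

If the 6410's files are all fresh while the graph still errors out, that would suggest the poller is fine and the problem is on the graphing side.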
There's a good chance we might end up rebuilding LibreNMS on a newer distro soon, so I think I'm going to live with the workaround on these two 6410s for now. Super curious if anyone else runs into 6410 issues with LibreNMS, or if this is just something wacky on my instance. Thanks for the ideas everyone!
Hi! Glad you found a workaround.
I'll register our Aruba 8320 based VSX and see what happens (one thing I'm worried about is how VSX LAG interfaces are going to be represented/monitored).
My current SNMPv3 configuration on ArubaOS-CX 10.3 is this one:
logging <HPE-IMC-NMS-IP-address> severity warning vrf mgmt include-auditable-events
snmp-server vrf mgmt
snmp-server system-description <system-name>
snmp-server system-location <system-location>
snmp-server system-contact <system-contact>
snmpv3 user <snmp-user> auth sha auth-pass ciphertext <ciphered-auth-password> priv aes priv-pass ciphertext <ciphered-priv-password>
snmp-server host <HPE-IMC-NMS-IP-address> inform version v3 user <snmp-user>
snmp-server host <HPE-IMC-NMS-IP-address> trap version v3 user <snmp-user>
As you can see, it's quite plain.
I won't change the logging, SNMP inform and trap settings since I'm just going to run a test (though I won't rule out switching the monitoring of our VSX to LibreNMS in the very near future).
© Copyright 2021 Hewlett Packard Enterprise Development LP. All Rights Reserved.