04-03-2017 11:49 AM
Has anyone had problems with IAP-325s failing to reconnect to the cluster after a firmware upgrade? We recently upgraded our 22 sites (649 APs in total) from 126.96.36.199-188.8.131.52 to 184.108.40.206-220.127.116.11, and about 10 of the units failed to rejoin the cluster. After further review, the problem appears to be firmware corruption, which has raised concerns about future upgrades, especially since some of our APs aren't easily accessible.
Here is the upgrade process we followed. (We have AirWave but aren't fully trained on it, so I did it manually.) The upgrade was done after hours, when network load was low and there were no bandwidth constraints:
- Downloaded the firmware from Aruba Support to my admin machine.
- Connected to each site's VC and uploaded the firmware (Maintenance -> Firmware).
- The upload would show 'Uploading' for a minute or two.
- The APs would then start rebooting, and one or two at a site would fail.
Troubleshooting the failed APs:
- The failed APs were stuck in a continuous reboot loop; we could see PoE events on the switches roughly every 25 seconds.
- Removed power from the failed APs' ports for 60 seconds, then restored it, with the same result.
- Drove to the sites, manually pulled the Ethernet cable, and had the same problem (as expected).
- Connected to the console port and watched the boot process:
- The RSA signature was verified.
- The AP reported the updated firmware version.
- It then printed a 'Structure needs cleaning' message and rebooted.
- Held in the reset pin to factory-reset the AP, but the problem persisted (we verified the reset over the console connection).
- Checked the switch ports the APs were connected to for errors; there were none.
- Changed the boot partition in the APBoot menu to partition 0. The AP would boot into the older code, connect to the cluster, perform the upgrade, and then fall back into the reboot loop. (This is what led me to believe the firmware was corrupted.)
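The PoE pattern noted above (a power event roughly every 25 seconds) can be used to spot a reboot loop from the switch side without driving to the site. A minimal sketch, assuming you have already parsed the PoE event timestamps for one port from the switch log into seconds; the function name, the 25 s period, and the tolerance are illustrative assumptions, not part of any Aruba or switch vendor tooling:

```python
# Hypothetical helper: flag a switch port whose PoE events recur at a
# roughly constant interval, as an AP stuck in a reboot loop would produce.
# Assumes timestamps (in seconds) were already extracted from the switch log;
# the ~25 s period comes from the observation in this thread.


def looks_like_reboot_loop(event_times, period=25.0, tolerance=5.0, min_cycles=3):
    """Return True if at least `min_cycles` consecutive intervals between
    PoE events fall within `tolerance` seconds of `period`."""
    times = sorted(event_times)
    run = 0
    for earlier, later in zip(times, times[1:]):
        if abs((later - earlier) - period) <= tolerance:
            run += 1
            if run >= min_cycles:
                return True
        else:
            run = 0  # pattern broken; start counting again
    return False


if __name__ == "__main__":
    looping_port = [0, 24, 50, 74, 99]  # events every ~25 s
    healthy_port = [0, 3600]            # one power cycle, then stable
    print(looks_like_reboot_loop(looping_port))  # True
    print(looks_like_reboot_loop(healthy_port))  # False
```

Requiring several consecutive matching intervals, rather than one, avoids flagging a port that was simply power-cycled once during the upgrade.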
What seemed to fix the issue:
- Boot the AP with the console cable connected and enter the APBoot menu.
- Run 'factory_reset'
So my questions are:
- Is this common?
- Is there any way to prevent it in the future?
- Is there any way to fix this over the network, without physical access to the AP?
- Was this caused by moving from Early Availability code (installed by our VAR) to General Release code?
- Why didn't the physical reset pin fix the issue, so that we had to run 'factory_reset' from the console?
04-10-2017 10:05 PM
It is a bit difficult to say what could have caused the IAPs to end up in this state.
However, I can suggest the following:
1. When upgrading the IAPs, there is an option "Reboot AP's after upgrade".
This option is enabled by default. You can uncheck it and then click "Upgrade Now".
You can then check the upgrade state of each IAP by running the following commands on the VC:
Once the output shows that all the IAPs have been upgraded, you can run the command:
Note: There was a known issue with upgrading IAPs using the local file option in the Web UI, which was fixed in 22.214.171.124.
I'm not sure whether you were affected by it.
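The staged upgrade suggested in this reply (push the image to every IAP with the automatic reboot disabled, and reboot only once every IAP reports the new image) amounts to a simple gate. The sketch below assumes you have already collected each IAP's upgrade state into a list; the `APStatus` record, its fields, and the function names are hypothetical and do not correspond to any Aruba command output or API:

```python
# Hypothetical sketch of the "upgrade first, reboot later" gate.
# APStatus and its fields are invented for illustration; populate them
# from whatever upgrade-status output your VC provides.
from dataclasses import dataclass


@dataclass
class APStatus:
    name: str
    upgraded: bool  # True once the AP reports the new image is in place


def pending_aps(statuses):
    """Names of APs that have not yet reported a completed upgrade."""
    return [s.name for s in statuses if not s.upgraded]


def safe_to_reboot(statuses):
    """Reboot the cluster only when there is at least one AP and every AP is upgraded."""
    return len(statuses) > 0 and not pending_aps(statuses)


if __name__ == "__main__":
    site = [APStatus("ap-01", True), APStatus("ap-02", False)]
    print(pending_aps(site))     # ['ap-02']
    print(safe_to_reboot(site))  # False
```

Holding the reboot until every AP has the full image means a slow or interrupted transfer shows up as a pending AP, rather than as an AP that reboots onto an incomplete image.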