A little follow-up on this case.
Working with TAC has been great. We have been able to properly configure AirWave to receive traps and process them relatively quickly.
What they found was the following:
The async_logger_client_debug file showed that the payload_timestamp and the current_timestamp were about 2 hours apart, i.e., traps were being processed roughly 2 hours after they arrived.
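To spot this kind of backlog yourself, you can compare the two timestamps from a log entry. A minimal sketch, assuming a plain `YYYY-MM-DD HH:MM:SS` timestamp format (the actual format in async_logger_client_debug may differ):

```python
from datetime import datetime

def trap_lag_seconds(payload_ts: str, current_ts: str) -> float:
    """Return how far behind trap processing is, in seconds."""
    fmt = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format
    payload = datetime.strptime(payload_ts, fmt)
    current = datetime.strptime(current_ts, fmt)
    return (current - payload).total_seconds()

# A lag of ~7200 seconds matches the roughly 2-hour gap TAC found.
lag = trap_lag_seconds("2023-05-01 10:00:00", "2023-05-01 12:05:00")
hours_behind = lag / 3600
```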
We had to do the following:
AMP Setup -> General -> Monitoring Processes is now set to 6.
Each controller group now has a 10-minute polling period, whereas the AP group has a 5-minute one.
They also applied a script so that AirWave ignores 3 particular traps coming from the controllers; this was done so that I did not have to disable those 3 traps on all 28 controllers. I can post the script if needed.
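I can't share TAC's exact script here, but the idea is just to drop specific trap OIDs before they reach the processing pipeline. A purely illustrative sketch (the OIDs below are placeholders under Aruba's enterprise branch, not the actual 3 traps):

```python
# Placeholder OIDs -- the real trap OIDs to ignore would come from TAC.
# 1.3.6.1.4.1.14823 is Aruba's enterprise OID; the suffixes are made up.
IGNORED_TRAP_OIDS = {
    "1.3.6.1.4.1.14823.0.0.1",
    "1.3.6.1.4.1.14823.0.0.2",
    "1.3.6.1.4.1.14823.0.0.3",
}

def should_process(trap_oid: str) -> bool:
    """Return True if this trap should be handed to normal processing."""
    return trap_oid not in IGNORED_TRAP_OIDS
```

Filtering on the server side like this scales better than touching 28 controller configs, which is exactly why TAC went that route.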
Currently the server is handling roughly 20 million traps.
What we also noticed is the following:
Each controller (28 of them: 21 active, 7 backups) uses the same subnet range for the RAPs (e.g., 192.168.0.1 to .254).
Even though these RAPs will never bounce between active controllers, AirWave gets confused when processing traps for a particular RAP:
RAPA for Location A will have IP 192.168.1.5 on Controller 1.
RAPB for Location B will have IP 192.168.1.5 on Controller 2.
In the Device Events for RAPA, you will sometimes see events for RAPB because they share the same IP. TAC advised me that this is working as designed, because traps for APs are matched by IP address rather than MAC address.
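A toy illustration of why IP-only matching collides here (the event strings and keying are hypothetical, just to show the effect):

```python
# Two different RAPs on two controllers, same IP.
events = [
    ("Controller 1", "192.168.1.5", "RAPA rebooted"),
    ("Controller 2", "192.168.1.5", "RAPB association change"),
]

# Keying events by IP alone (how AP trap matching behaves per TAC)
# merges the two RAPs' histories into one bucket.
by_ip = {}
for controller, ip, event in events:
    by_ip.setdefault(ip, []).append(event)

# Keying by (controller, ip) would keep them separate.
by_controller_ip = {}
for controller, ip, event in events:
    by_controller_ip.setdefault((controller, ip), []).append(event)
```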
Interesting. So we may be looking at changing the RAP subnets on all of the controllers so that they do not overlap.
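If we do re-address, one way to plan it is to carve a distinct /24 per controller out of a single larger block so no two controllers can ever reuse a RAP address. A sketch with Python's ipaddress module (the 192.168.0.0/19 base block is an assumption; a /19 yields 32 /24s, enough for 28 controllers):

```python
import ipaddress

def rap_subnets(base: str, controllers: int):
    """Assign each controller its own /24 out of a larger base block."""
    block = ipaddress.ip_network(base)
    subnets = list(block.subnets(new_prefix=24))
    if len(subnets) < controllers:
        raise ValueError("base block too small for that many controllers")
    return {f"Controller {i + 1}": subnets[i] for i in range(controllers)}

plan = rap_subnets("192.168.0.0/19", 28)
# Controller 1 gets 192.168.0.0/24, Controller 2 gets 192.168.1.0/24, and so on.
```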