sFlow is a very lightweight protocol based on sampling. In short, it picks 1 packet out of 100 (the default, which is configurable), removes the content while keeping the layer 1 to 7 header information, adds RMON interface statistics, and sends the short resulting packet, much like an SNMP trap, to an aggregator (in your case the IMC/NTA server). So it uses very little bandwidth and very few resources on your switch or router, since the sampling is done in the ASIC.
However, the IMC server must handle these samples, aggregate them and load them into the NTA DB in near real time. THIS takes some resources.
If I understand your environment correctly, you want to aggregate the samples from 500 routers. I would not go higher than about 20 samples/s in total at the server, which works out to one sample every 25 s per router (500 / 20), so you should configure your routers to send 1 sample every 30 s to leave some margin. This will give you a rough picture of bandwidth usage and users.
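To make the arithmetic easy to redo for a different number of routers, here is a small sketch in Python. The function names are made up for illustration, and the 20 samples/s budget is only my rule of thumb from above, not a documented IMC limit.

```python
def min_interval_per_router(num_routers: int, max_aggregate_rate: float) -> float:
    """Shortest per-router interval (in seconds) that keeps the
    collector at or below max_aggregate_rate samples per second."""
    return num_routers / max_aggregate_rate


def aggregate_rate(num_routers: int, interval_s: float) -> float:
    """Total samples per second arriving at the collector when every
    router sends one sample each interval_s seconds."""
    return num_routers / interval_s


routers = 500        # number of routers sending to the NTA server
budget = 20.0        # assumed ceiling in samples/s (rule of thumb only)

exact = min_interval_per_router(routers, budget)
print(f"shortest interval within budget: {exact:.0f} s per router")

# Rounding up to 30 s leaves some headroom under the budget.
load = aggregate_rate(routers, 30.0)
print(f"load at 30 s per router: {load:.1f} samples/s")
```

With 500 routers the first figure comes out at 25 s, and the 30 s setting lands at roughly 16.7 samples/s, so it stays under the budget.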
The best option is to go for an IMC distributed environment with separate server(s) just for NTA. You could then shorten the sampling interval to 1 sample/s.
I hope this helps.