Hi Herman,
It helps to think about things in terms of two distinct categories:
1. Policies: These are effectively the business rules we are required to conform to. They are what directly govern compliance.
2. Technology: Transitory stuff beneath the hood that can make the user experience better (and PMK and OKC certainly do), but which isn't as important as policy compliance.
The PMK cache affects both, when it should only affect point 2. In our case we have multiple WLANs, each of which is governed by different policies. Because we can't specify a per-WLAN timeout, the timeframe in which a client is required to re-authenticate is (somewhat) undermined, which in turn undermines parts of some policies.
As an example, we have two guest-like networks where we have set the "re-auth interval" to one hour. The re-authentication doesn't actually happen after an hour, though, because the client is validated via PMK matching for a static figure of eight hours. We have been able to observe this using a Samsung Galaxy S3 as the test client while watching the RADIUS server for authentication attempts (which of course never appear).
So, what this does is limit our options to either:
1. Opting out of the better user experience that PMK provides.
2. Prioritising user experience over policy (via enabling "authentication survival", or PMK).
3. Reducing the PMK cache interval to the lowest common denominator.
Point 2 is interesting because this is where the PMK caching interval matters the most. Policy owners are reluctantly agreeable to small variations here and there, but eight hours compared to the requirement of one hour is not a small variation.
For now, we've opted for option 3 as something of a compromise. But as you can no doubt appreciate, it results in a modest number of extra full 802.1X RADIUS authentication conversations that could otherwise have been avoided.
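To put rough numbers on option 3, here is a small Python sketch. The WLAN names and the staff interval are made up for illustration; only the one-hour guest interval and the eight-hour cache default come from our setup.

```python
# Hypothetical WLAN names. The one-hour guest value and the eight-hour
# default are from the discussion above; the staff value is an assumption.
REAUTH_INTERVAL_HOURS = {
    "guest-a": 1,
    "guest-b": 1,
    "staff": 8,
}
DEFAULT_PMK_CACHE_HOURS = 8


def global_cache_hours(intervals):
    """Option 3: the shortest re-auth interval becomes the global cache time."""
    return min(intervals.values())


def policy_overshoot(intervals, cache_hours):
    """Hours by which a cached PMK can outlive each WLAN's re-auth interval."""
    return {name: max(0, cache_hours - hours) for name, hours in intervals.items()}


print(global_cache_hours(REAUTH_INTERVAL_HOURS))
print(policy_overshoot(REAUTH_INTERVAL_HOURS, DEFAULT_PMK_CACHE_HOURS))
```

With a single global value, every WLAN whose policy would tolerate a longer interval pays for the strictest one in extra full authentications.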
As a side note, I'm not sure if it's a bug or I'm doing something wrong, but the PMK cache doesn't seem to be honouring the value specified via "auth-survivability cache-time-out" anyway. Here's our running config value taken from "show running-config | include cache":
auth-survivability cache-time-out 2
And here's our PMK cache contents for this controller:
PMK Cache Table
---------------
Client MAC         Key              OKC/11r  Expiry      Name                 Role           VLAN  ESSID
----------         ---              -------  ------      ----                 ----           ----  -----
00:38:df:b5:02:c0  C6CAEE720C74...  11r      7h:38m:57s                       Juniper_Voice  201   Juniper_Voice
4c:34:88:a1:53:6f  C253B18912AB...  okc      7h:19m:9s   host/TP09.uchwa.com  Juniper_data   101   Juniper_Staff
4c:34:88:a1:55:86  0DD760D11907...  okc      7h:8m:46s   host/tb48.uchwa.com  Juniper_data   101   Juniper_Staff
4c:34:88:a0:b5:27  98C6606FE301...  okc      6h:45m:15s  host/TP32.uchwa.com  Juniper_data   101   Juniper_Staff
c4:b9:cd:b4:4e:92  DE386929A18B...  11r      7h:38m:58s                       Juniper_Voice  201   Juniper_Voice
4c:34:88:a0:b5:2c  B5EF5D6701C5...  okc      7h:30m:40s  host/tb76.uchwa.com  Juniper_data   101   Juniper_Staff
4c:34:88:a0:e4:43  0763AE42796E...  okc      6h:46m:35s  host/tb58.uchwa.com  Juniper_data   101   Juniper_Staff
PMK Cache Count:7
You can see the duration for the entries is clearly still based on the original eight-hour default.
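For what it's worth, this is roughly how the check can be scripted rather than eyeballed. It is only a Python sketch: it assumes the expiry strings look like "7h:38m:57s" as in the table above, and it assumes the configured value is in hours, which I haven't confirmed. The function names are my own.

```python
import re

# Assumption: expiry values are formatted like "7h:38m:57s".
EXPIRY_RE = re.compile(r"(\d+)h:(\d+)m:(\d+)s")


def expiry_seconds(line):
    """Return the remaining lifetime in seconds for one table row, or None."""
    match = EXPIRY_RE.search(line)
    if match is None:
        return None  # header, separator, or count line
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def rows_over_limit(table_text, limit_hours):
    """Return (client MAC, remaining seconds) for rows that outlive the limit."""
    over = []
    for line in table_text.splitlines():
        remaining = expiry_seconds(line)
        if remaining is not None and remaining > limit_hours * 3600:
            over.append((line.split()[0], remaining))
    return over


# Two rows copied from the table above.
sample = """\
00:38:df:b5:02:c0 C6CAEE720C74... 11r 7h:38m:57s Juniper_Voice 201 Juniper_Voice
4c:34:88:a0:b5:27 98C6606FE301... okc 6h:45m:15s host/TP32.uchwa.com Juniper_data 101 Juniper_Staff
"""

# Assumption: the configured value of 2 means two hours.
for mac, remaining in rows_over_limit(sample, limit_hours=2):
    print(mac, remaining)
```

Any row this reports has more time left than the configured value would allow, which is the case for every entry in our table.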
PMK (and OKC) do make a tangible difference. We have enough mobility in our guests (most of whom are vendors that constantly roam between Finance, ICT and training facilities covered by the same Instant controller) that we'd love to benefit from it, but we can't because of the static nature of the entries in the PMK cache. If the duration were localised to the ssid-profile definition (even if the PMK cache itself is managed by a global process), we could deliver the best experience possible to all the different WLAN consumers while still placating the policy owners.
I don't suppose you happen to know whether you can remove a client from the PMK cache on the Instant controllers, do you? I've only seen an aaa command that changes what happens to the PMK cache entry when a user is removed, but it doesn't seem to apply to the Instant OS, or at least not under the "aaa" context from what I can see.
I'm keen to know whether you can either (or both) remove an explicit entry from the PMK cache or change the default behaviour (as per the above aaa command) for when a user is disconnected such that the corresponding PMK cache table entry is also removed.
Cheers,
Lain