Controllerless Networks

Instant Mode - the controllerless Wi-Fi solution that's easy to set up, is loaded with security and smarts, and won't break your budget

Instant OS and the PMK cache

  • 1.  Instant OS and the PMK cache

    Posted Jan 08, 2018 01:04 AM

    Hi folks,

     

    Is there a way within the Instant OS to specify something less than eight hours as the PMK cache entry lifetime? Our version information is as follows:

     

    ArubaOS (MODEL: W-AP215), Version 6.5.4.2 (Dell variety)

     

    I've changed the value to 2 via the "auth-survivability cache-time-out" command, but this isn't affecting new connections, as verified via "show ap pmkcache".

     

    Additionally, is there a way to expunge an entry from the PMK cache table? Disconnecting a user doesn't achieve this.

     

    Finally, it appears that the PMK caching period is a global variable. Is there a uservoice forum or something of that nature where suggestions can be made, as it would be quite useful to have this configurable per WLAN?

     

    I'm not a networking hardware guy, so excuse my ignorance if the technical questions have readily available answers, as I did try to resolve them through searching beforehand.

     

    Cheers,

    Lain



  • 2.  RE: Instant OS and the PMK cache

    EMPLOYEE
    Posted Jan 08, 2018 11:57 AM

    Lain,

    Can you explain what you are trying to achieve? I have not seen the request before to adjust PMK cache time parameters or purge the PMK cache. It is not something you typically would change in my experience.



  • 3.  RE: Instant OS and the PMK cache

    Posted Jan 08, 2018 06:10 PM

    Hi Herman,

     

    It helps to think about things in terms of two distinct categories:

     

    1. Policies: These are effectively the business rules we are required to conform to. They are what directly govern compliance.

    2. Technology: Transitory stuff beneath the hood that can make the user experience better. PMK caching and OKC certainly do, but that isn't as important as policy compliance.

     

    The PMK cache affects both when it should only affect point 2. In our case we have multiple WLANs, each of which is governed by different policies. What the inability to specify a per WLAN timeout (somewhat) undermines is the timeframe in which a client is required to re-authenticate, which in turn undermines parts of some policies.

     

    As an example, we have two guest-like networks where we have set the "re-auth interval" to one hour, except that the re-authentication doesn't actually happen after an hour because the client is validated via PMK matching for a static period of eight hours. We have been able to observe this using a Samsung Galaxy S3 as the test client while watching the RADIUS server for authentication attempts (which of course never appear).

     

    So, what this does is limit our options to either:

     

    1. Opting out of the better user experience that PMK caching provides.

    2. Prioritising user experience over policy (by enabling "authentication survivability", or PMK caching).

    3. Reducing the PMK cache interval to the lowest common denominator.

     

    Point 2 is interesting because this is where the PMK caching interval matters the most. Policy owners are reluctantly agreeable to small variations here and there, but eight hours compared to the requirement of one hour is not a small variation.

     

    For now, we've opted for option 3 as something of a compromise. But as you can no doubt appreciate, it results in a modest amount of extra full 802.1x RADIUS authentication conversations that otherwise could have been avoided.

     

    As a side note, I'm not sure if it's a bug or I'm doing something wrong, but the PMK cache doesn't seem to be honouring the value specified via "auth-survivability cache-time-out" anyway. Here's our running config value taken from "show running-config | include cache":

     

    auth-survivability cache-time-out 2

     

    And here's our PMK cache contents for this controller:

     

     

    PMK Cache Table
    ---------------
    Client MAC         Key              OKC/11r  Expiry      Name                 Role           VLAN  ESSID
    ----------         ---              -------  ------      ----                 ----           ----  -----
    00:38:df:b5:02:c0  C6CAEE720C74...  11r      7h:38m:57s                       Juniper_Voice  201   Juniper_Voice
    4c:34:88:a1:53:6f  C253B18912AB...  okc      7h:19m:9s   host/TP09.uchwa.com  Juniper_data   101   Juniper_Staff
    4c:34:88:a1:55:86  0DD760D11907...  okc      7h:8m:46s   host/tb48.uchwa.com  Juniper_data   101   Juniper_Staff
    4c:34:88:a0:b5:27  98C6606FE301...  okc      6h:45m:15s  host/TP32.uchwa.com  Juniper_data   101   Juniper_Staff
    c4:b9:cd:b4:4e:92  DE386929A18B...  11r      7h:38m:58s                       Juniper_Voice  201   Juniper_Voice
    4c:34:88:a0:b5:2c  B5EF5D6701C5...  okc      7h:30m:40s  host/tb76.uchwa.com  Juniper_data   101   Juniper_Staff
    4c:34:88:a0:e4:43  0763AE42796E...  okc      6h:46m:35s  host/tb58.uchwa.com  Juniper_data   101   Juniper_Staff
    PMK Cache Count:7

    You can see the duration for the entries is clearly still based on the original eight hour default. 
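The same check can be done without eyeballing the table. The following is a minimal Python sketch, assuming the "show ap pmkcache" output has been captured as text and that the cache-time-out value is expressed in hours (as the eight hour default suggests). The helper names `parse_expiry` and `entries_over_timeout` are hypothetical and not part of any Aruba tooling.

```python
import re

# Matches expiry values such as "7h:38m:57s" as printed in the Expiry column.
EXPIRY_RE = re.compile(r"\b(\d+)h:(\d+)m:(\d+)s\b")
# A data row starts with a client MAC address; headers and separators do not.
MAC_RE = re.compile(r"^\s*((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\b", re.IGNORECASE)


def parse_expiry(text):
    """Convert the first "Nh:Nm:Ns" value found in text into seconds."""
    match = EXPIRY_RE.search(text)
    if match is None:
        raise ValueError(f"no expiry value found in {text!r}")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def entries_over_timeout(table_text, timeout_hours):
    """Return (mac, remaining_seconds) for rows that outlive the timeout."""
    limit = timeout_hours * 3600
    offenders = []
    for line in table_text.splitlines():
        mac = MAC_RE.match(line)
        if mac is None:
            continue  # skip the title, header, separator and count lines
        remaining = parse_expiry(line)
        if remaining > limit:
            offenders.append((mac.group(1), remaining))
    return offenders


# Two rows copied from the table above (assumed capture format).
sample = """\
Client MAC         Key              OKC/11r  Expiry      Name                 Role           VLAN  ESSID
00:38:df:b5:02:c0  C6CAEE720C74...  11r      7h:38m:57s                       Juniper_Voice  201   Juniper_Voice
4c:34:88:a1:53:6f  C253B18912AB...  okc      7h:19m:9s   host/TP09.uchwa.com  Juniper_data   101   Juniper_Staff
"""

for mac, remaining in entries_over_timeout(sample, timeout_hours=2):
    print(f"{mac}: {remaining} s remaining, more than the configured 2 h")
```

Any row reported here has more lifetime left than the configured value allows, which is what suggests the setting is not being applied to new entries.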

     

    PMK (and OKC) do make a tangible difference, and we have enough mobility in our guests (most of whom are vendors that constantly roam around between Finance, ICT and training facilities covered by the same instant controller) that we'd love to be able to benefit from it, but can't because of the static nature of the entries in the PMK cache. If the duration was localised to the ssid-profile definition (even if the PMK cache itself is managed by a global process) then we could deliver the best experience possible to all the different WLAN consumers while still placating the policy owners.

     

     

    I don't suppose you happen to know the answer to the question of whether you can remove a client from the PMK cache on the instant controllers, do you? I've only seen an aaa command that relates to changing the behaviour of what happens to the PMK cache entry when a user is removed but this doesn't seem to apply to the instant OS - or at least not under the "aaa" context from what I can see.

     

    I'm keen to know whether you can either (or both) remove an explicit entry from the PMK cache or change the default behaviour (as per the above aaa command) for when a user is disconnected such that the corresponding PMK cache table entry is also removed.

     

    Cheers,

    Lain



  • 4.  RE: Instant OS and the PMK cache

    Posted Jan 08, 2018 06:17 PM

    Hi Herman,

     

    I missed answering part of your question that relates to the context of "why" in relation to removing PMK entries.

     

    For removing individual entries:

    1. For testing clients and new RADIUS policies without having to remove authentication survivability from the WLAN.

    2. For emergency responses where it's critical to eject a suspect client from the network.

     

    For changing the default behaviour on user disconnection:

    1. For policy compliance (so the client can't undermine policy compliance through a PMK/OKC match and therefore never hitting RADIUS).

     

    Cheers,

    Lain

     



  • 5.  RE: Instant OS and the PMK cache

    EMPLOYEE
    Posted Jan 09, 2018 06:59 AM

    Which RADIUS server are you using? Typically you would disconnect a user that has already authenticated, or change their authorization status, via a RADIUS CoA (Change of Authorization) that is initiated by the RADIUS server.

     

    Authentication Survivability is only available with ClearPass, and it is typically used for clients where the radius server is not available, so it might not be applicable here.

     

    The PMK cache saves clients from doing a full RADIUS authentication when they roam back to APs that they have already been on, but it is not typically used to stop clients from authenticating successfully.



  • 6.  RE: Instant OS and the PMK cache

    Posted Jan 09, 2018 07:36 AM

    The RADIUS server is Microsoft NPS. However, it doesn't matter which server it is: if auth-survivability is enabled, the AP controller doesn't pass anything on to the RADIUS server for as long as a matching entry exists in the PMK cache table.

     

    Using the above-mentioned Samsung Galaxy S3, we were able to disable then enable the wireless on the phone, or wait for it to go to sleep and then wake it up and nothing triggered a full 802.1x authentication request until the entry expired from the PMK cache (which we had to wait overnight for as all of the above activities refreshed the PMK timer back to the default eight hours).
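The effect of the timer being refreshed on every reconnect can be illustrated with a toy model. This is only a sketch of the behaviour we observed, not a description of how Instant implements its cache; the class `ToyPmkCache` and the MAC address are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPmkCache:
    """Toy key cache whose entry lifetime restarts on every connection."""

    lifetime_s: int
    _expiry: dict = field(default_factory=dict)  # MAC -> absolute expiry (s)

    def connect(self, mac, now_s):
        """Return "cache-hit" or "full-auth", then restart the entry's timer."""
        expires_at = self._expiry.get(mac)
        hit = expires_at is not None and now_s < expires_at
        self._expiry[mac] = now_s + self.lifetime_s
        return "cache-hit" if hit else "full-auth"


cache = ToyPmkCache(lifetime_s=8 * 3600)
mac = "00:00:00:00:00:01"  # hypothetical client

# The client reconnects every five hours, always inside the eight hour window.
events = [cache.connect(mac, hour * 3600) for hour in (0, 5, 10, 15, 20)]
print(events)
```

In this model only the first connection reaches RADIUS. Every later one is a cache hit that also pushes the expiry out again, so a client that keeps coming back within eight hours never triggers a full authentication.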

     

    It's only when auth-survivability is "no'd" out that the controller once again submits the authentication request to the RADIUS server, but of course this in turn precludes us from realising the performance benefit of PMK/OKC.

     

    If it were possible to reduce the PMK caching period on a per WLAN basis then it wouldn't be necessary to disable authentication survivability at all.

     

    Cheers,

    Lain



  • 7.  RE: Instant OS and the PMK cache

    EMPLOYEE
    Posted Jan 09, 2018 08:14 AM

    I think you are looking for the "inactivity timeout" parameter in the SSID configuration (by default 1000 seconds).  Without doing anything, that timer needs to expire and the client would have to be removed from the client table before a client would do a full reauthentication.

     

    You mentioned that you put a Samsung device to sleep.  You should shut it off and make sure it has disappeared from the client table before booting it up again.

     

     



  • 8.  RE: Instant OS and the PMK cache

    Posted Jan 10, 2018 01:02 AM

    The inactivity-timeout value wasn't it. I'd set that to 3,600 (one hour) on Monday but no events were hitting the RADIUS server according to the schedule.

     

     

    Curiously, I restarted the virtual controller last night (Tuesday night) during a quiet period, and today I'm seeing the inactivity timeout being honoured where on Monday it wasn't.

     

    One thing that "feels" off today is that PMK/OKC no longer seems to be working the way it was on Monday. My problem is I need to do some reading on what debugging commands there are, as I need proof, not gut feelings based on what makes it to the RADIUS logs.

     

    Cheers,

    Lain



  • 9.  RE: Instant OS and the PMK cache

    Posted Jan 10, 2018 01:26 AM

    "show auth-survivability debug-log" seemed to offer some insight, but the log cleared in between commands.

     

    Is there a way to get the log to persist for a while? How does it decide (or where is it configured) when to clear like that?

     

    In any case, I can stare at what I still have on the screen while I wait for more meaningful events to re-populate. With that in mind though, what does the following sample line mean? What is "asap"? A process on the controller or something like that?

     

    Wed Jan 10 14:15:47  handle_asap_mesasge:4214  radiusd(pid 6291) received from asap with MAC xx:xx:xx:xx:xx:xx
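Since the log clears between commands, one workaround is to copy the output as soon as it appears and keep it locally in a structured form. The sketch below is a minimal Python example that splits a line of this shape into fields. The field names are my own guesses at what each part means (in particular, the number after the function name is only assumed to be a position marker), and `parse_debug_line` is a hypothetical helper.

```python
import re

# Layout assumed from the single sample line above:
#   <weekday month day time>  <function>:<number>  <process>(pid <pid>) <message>
LOG_RE = re.compile(
    r"^(?P<timestamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<function>\w+):(?P<number>\d+)\s+"
    r"(?P<process>\w+)\(pid (?P<pid>\d+)\)\s+"
    r"(?P<message>.*)$"
)


def parse_debug_line(line):
    """Split one debug-log line into a dict, or return None if it differs."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["number"] = int(record["number"])
    record["pid"] = int(record["pid"])
    return record


sample = (
    "Wed Jan 10 14:15:47  handle_asap_mesasge:4214  "
    "radiusd(pid 6291) received from asap with MAC xx:xx:xx:xx:xx:xx"
)
record = parse_debug_line(sample)
print(record)
```

Appending each parsed record to a local file would at least preserve the events for later comparison against the RADIUS server logs.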

    Cheers,

    Lain



  • 10.  RE: Instant OS and the PMK cache

    EMPLOYEE
    Posted Jan 10, 2018 05:52 AM

    If your requirement is that clients periodically reauthenticate, it is probably better to enable re-authentication on your SSID:

    [Screenshot: 2018-01-10 11_00_26-Instant.png]

    Key caching has to do with better roaming and should operate within the session timers that are set with the Reauthentication interval. It is even more flexible if, during authentication, you return the IETF Session-Timeout (27) attribute, which carries the reauthentication period in seconds.

     

    When the timer expires, the client will go through re-authentication. I have not seen tweaking cache timers before to satisfy re-authentication requirements as the Session Timeout is the standard way for that.
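For reference, Session-Timeout is a plain RADIUS attribute as defined in RFC 2865: a one-byte type (27), a one-byte length (6) and a four-byte unsigned integer holding the number of seconds. The Python sketch below shows that wire encoding for a one hour value. It is an illustration of the attribute format only; the function names are hypothetical, and in practice the RADIUS server's policy configuration builds the attribute for you.

```python
import struct

SESSION_TIMEOUT_TYPE = 27    # IETF Session-Timeout (RFC 2865)
SESSION_TIMEOUT_LENGTH = 6   # type (1) + length (1) + value (4)


def encode_session_timeout(seconds):
    """Encode a Session-Timeout attribute as type, length, 32-bit value."""
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("Session-Timeout must fit in 32 bits")
    return struct.pack("!BBI", SESSION_TIMEOUT_TYPE, SESSION_TIMEOUT_LENGTH, seconds)


def decode_session_timeout(attribute):
    """Return the timeout in seconds from an encoded Session-Timeout attribute."""
    attr_type, length, seconds = struct.unpack("!BBI", attribute)
    if attr_type != SESSION_TIMEOUT_TYPE or length != SESSION_TIMEOUT_LENGTH:
        raise ValueError("not a Session-Timeout attribute")
    return seconds


one_hour = encode_session_timeout(3600)
print(one_hour.hex())                     # 1b0600000e10
print(decode_session_timeout(one_hour))   # 3600
```

Returning a different value per policy on the RADIUS server is what makes this approach more flexible than a single interval configured on the SSID.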



  • 11.  RE: Instant OS and the PMK cache

    EMPLOYEE
    Posted Jan 10, 2018 06:50 AM

    Auth survivability requires ClearPass because it sends back an Aruba VSA to make it work.  Auth survivability is not the droid you are looking for:

    http://www.arubanetworks.com/techdocs/Instant_40_Mobile/Advanced/Content/UG_files/Authentication/Authentication%20Survivability.htm

    [Screenshot: Screenshot 2018-01-10 at 05.48.31.png]

    As Herman Robers says above, Reauthentication Interval is what forces reauthentication, but it functions differently depending on whether you are using 802.1x or Captive Portal.