My question, to put it innocently: how is DNS hijacking (for the captive portal certificate CN) on an Instant AP supposed to work?
To make a long story... well... long - the scenario is this:
- IAP cluster with IAP-VPN configured
- Some vlans are locally bridged
- Some vlans are fully tunneled (declared in centralized L2 DHCP scopes)
- Custom captive portal certificate, replacing the default securelogin.arubanetworks.com one
- One of the tunneled vlans is used in a guest network
- Typical guest scenario: ClearPass self-registration page, redirection to the "securelogin" hostname, role change
- Enterprise domains set to "*"
- Recent(ish) version: 8.7.1.7
A few days ago we noticed that the redirection after authentication was not working properly, and I managed to pinpoint the problem: the IAP was not hijacking the DNS request for the "securelogin" hostname, so the client received the real DNS server's response, which is simply "No such name" (NXDOMAIN).
Disconnecting and reconnecting works at that point because of MAC caching: the client goes directly to the authenticated role. But that's kind of cheating, right?
I noticed that the same issue happens on a few other networks (which don't really need it to work, but I tested them just for argument's sake), but not on all of them. I couldn't find any pattern that would explain why DNS hijacking works on some and not on others.
Today, after I opened a TAC case, the bloody DNS hijack started working again on the guest network, with no configuration changes that I can relate in any way to DNS behavior! The TAC engineer was as baffled as I was.
But hijacking still does not work under all conditions. I configured a WLAN with simple WPA2-Personal encryption to put a client in the same vlan as an unauthenticated guest, and DNS hijacking does not work there, so the behavior is not tied to the vlan.
So, to summarize, DNS hijacking does *not* depend on:
- Client DNS request format: the same client, making the same request, behaves differently in different WLANs.
- vlan: the same client in different WLANs, but in the same vlan, behaves differently.
- Topology: the example above also hints at this, because same vlan => same topology. Same for bridged networks: in one WLAN it works, in the next it doesn't.
- DNS server address: same logic as above, because same vlan => same DHCP pool => same DNS configuration, yet the results differ depending on which WLAN the client is connected to.
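For anyone who wants to repeat this per-WLAN comparison from the client side without a packet capture, here is a minimal Python sketch (standard library only). It sends a single A query for the portal hostname and inspects the reply header. It assumes, as in this case, that the custom certificate CN does not resolve in the real DNS, so a NOERROR reply carrying answer records can only have come from the IAP. PORTAL_HOSTNAME and DNS_SERVER are hypothetical placeholders, not values from this setup.

```python
import random
import socket
import struct

# Hypothetical placeholders: the custom captive-portal certificate CN and the
# DNS server handed out by the vlan's DHCP scope.
PORTAL_HOSTNAME = "securelogin.example.com"
DNS_SERVER = "192.0.2.53"

RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(hostname: str, query_id: int) -> bytes:
    """Build a minimal DNS query for an A record (class IN, recursion desired)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = hostname.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return header + qname + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def parse_header(response: bytes) -> tuple:
    """Return (query ID, rcode name, answer count) from the 12-byte DNS header."""
    query_id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0x000F
    return query_id, RCODE_NAMES.get(rcode, str(rcode)), ancount


def classify(response: bytes, query_id: int) -> str:
    """Heuristic: assumes the portal hostname does NOT resolve upstream,
    so any NOERROR reply with answers must have been injected by the IAP."""
    rid, rcode, ancount = parse_header(response)
    if rid != query_id:
        return "unexpected reply (query ID mismatch)"
    if rcode == "NOERROR" and ancount > 0:
        return "hijacked (portal hostname resolved)"
    return f"not hijacked ({rcode})"


def check_hijack(hostname: str = PORTAL_HOSTNAME, server: str = DNS_SERVER,
                 timeout: float = 3.0) -> str:
    """Send one UDP query through the AP and classify the reply."""
    query_id = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname, query_id), (server, 53))
        response, _addr = sock.recvfrom(512)
    return classify(response, query_id)
```

Running `check_hijack()` from a client that is still in the unauthenticated role, once per WLAN, gives a quick yes/no per WLAN to set against the list above. The heuristic breaks down if the CN also exists in public or internal DNS; in that case, comparing the returned address with the one the IAP is expected to answer with would be the better test.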
Any ideas? Has anyone in the community gone through the same issues?
BR,
Mike
------------------------------
Miguel Goncalves
------------------------------