We have multiple sites with a similar issue. They all have physical controllers on site that talk to a virtual Conductor at our data center. Every 4-6 weeks, I am notified that users cannot connect to the corporate Wi-Fi, which uses credentials for login. Every time this occurs, the Conductor shows the controllers as "down". I then go to the on-site firewall and clear the tunnel session on port 4500 between the controller and the Conductor, and almost immediately the Conductor shows the site controllers online and users can connect again. This corrects the issue for another 4+ weeks.
It is almost as if the tunnel session goes stale and/or needs to be refreshed more frequently. Do you know of any settings I can adjust to try to resolve this? We run Palo Altos at all sites, and I have not found any configuration difference between the three sites that keep having this issue, other than that they are geographically farther from the Conductor.
Below are our CLI IPsec settings:
Crypto Map Template"default-local-conductor-ipsecmap" 9999
IKE Version: 2
IKEv2 Policy: 10014
Security association lifetime seconds : [300 -86400]
Security association lifetime kilobytes: N/A
PFS (Y/N): N
Transform sets={ default-ml-transform }
Peer gateway: 10.x.x.x
Monitor IP: 0.0.0.0
Interface: VLAN 110
Source network: 10.x.x.x/255.255.255.255
Destination network: 10.x.x.x/255.255.255.255
Pre-Connect (Y/N): Y
Client NAT mode (Y/N): N
Tunnel Trusted (Y/N): Y
Forced NAT-T (Y/N): Y
Uplink Failover (Y/N): N
Force-Tunnel-Mode (Y/N): N
Uplink LoadBalance (Y/N): N
IP Compression (Y/N): Y
DPD counters req_initd:0 req_resent:0 reply_recvd:0 peer_dead:0
DPD counters req_recvd:0 reply_sent:0
XCHG counters peer dead:0
CFG_SET Initiate Sent/Retry-NoACK/Retry-NoVLAN/Ack-Recvd= 0/0/0/0
CFG_SET Responder Recvd/Ack-sent= 0/0
Tunnel status IPSEC: UP IKE: UP
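
One thing I notice in the output above is that every DPD counter reads 0 even though the tunnel shows UP. I am not sure whether that is relevant, but to watch those values over time I put together a minimal Python sketch that parses output in the format pasted above. The function names (parse_ipsec_map, dpd_is_idle) are my own and hypothetical, and how the text gets captured from the controller is left out, so treat this as a rough helper rather than anything official.

```python
import re


def parse_ipsec_map(text):
    """Pull DPD counters and tunnel status out of crypto-map output.

    Assumes the line formats shown in the pasted output above.
    """
    counters = {}
    status = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("DPD counters"):
            # Each counter appears as name:value, e.g. req_initd:0
            for name, value in re.findall(r"(\w+):(\d+)", line):
                counters[name] = int(value)
        elif line.startswith("Tunnel status"):
            # e.g. "Tunnel status IPSEC: UP IKE: UP"
            for name, value in re.findall(r"(IPSEC|IKE):\s*(\w+)", line):
                status[name] = value
    return counters, status


def dpd_is_idle(counters):
    """True when counters were found and every one of them is zero."""
    return bool(counters) and all(v == 0 for v in counters.values())


if __name__ == "__main__":
    sample = (
        "DPD counters req_initd:0 req_resent:0 reply_recvd:0 peer_dead:0\n"
        "DPD counters req_recvd:0 reply_sent:0\n"
        "Tunnel status IPSEC: UP IKE: UP\n"
    )
    counters, status = parse_ipsec_map(sample)
    print(counters)
    print(status)
    print("DPD idle:", dpd_is_idle(counters))
```

Running this against output saved at intervals should show whether the DPD counters ever move before the controllers drop.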