Security

Enterprise security using ClearPass Policy Management, ClearPass Security Exchange, IntroSpect, VIA, 360 Security Exchange, Extensions and Policy Enforcement Firewall (PEF).
clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]

  • 1.  clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]

    Posted May 30, 2016 02:19 PM

    Hi Everybody,

    Has anyone experienced a ClearPass Publisher server crashing after a few hours of operation, with the following messages displayed on the console:

    [Image: clearpass-bug-soft-lockup.jpg]

    The server was reinstalled to expand disk capacity (it's a clean deployment of CP-VA-25k, version 6.5.6, followed by a restore, under ESXi 6.0 Update 2). The server stops responding in such a way that the only way to recover is to power off the ClearPass VM in vCenter and power it on again!

    This weekend we had 4 lockups. After the 2nd one, I updated VMware Tools to the most recent version, but the server continued to crash.

    Any ideas?

    Thanks,

    Heraldo.
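    The console message in the title follows the Linux kernel's soft-lockup watchdog format ("BUG: soft lockup - CPU#N stuck for Ns! [process:pid]"). When several lockups pile up, it can help to pull out which CPU, how long, and which process was running. A minimal sketch, assuming the lines come from the console or a saved kernel log; `parse_soft_lockup` and `SoftLockup` are hypothetical helper names, not part of ClearPass:

    ```python
    import re
    from typing import NamedTuple, Optional

    # Assumed format, taken from this thread's title:
    #   BUG: soft lockup - CPU#2 stuck for 24s! [policy_server:16529]
    # search() tolerates timestamps or other prefixes before "BUG:".
    SOFT_LOCKUP_RE = re.compile(
        r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
        r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]",  # comm may itself contain ':' (e.g. kworker/0:1)
        re.IGNORECASE,
    )

    class SoftLockup(NamedTuple):
        cpu: int        # logical CPU the watchdog flagged
        seconds: int    # how long that CPU appeared stuck
        process: str    # command name of the task running on it
        pid: int        # process ID of that task

    def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
        """Return the parsed fields, or None if the line is not a soft-lockup message."""
        m = SOFT_LOCKUP_RE.search(line)
        if m is None:
            return None
        return SoftLockup(int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))

    print(parse_soft_lockup("BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]"))
    ```

    Counting results per `process` or per `cpu` over a weekend of logs shows whether the lockups always hit the same process or are spread across CPUs.
    
    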



  • 2.  RE: clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]

    Posted May 30, 2016 02:20 PM
    It's best to open a TAC case for this.


  • 3.  RE: clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]

    Posted May 30, 2016 02:24 PM

    Hi Cappalli,

    Thanks for the response!



  • 4.  RE: clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]

    Posted May 30, 2016 05:03 PM
    What resources have been made available to the ClearPass VM? I had a similar issue in my lab when my ESXi server was heavily oversubscribed.


  • 5.  RE: clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]

    Posted May 30, 2016 10:01 PM

    Hi Jrwhitehead,

    Thanks for the response.

    The ClearPass VM was deployed with the recommended settings (64 GB RAM, 2.18 TB disk space, 12 virtual processors; our Dell server has two 6-core processors).

    This weekend we had 4 crashes. After the 2nd crash I updated VMware Tools to the most recent version, but we still had 2 more crashes yesterday. This morning, before restarting the server, I also upgraded the VM compatibility to ESXi 6.0 or later (VM version 11); after the OVF deployment, compatibility was ESXi 5.0 or later (VM version 8). Maybe this update made a difference, because so far the server is up and running as expected, with no crashes or console messages since this morning's reboot. Fingers crossed that this was the fix!

    You said your lab server was oversubscribed... What exactly was oversubscribed?

    Thanks,



  • 6.  RE: clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]

    Posted May 31, 2016 03:10 AM

    You most likely have issues with your underlying hardware. These soft lockups indicate that the hardware cannot keep up with the load. It may be that you are running other VMs on the same hardware.

    Check this: http://ubuntuforums.org/showthread.php?t=2205211 (solution was replacing the power supply of the computer)

    Or this: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009996 (I don't agree with the workaround there: if you are seeing these soft lockups, ClearPass performance will be very poor)

    Or this: http://unix.stackexchange.com/questions/70377/bug-soft-lockup-cpu-stuck-for-x-seconds

    From the information I have now, I would thoroughly check the hardware and the VMware ESXi information. Maybe ESXi gives you warnings or errors. Could any of your hard disks be bad (resulting in huge disk I/O delays)? After you have validated that the hardware meets the ClearPass system requirements, can you replace hardware components such as a hard disk, the power supply, or the whole server?

    And yes, open a TAC case in parallel as well. However, the messages come from the ClearPass kernel (the component closest to the hardware) and indicate issues or delays with the hardware it is installed on.



  • 7.  RE: clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]

    Posted May 31, 2016 01:37 PM

    Hi Herman,

    Thanks for the response and information!

    Our ClearPass VM is running on a dedicated server, with no other VMs on the same host. We have been using this server to run the ClearPass VM for a long time. Last week we expanded the server's disk capacity, and to do this we redeployed the ClearPass OVF file, upgraded to 6.5.6, and restored the databases. All the disks on this server are now new. We had never seen this issue before the disk expansion; it started after the ClearPass reinstallation. We also upgraded ESXi from 5.5 to 6.0 Update 2, and I think the issue may be related to the VM hardware compatibility. Just after the redeployment, the ClearPass VM compatibility was ESXi 5.0 or later (VM version 8). Yesterday morning we upgraded the VM compatibility to ESXi 6.0 or later (VM version 11), and since the reboot after that change ClearPass has been up and running, with no crashes or soft lockup messages on the console.
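    The "compatibility" setting here is the VM's virtual hardware version, and each ESXi release supports up to a certain version (5.0 → 8, 5.1 → 9, 5.5 → 10, 6.0 → 11). A small sketch of the check being described, i.e. whether a VM is still on an older hardware version than its host could run; `hw_upgrade_available` is a hypothetical helper, and the table covers only the releases around this thread:

    ```python
    # Highest virtual hardware version supported by each ESXi release
    # (only the releases relevant to this thread).
    MAX_HW_VERSION = {"5.0": 8, "5.1": 9, "5.5": 10, "6.0": 11}

    def hw_upgrade_available(vm_hw_version: int, esxi_release: str) -> bool:
        """True if the VM's hardware version is older than the host's maximum."""
        return vm_hw_version < MAX_HW_VERSION[esxi_release]

    print(hw_upgrade_available(8, "6.0"))   # freshly deployed OVF (version 8) on ESXi 6.0
    print(hw_upgrade_available(11, "6.0"))  # after upgrading compatibility to version 11
    ```

    This only shows that an upgrade was possible; whether the old hardware version contributed to the lockups is the poster's hypothesis, not something the check establishes.
    
    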

    If the messages come from the ClearPass kernel, do you think upgrading the VM compatibility could have solved the issue?

    Thanks again!



  • 8.  RE: clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]

    Posted Jun 02, 2016 10:40 AM

    I'm not sure whether the upgrades solved the issue, but it is quite possible. ClearPass includes the VMware Tools, which handle communication between ESXi and the virtual machine. Both the virtual hardware and the VMware Tools interface change during an ESXi upgrade. I have run ClearPass on ESXi 5.5 for a long time and never seen this; I consider it more likely that the hardware you run ESXi on is better supported in ESXi 6.0, and that may well be why the issue was resolved.

    Good to hear that you were able to fix this issue.



  • 9.  RE: clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]

    Posted Jun 02, 2016 11:42 AM

    Hi Herman,

    Thanks for the response.

    I opened a case, and support recommended that we "redeploy the servers with correct memory specification, as recommended in the user guide", but the servers had already been redeployed with the correct memory specification.

    I think the issue may have been solved by the VMware Tools and VM compatibility upgrades, but other hardware-related factors may also bring it back.

    Our ClearPass servers have 2 processor sockets with 6 cores each, which gives us 12 physical cores. The servers also have Hyper-Threading enabled, which makes the 12 physical cores present 24 logical processors to ESXi, right?

    Our ClearPass VMs were configured with 12 vCPUs, mapping 1 vCPU to 1 physical core. Yesterday I changed the configuration to 24 vCPUs (1 vCPU to 1 logical processor). The server stayed up and running the whole day, but just after the nightly automated backup and cleanup routine it crashed again!!!

    I changed the configuration back to 12 vCPU and rebooted the server.

    So I guess the correct vCPU configuration is to map one vCPU to one physical core. Does that make sense?
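    The arithmetic behind the question above: sockets times cores per socket gives physical cores, and Hyper-Threading doubles that to logical processors (assuming 2 threads per core, as with Intel Hyper-Threading). Dividing the VM's vCPU count by the physical core count shows whether every vCPU can have a core to itself. A sketch with hypothetical helper names:

    ```python
    from typing import Tuple

    def host_cpu_counts(sockets: int, cores_per_socket: int, hyperthreading: bool) -> Tuple[int, int]:
        """Return (physical cores, logical processors) for a host."""
        physical = sockets * cores_per_socket
        # Assumption: 2 hardware threads per core when Hyper-Threading is enabled.
        logical = physical * 2 if hyperthreading else physical
        return physical, logical

    def vcpus_per_physical_core(vcpus: int, physical_cores: int) -> float:
        """Above 1.0, at least some vCPUs must share a physical core with another vCPU."""
        return vcpus / physical_cores

    physical, logical = host_cpu_counts(sockets=2, cores_per_socket=6, hyperthreading=True)
    for vcpus in (12, 24):
        print(f"{vcpus} vCPUs -> {vcpus_per_physical_core(vcpus, physical):.1f} per physical core")
    ```

    With 24 vCPUs the ratio is 2.0, so the VM relies on hyperthread siblings rather than dedicated cores; the ratio by itself does not prove that this causes the lockups.
    
    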

    Thank you again!



  • 10.  RE: clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]

    Posted May 31, 2016 05:05 AM

    My CPUs and RAM were oversubscribed... HP MicroServers are great for size and noise level, but sadly they only support 16 GB of RAM. Anyway, I'm interested to hear what TAC says.

    Is Airwave supported on VM version 11?



  • 11.  RE: clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]

    Posted May 31, 2016 05:38 AM

    I have not seen issues with Airwave on VMware ESXi 6.0 and virtual hardware version 11.

    In case you need a definitive answer, please ask Aruba TAC.



  • 12.  RE: clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]

    Posted Jun 08, 2016 11:26 AM

    Hi,

    Just coming back to update on this issue.

    After a few changes to the ClearPass VM configuration, the server has been up and running for the last 3 days without crashing!

    When we first redeployed the OVF file to create the ClearPass VM, the vCPU count was set to 24 by default. We had upgraded ESXi from 5.5 to 6.0. Just after deployment, VMware Tools was outdated and the VM hardware compatibility was ESXi 5.0 or later (VM version 8).

    But our ClearPass servers have 2 processor sockets with 6 cores each, which gives us 12 physical cores. The servers also have Hyper-Threading enabled, which makes the 12 physical cores appear as 24 logical processors to ESXi.

    With vCPU set to 24, the servers crash. They also crash with vCPU changed to 12 if VMware Tools and the hardware compatibility are left outdated.

    It seems the only way to keep the servers from crashing is to map one vCPU to one physical core and to upgrade VMware Tools and the hardware compatibility to the latest versions (ESXi 6.0 or later, VM version 11), if you are using ESXi 6.0.

    And, in my case, the vCPU setting has to be 12 virtual processors with 1 core per socket, that is, 12 virtual sockets, even though the physical host has 2 sockets with 6 cores each!
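    In vSphere the total vCPU count is split into virtual sockets times cores per socket, and the vCPU count is expected to be a multiple of the cores-per-socket value. A sketch of the layouts discussed here for 12 vCPUs (1 core per socket versus 6 cores per socket); `virtual_sockets` and `layouts` are hypothetical helpers, not vSphere APIs:

    ```python
    def virtual_sockets(total_vcpus: int, cores_per_socket: int) -> int:
        """Number of virtual sockets the guest sees for this vCPU layout."""
        if total_vcpus % cores_per_socket != 0:
            # The vCPU count must be a multiple of cores per socket.
            raise ValueError("total_vcpus must be a multiple of cores_per_socket")
        return total_vcpus // cores_per_socket

    def layouts(total_vcpus: int):
        """All (virtual sockets, cores per socket) pairs that give total_vcpus."""
        return [(total_vcpus // c, c) for c in range(1, total_vcpus + 1) if total_vcpus % c == 0]

    print(virtual_sockets(12, 1))  # 12 sockets x 1 core: the layout configured here
    print(virtual_sockets(12, 6))  # 2 sockets x 6 cores: mirrors the physical host
    print(layouts(12))
    ```

    Both layouts give the guest the same 12 vCPUs; only the socket topology presented to the guest differs.
    
    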

    [Image: vCPU-config-Clearpass.jpg]

    Setting 12 vCPUs with 6 cores per socket also resulted in the server crashing.

    Thanks for all replies!



  • 13.  RE: clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]

    Posted Jun 12, 2016 07:04 PM

    Hi,

    Coming back again to update on this issue.

    After 7 days up and running, the server crashed again!!! Unfortunately, the vCPU configuration I posted before seems to have had no effect on the issue. Very disappointed... We have an open case with TAC; they are analysing the system logs, but there is no solution so far.



  • 14.  RE: clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]
    Best Answer

    Posted Aug 10, 2016 02:03 PM

    Hi,

    After a long time dealing with this issue, we finally solved it.

    The problem was that ESXi 6.0 was not able to serve CPPM after a few days of operation because of the memory configuration/allocation.

    The ESXi host has 64 GB of physical memory, and the CPPM VM was configured with all 64 GB. After some time in operation, ESXi started to reclaim memory using the ballooning technique, and this was causing the "BUG: soft lockup" messages on the console and the crashes.

    We are in the process of expanding the ESXi host's physical memory, and in the meantime the solution was to configure the CPPM VM with 60 GB of memory, leaving some free memory for ESXi.
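    The fix comes down to simple arithmetic: with the VM sized to all 64 GB, nothing is left for ESXi itself or per-VM overhead, so the host reclaims guest memory under pressure. A sketch of the headroom check, where `RESERVE_GB = 4` is a hypothetical value matching the 64 GB minus 60 GB gap chosen here, not a VMware sizing rule (real overhead depends on the host and VM configuration):

    ```python
    RESERVE_GB = 4  # hypothetical headroom for the hypervisor; equals the 64 - 60 GB gap in this thread

    def memory_headroom_gb(host_gb: float, vm_gb: float) -> float:
        """Physical memory left on the host after the VM's configured memory."""
        return host_gb - vm_gb

    def fits_with_reserve(host_gb: float, vm_gb: float, reserve_gb: float = RESERVE_GB) -> bool:
        """True if the VM leaves at least reserve_gb of physical memory for the host."""
        return memory_headroom_gb(host_gb, vm_gb) >= reserve_gb

    for vm_gb in (64, 60):
        print(vm_gb, memory_headroom_gb(64, vm_gb), fits_with_reserve(64, vm_gb))
    ```

    The same check applies after the host memory expansion: raise `host_gb` and the VM can grow while keeping the chosen reserve.
    
    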



  • 15.  RE: clearpass server crashing - BUG: SOFT LOCKUP - CPU#2 STUCK FOR 24s! [policy_server:16529]

    Posted Aug 10, 2016 03:52 PM

    Thanks for letting us know. Sounds like a very nasty one to find.