Has anyone experienced a ClearPass Publisher server crashing after a few hours of operation, with the following messages displayed at the console:
The server has been reinstalled to expand disk capacity (it's a clean deployment of CP-VA-25k, version 6.5.6, followed by a restore, under ESXi 6.0 Update 2). The server stops responding in such a way that the only possible recovery is to power off the ClearPass VM in vCenter and power it on again!
This weekend we had 4 lockups. After the 2nd, I updated VMware Tools to the most recent version, but the server continued to crash.
Thanks for the response!
Thanks for the response.
The ClearPass VM was deployed with the recommended settings (64 GB RAM, 2.18 TB disk space, 12 virtual processors; our Dell server has two 6-core processors).
This weekend we had 4 crashes. After the 2nd crash, I updated VMware Tools to the most recent version, but we still had 2 more crashes yesterday. This morning, before restarting the server, I also updated the VM hardware compatibility to ESXi 6.0 or later (VM version 11). After the OVF deployment, compatibility was ESXi 5.0 or later (VM version 8). Maybe this update made some difference, because so far the server is up and running as expected, with no crashes or messages on the console since the reboot this morning. Fingers crossed that this is all it takes!
You said your lab server was oversubscribed... What exactly was oversubscribed?
You most likely have issues with your underlying hardware. These soft lockups indicate that the hardware cannot handle the load. It may also be that you are running other VMs on the same hardware.
Check this: http://ubuntuforums.org/showthread.php?t=2205211 (solution was replacing the power supply of the computer)
Or this: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009996 (I don't agree with the workaround; if you see these soft lockups, the performance of your ClearPass will be very poor)
Or this: http://unix.stackexchange.com/questions/70377/bug-soft-lockup-cpu-stuck-for-x-seconds
From the information that I have now, I would thoroughly check the hardware and the VMware ESXi information. Maybe ESXi gives you warnings/errors... could any of your hard disks be bad (resulting in huge disk I/O delays)? After you have validated that the hardware meets the ClearPass system requirements, can you replace hardware components? Like the hard disks, the power supply, or the whole server?
And yes, open a TAC case in parallel as well... however, the messages come from the ClearPass kernel (which is the component closest to the hardware) and indicate issues/delays with the hardware it is installed on.
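To see how often this is happening and which CPUs are affected, you can grep a saved copy of the console/kernel log for the lockup pattern. A minimal sketch; the log file name and the sample message below are made up for illustration:

```shell
# Hypothetical file holding captured console/kernel output
LOG=console.log

# Simulate one of the kernel messages seen during the crashes
printf 'BUG: soft lockup - CPU#3 stuck for 23s!\n' > "$LOG"

# Count soft-lockup events and list which CPUs reported them
grep -c 'soft lockup' "$LOG"
grep -o 'CPU#[0-9]*' "$LOG" | sort -u
```

If many different CPU numbers show up, that points at a host-wide problem (storage latency, power, or the hypervisor stealing time) rather than a single runaway process.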
Thanks for the response and information!
Our ClearPass VM is running on a dedicated server. No other VMs run on the same server. We have been using this server for a long time to run the ClearPass VM. Last week we expanded the disk capacity of this server; to do this, we redeployed the ClearPass OVF file, updated to 6.5.6, and restored the databases. The disks on this server are all new now. Before this disk expansion, we had never seen this issue; it started after the ClearPass reinstallation. We also upgraded ESXi from 5.5 to 6.0 Update 2, and I think the issue may have something to do with VM hardware compatibility. Just after redeployment, the ClearPass VM compatibility was ESXi 5.0 or later (VM version 8). Yesterday morning, we updated the VM compatibility to ESXi 6.0 or later (VM version 11), and since the last reboot after this update, ClearPass has been up and running with no crashes or soft lockup messages on the console.
If the messages were coming from the ClearPass kernel, do you think that upgrading the VM compatibility could have solved the issue?
Not sure if the upgrades solved the issue, but it may well be possible. ClearPass includes the VMware Tools that communicate between ESXi and the virtual machine. Both the virtual hardware and the VM Tools interface change during an ESXi upgrade. I have run ClearPass on ESXi 5.5 for a long time and never seen this; I consider it more likely that the hardware you run ESXi on is better supported in ESXi 6.0, and that may be a good reason that the issue was resolved.
Good to hear that you were able to fix this issue.
I opened a case, and support recommended to "redeploy the servers with correct memory specification, as recommended in the user guide", but the servers have already been redeployed with the correct memory specification.
I think the issue may be solved by the VMware Tools and VM compatibility upgrades, but other hardware-related things may also bring the issue back.
Our ClearPass servers have 2 processor sockets with 6 cores each. That gives us 12 physical cores. The servers also have Hyper-Threading enabled, which makes the 12 physical cores present 24 logical processors to ESXi, right?
Our ClearPass VMs were configured with 12 vCPUs, mapping 1 vCPU to 1 physical core. Yesterday, I changed the configuration to 24 vCPUs (1 vCPU to 1 logical processor). The server stayed up and running the whole day, and just after the nightly automated backup and cleanup routine it crashed again!!!
I changed the configuration back to 12 vCPU and rebooted the server.
So, I guess that the correct vCPU configuration is to map one vCPU to one physical core. Does that make sense?
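For the record, the topology arithmetic works out like this (a minimal sketch using the socket/core counts from our servers; the numbers would differ on other hardware):

```shell
SOCKETS=2
CORES_PER_SOCKET=6
THREADS_PER_CORE=2   # Hyper-Threading enabled

PHYSICAL_CORES=$((SOCKETS * CORES_PER_SOCKET))
LOGICAL_PROCESSORS=$((PHYSICAL_CORES * THREADS_PER_CORE))

echo "Physical cores:     $PHYSICAL_CORES"      # 12
echo "Logical processors: $LOGICAL_PROCESSORS"  # 24

# Mapping 1 vCPU to 1 physical core gives the vCPU count I reverted to:
echo "vCPUs (1 per physical core): $PHYSICAL_CORES"
```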
Thank you again!
My CPUs & RAM were oversubscribed... HP MicroServers are great for size and noise levels but sadly only support 16 GB of RAM. Anyway, I'm interested to hear what TAC says.
Is Airwave supported on VM version 11?
I have not seen issues with Airwave on VMware ESXi 6.0 and virtual hardware version 11.
In case you need a definitive answer, please ask Aruba TAC.
Just coming back to update on this issue.
After a few changes in the ClearPass VM config, the server has been up and running for the last 3 days without crashing!
When we first redeployed the OVF file to create the ClearPass VM, the vCPU configuration was set to 24 by default. We upgraded ESXi from 5.5 to 6.0. Just after deployment, VMware Tools was outdated and the VM hardware compatibility was ESXi 5.0 or later (VM version 8).
But our ClearPass servers have 2 processor sockets with 6 cores each, which gives us 12 physical cores. The servers also have Hyper-Threading enabled, which makes the 12 physical cores look like 24 logical processors to ESXi.
With vCPU set to 24, the server crashes. It also crashes with vCPU changed to 12 but VMware Tools and hardware compatibility left outdated.
It seems that the only way the server does not crash is to map one vCPU to one physical core, and to upgrade VMware Tools and the hardware compatibility to the latest versions (ESXi 6.0 or later, VM version 11), if you are using ESXi 6.0.
And, in my case, the vCPU setting has to be 12 virtual processors with 1 core per socket, that is, 12 virtual sockets, even though the physical host has 2 sockets with 6 cores each!
Setting vCPU to 12 with 6 cores per socket also resulted in the server crashing.
Thanks for all replies!
Coming back again to update on this issue.
After 7 days up and running, the server crashed again!!! Unfortunately, the vCPU configuration I posted before seems to have had no effect on solving the issue. Very disappointed... We have an open case with TAC; they are analysing system logs, but there is no solution so far.
After a long time dealing with this issue, we finally solved it.
The problem was that ESXi 6.0 was not able to serve CPPM after a few days of operation, due to the memory configuration/allocation.
The ESXi host has 64 GB of physical memory, and the CPPM VM was configured with all 64 GB. After some time of operation, ESXi started to reclaim memory using the ballooning technique, and this was causing the "BUG: soft lockup" error messages on the console and the crashes.
We are in the process of expanding the ESXi host's physical memory; in the meantime, the solution was to configure the CPPM VM with 60 GB of memory, leaving some memory free for ESXi.
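The sizing change boils down to a simple calculation (a sketch of what worked in our case; the 4 GB headroom is what we chose here, not an official ESXi recommendation):

```shell
HOST_RAM_GB=64       # physical memory installed in the ESXi host
ESXI_HEADROOM_GB=4   # left unallocated for the hypervisor itself

# Memory to configure on the CPPM VM so ESXi never has to balloon it
VM_RAM_GB=$((HOST_RAM_GB - ESXI_HEADROOM_GB))
echo "Configure the CPPM VM with ${VM_RAM_GB} GB"   # 60 GB
```

Another option worth considering (standard vSphere behaviour, hedging since we did not test it ourselves): setting a full memory reservation on the VM also prevents ESXi from ballooning or swapping its memory.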
Thanks for letting us know. Sounds like a very nasty one to find.