Product and Software: This article applies to all MMS versions that support HA configuration.
In a twin MMS system configured for High Availability (HA), both servers should hold a copy of the same Solid database. The master copy resides on the active server, and the standby server holds an exact copy, or at least one that is synchronized every few minutes. The copy on the standby server may become corrupted for a number of reasons, such as isolation from the active server or a crash on the standby server. Typically, recovery from such a situation is the same as recovery from a network-induced HA switchover.
To recover an HA system from corruption, you must:
- Check for database corruption.
- Recreate the database on the standby server.
Check for Database Corruption
Connect to the server using ssh and run the following commands:
1) ls -l /opt/aruba/solid/data/solid.db
-rw------- 1 root root 2437152768 Jan 31 14:16 /opt/aruba/solid/data/solid.db
Check the timestamp and size. They should be similar to those of the same file on the active server.
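The timestamp check can be scripted. The following is a minimal sketch, not part of the MMS tooling: the helper name is_stale and the 30-minute threshold are assumptions chosen for illustration, based only on the statement that the copy is synchronized every few minutes.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the file was last modified
# more than the given number of minutes ago.
is_stale() {
    file=$1
    max_age_min=$2
    [ -n "$(find "$file" -mmin "+$max_age_min" 2>/dev/null)" ]
}

DB_FILE=/opt/aruba/solid/data/solid.db   # path taken from this article
MAX_AGE_MIN=30                           # assumed threshold, adjust as needed

# Only report if the database file exists on this host.
if [ -f "$DB_FILE" ]; then
    if is_stale "$DB_FILE" "$MAX_AGE_MIN"; then
        echo "WARNING: $DB_FILE not modified in the last $MAX_AGE_MIN minutes"
    else
        echo "OK: $DB_FILE was modified recently"
    fi
    ls -l "$DB_FILE"
fi
```

Run it on both servers and compare the ls output by eye. A stale file on the standby server is only a hint, so the remaining checks still apply.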
2) more /opt/aruba/solid/data/solerror.out
Check for messages similar to this: "Fri Jan 25 11:24:55 2008 Database is a broken HSB copy or netcopy". This indicates that the copy from the active server never succeeded.
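Rather than paging through the file, the message can be searched for directly. This is a small sketch that assumes the message text appears exactly as quoted above. The function name has_broken_copy is made up for this example.

```shell
#!/bin/sh
# Succeeds if the given log file contains the broken-copy message
# quoted in this article.
has_broken_copy() {
    grep -q 'Database is a broken HSB copy or netcopy' "$1" 2>/dev/null
}

ERR_LOG=/opt/aruba/solid/data/solerror.out   # path taken from this article

if [ -r "$ERR_LOG" ]; then
    if has_broken_copy "$ERR_LOG"; then
        echo "Broken HSB copy reported in $ERR_LOG:"
        # Show the most recent occurrences with their timestamps.
        grep 'Database is a broken HSB copy or netcopy' "$ERR_LOG" | tail -n 5
    else
        echo "No broken-copy message found in $ERR_LOG"
    fi
fi
```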
3) tail -f /var/log/messages
Check for errors.
4) ps aux|grep stunnel
root 29382 0.0 0.0 4496 2020 ? Ss 18:57 0:00 /usr/sbin/stunnel /etc/stunneld.conf .
root 30182 0.0 0.0 5272 656 pts/1 S+ 19:07 0:00 grep stunnel
During the initial copy of the database, only one stunnel instance is seen on the standby server. However, the active server should have two.
5) ps aux |grep stunnel
root 2328 0.0 0.0 4948 1732 ? Ss 15:34 0:00 /usr/sbin/stunnel /etc/stunneld.conf .
root 5055 0.2 0.0 5352 2028 ? Ss 16:22 0:00 /usr/sbin/stunnel /opt/aruba/conf/stunnelc.conf
root 5276 0.0 0.0 3896 632 pts/0 R+ 16:23 0:00 grep stunnel
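Counting the stunnel instances by eye is easy to get wrong because the grep process itself appears in the listing. The sketch below counts only the real stunnel processes. The function name count_stunnel is an assumption for this example.

```shell
#!/bin/sh
# Counts stunnel processes in "ps aux" style text read from stdin,
# ignoring the grep process that shows up when filtering with grep.
count_stunnel() {
    awk '/stunnel/ && !/grep/ { n++ } END { print n + 0 }'
}

# Print the number of stunnel instances on this host.
if command -v ps >/dev/null 2>&1; then
    ps aux | count_stunnel
fi
```

According to the text above, the count should be 1 on the standby server during the initial copy and 2 on the active server.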
If the standby database file /opt/aruba/solid/data/solid.db appears stale, this could indicate database corruption. To recover, first disable the HA configuration from the admin server in the MMS web interface. After HA is unconfigured on the active server, the change should replicate to the standby server.
6) Check the status of the database. Ensure that the active server returns to a STANDALONE state.
/opt/aruba/solid/solsql -e "admin command 'hsb state'" dba dba
Solid SQL Editor (teletype) v.04.50.0133
(C) Copyright Solid Information Technology Ltd 1993-2007
Connected to default server.
RC TEXT
-- ----
0 STANDALONE => db state on the active server.
1 rows fetched.
SOLID SQL Editor exiting.
There are five states: PRIMARY ALONE, PRIMARY ACTIVE, SECONDARY ALONE, SECONDARY ACTIVE, and STANDALONE.
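For scripted checks, the state name can be extracted from the solsql output. This sketch assumes the output has the shape shown above and that the state is always one of the five names listed. The function parse_hsb_state and the SOLSQL variable are made up for this example. Set SOLSQL to the full path of solsql on your system.

```shell
#!/bin/sh
# Prints the first HSB state name found in solsql output read from stdin.
parse_hsb_state() {
    grep -oE 'PRIMARY ALONE|PRIMARY ACTIVE|SECONDARY ALONE|SECONDARY ACTIVE|STANDALONE' |
        head -n 1
}

SOLSQL=${SOLSQL:-solsql}   # assumption: solsql is on PATH unless overridden

if command -v "$SOLSQL" >/dev/null 2>&1; then
    state=$("$SOLSQL" -e "admin command 'hsb state'" dba dba | parse_hsb_state)
    echo "HSB state: ${state:-unknown}"
fi
```

After HA has been disabled, the printed state on the active server should be STANDALONE.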
Recreate the Database on the Standby Server
Second, you must recreate the entire database on the standby server. Because recreating the database is destructive, back up /opt/aruba/solid/data/*.out first. These files could be useful later.
The commands needed to recreate the database, run in this order on the standby server, are:
- /etc/init.d/mmgr stop: stops the MMS system
- /opt/aruba/solid/solid.sh wipe: removes the /opt/aruba/solid/data directory and all its contents
- /etc/init.d/mmgr start: restarts the MMS processes. Part of this script starts the Solid database and creates a new empty database.
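The backup and the three commands above can be combined into one script. This is a cautious sketch rather than a supported procedure: the backup directory /root/solid-out-backup, the function names, and the DRY_RUN switch are all assumptions for this example. By default it only prints what it would do. Set DRY_RUN=0 to actually run the commands on the standby server.

```shell
#!/bin/sh
# Runs a command, or only prints it when DRY_RUN=1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "DRY RUN: $*"
    else
        "$@"
    fi
}

# Backs up the *.out files, then recreates the database.
# Stops at the first step that fails.
recreate_standby_db() {
    backup_dir=$1   # hypothetical backup location for the *.out files
    run mkdir -p "$backup_dir" &&
    run cp -p /opt/aruba/solid/data/*.out "$backup_dir"/ &&
    run /etc/init.d/mmgr stop &&
    run /opt/aruba/solid/solid.sh wipe &&
    run /etc/init.d/mmgr start
}

recreate_standby_db /root/solid-out-backup
```

The backup is taken before the wipe because the wipe removes the whole data directory, including the .out files.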
When both servers are up and stable, reconfigure HA on the active server, then wait for the HA configuration to stabilize. This can be checked by querying the HSB state on each server with the solsql command shown earlier.
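Waiting for HA to stabilize can be done with a polling loop. This is a sketch under stated assumptions: the function name wait_for_output, the retry count, and the delay are made up for this example. The expected state PRIMARY ACTIVE on the active server is also an assumption drawn from the list of five states, and this article does not confirm it.

```shell
#!/bin/sh
# Runs a command repeatedly until its output contains the expected text.
# Usage: wait_for_output TEXT ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for_output() {
    expected=$1
    attempts=$2
    delay=$3
    shift 3
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" 2>/dev/null | grep -q "$expected"; then
            return 0
        fi
        i=$((i + 1))
        # Pause only if another attempt will follow.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

SOLSQL=${SOLSQL:-solsql}   # assumption: solsql is on PATH unless overridden

if command -v "$SOLSQL" >/dev/null 2>&1; then
    # Assumed target state on the active server; poll for up to 5 minutes.
    if wait_for_output 'PRIMARY ACTIVE' 30 10 \
        "$SOLSQL" -e "admin command 'hsb state'" dba dba; then
        echo "HA is active"
    else
        echo "HA did not reach PRIMARY ACTIVE in time"
    fi
fi
```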