Recovering the Distributed DDNS Data Node database cluster - Adaptive Applications - BlueCat Gateway - 21.3

BlueCat Distributed DDNS Administration Guide


If a major outage, such as a power failure, causes all nodes within the Distributed DDNS Data Node database cluster to crash or go offline, you must recreate the cluster.

Recovering a single-node database cluster

The following steps outline the procedure for recovering a single-node Distributed DDNS Data Node.
  1. Log in to the console of the BDDS of the Distributed DDNS Data Node.
  2. If the container did not start when the BDDS was started, bring up the container using the following command:
    docker start <data_node_container_name>
  3. Check the log file to confirm whether the node is able to synchronize with the cluster, using the following command:
    docker exec <data_node_container_name> tail -f /var/log/mysql/mariadb.err

    If the node synchronizes with the cluster, no further steps are required. If the node does not synchronize with the cluster, proceed to the next step.

  4. Stop the database service process by executing the following command:
    docker exec <data_node_container_name> supervisorctl stop mariadb
  5. Check the recovery position by executing the following command:
    docker exec <data_node_container_name> mysqld --defaults-extra-file=/etc/mysql/custom/my.cnf -u mysql --wsrep-recover
  6. Restart the database service process by executing the following command:
    docker exec <data_node_container_name> supervisorctl start mariadb
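
Steps 4 to 6 above can be sketched as a single shell function. This is only a minimal illustration, not a supported tool: the function name recover_single_node and the DOCKER override variable are hypothetical, and the commands are the ones listed in the steps (the recovery option is written as --wsrep-recover; MariaDB treats dashes and underscores in option names interchangeably).

```shell
#!/bin/sh
# DOCKER is a hypothetical override so the sequence can be dry-run
# (for example DOCKER=echo); it defaults to the real docker CLI.
DOCKER=${DOCKER:-docker}

# recover_single_node <data_node_container_name>
# Runs the commands from steps 4 to 6 in order.
# No error handling: each command runs even if the previous one failed,
# just as when the steps are typed by hand.
recover_single_node() {
    container=$1
    [ -n "$container" ] || { echo "usage: recover_single_node <container>" >&2; return 2; }

    # Step 4: stop the database service process.
    $DOCKER exec "$container" supervisorctl stop mariadb

    # Step 5: check the recovery position.
    $DOCKER exec "$container" mysqld --defaults-extra-file=/etc/mysql/custom/my.cnf -u mysql --wsrep-recover

    # Step 6: restart the database service process.
    $DOCKER exec "$container" supervisorctl start mariadb
}
```

Running the function with DOCKER=echo prints the three commands without executing them, which may help when reviewing the sequence before applying it to a live Data Node.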

Recovering a multi-node database cluster

The following steps outline the procedure for recovering a three-node Distributed DDNS Data Node database cluster with nodes node1, node2, and node3.
  1. Log in to the console on each node and review the contents of the /var/lib/docker/volumes/mariadb-data/_data/grastate.dat file to determine which node has the safe_to_bootstrap: 1 value. The following shows the contents of the grastate.dat file for each node.
    node1
    cat /var/lib/docker/volumes/mariadb-data/_data/grastate.dat
    
    # GALERA saved state
    version: 2.1
    uuid: 09c0543a-5b03-11eb-a3a4-47839b49567b
    seqno: -1
    safe_to_bootstrap: 0
    node2
    cat /var/lib/docker/volumes/mariadb-data/_data/grastate.dat
    
    # GALERA saved state
    version: 2.1
    uuid: 09c0543a-5b03-11eb-a3a4-47839b49567b
    seqno: -1
    safe_to_bootstrap: 0
    node3
    cat /var/lib/docker/volumes/mariadb-data/_data/grastate.dat
    
    # GALERA saved state
    version: 2.1
    uuid: 09c0543a-5b03-11eb-a3a4-47839b49567b
    seqno: -1
    safe_to_bootstrap: 1

    In this example, node3 contains the safe_to_bootstrap: 1 value and is the first node of the database cluster to be recovered.

    Note: If all of the nodes contain a safe_to_bootstrap value of 0, run the following command on each node to determine the recovery position of each node:
    docker exec <node_name> mysqld --defaults-extra-file=/etc/mysql/custom/my.cnf -u mysql --wsrep-recover
    To verify the output of the commands, execute the following command on each node:
    docker exec <node_name> tail /var/log/mysql/mariadb.err
    The following shows the example output from this command:
    2021-01-20 10:12:45 0 [Note] WSREP: Recovered position: 09c0543a-5b03-11eb-a3a4-47839b49567b:7255

    In this example, the recovery position is 7255. The node with the highest recovery position value is the node that should be recovered first. If all nodes have the same recovery position value, you can select any of the nodes to recover first.

    Once you have determined which node to recover first, stop all nodes using the docker stop <node_name> command, then modify the grastate.dat file on that node to set the safe_to_bootstrap value to 1. You can do so using the following command:
    sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/docker/volumes/mariadb-data/_data/grastate.dat
  2. Log in to the console of node1 and stop the node by running the following command:
    docker stop node1
  3. Log in to the console of node2 and stop the node by running the following command:
    docker stop node2
  4. Log in to the console of node3 and stop the node by running the following command:
    docker stop node3
  5. On node3 of the cluster, edit the wsrep_cluster_address entry of the /var/lib/docker/volumes/mariadb-config/_data/custom/my.cnf file so that it does not include the IP addresses of the nodes. The following shows an example of how the wsrep_cluster_address entry should appear:
    wsrep_cluster_address=gcomm://
  6. Restart node3 using the following command:
    docker start node3
  7. Once you have verified that node3 has successfully started, log in to the console of node1 and start the node by running the following command:
    docker start node1
  8. Once you have verified that node1 has successfully started, log in to the console of node2 and start the node by running the following command:
    docker start node2
  9. When all nodes of the database cluster are successfully running, log in to node3 and verify whether the wsrep_cluster_address entry of the /var/lib/docker/volumes/mariadb-config/_data/custom/my.cnf file contains the IP addresses of all nodes within the cluster. If the cluster successfully synchronized, the entry appears as follows:
    wsrep_cluster_address=gcomm://<node1_IP>,<node2_IP>,<node3_IP>

    If the IP addresses of the other nodes do not appear, edit the wsrep_cluster_address entry so that it includes the IP addresses of the nodes.

  10. Restart node3 using the following command:
    docker restart node3
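
Choosing the first node to recover, as described in step 1, can be sketched with a few shell helpers. This is a minimal sketch under stated assumptions: the function names safe_to_bootstrap, recovered_seqno, and pick_first_node are hypothetical, and the parsing assumes that the grastate.dat and log lines have the same layout as the examples shown in step 1.

```shell
#!/bin/sh
# Helper names below are hypothetical; they are not part of the product.

# Print the safe_to_bootstrap value (0 or 1) found in a grastate.dat file.
safe_to_bootstrap() {
    sed -n 's/^safe_to_bootstrap:[[:space:]]*\([01]\)[[:space:]]*$/\1/p' "$1"
}

# Read log text on stdin and print the seqno from the last
# "WSREP: Recovered position: <uuid>:<seqno>" line, if any.
recovered_seqno() {
    sed -n 's/.*WSREP: Recovered position: [0-9a-fA-F-]*:\(-\{0,1\}[0-9][0-9]*\)[[:space:]]*$/\1/p' |
        tail -n 1
}

# Read "<node_name> <seqno>" lines on stdin and print the node with the
# highest seqno. On a tie the first node listed wins, which is acceptable
# because any node may be chosen when the positions are equal.
pick_first_node() {
    awk 'NR == 1 || $2 + 0 > max { max = $2 + 0; node = $1 } END { if (NR > 0) print node }'
}
```

For example, on each node the recovery position could be read with `docker exec <node_name> tail /var/log/mysql/mariadb.err | recovered_seqno`, and the collected values passed to pick_first_node as lines such as `node1 7250`. The result is only a suggestion; confirm it against the grastate.dat contents before editing safe_to_bootstrap.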