Restarting the database cluster when all clusters fail - Adaptive Applications - BlueCat Gateway - 23.2.3

BlueCat Distributed DDNS Administration Guide

The following procedure describes how to bootstrap (restart) a Distributed DDNS database cluster when all nodes in the cluster stop functioning.

First, on each DDNS node, do the following:

  • View the contents of the file /var/lib/docker/volumes/mariadb-data/_data/grastate.dat as text, and locate the node whose file has the safe_to_bootstrap variable set to 1.

Distributed DDNS nodes write their last executed state to this file. If the cluster failed gracefully, this variable is set to 1 in the file on the node that you can safely start first when bootstrapping the cluster. It is possible that no node has this variable set to 1.
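Because each node keeps its own copy of grastate.dat, this check has to be repeated on every node. The following is a minimal shell sketch of that check, not a BlueCat-provided tool. It assumes the file uses the standard Galera `key: value` layout (for example, `seqno:   -1`) and that you run it as root, since the Docker volume directory is usually readable only by root. The function names grastate_value and report_grastate are hypothetical.

```shell
# Print the value of one key (for example, safe_to_bootstrap or seqno)
# from a grastate.dat file. Lines look like "seqno:   -1".
grastate_value() {
  local key=$1 file=$2
  # Split on a colon followed by optional spaces; field 1 is the key.
  awk -F': *' -v key="$key" '$1 == key { print $2 }' "$file"
}

# Summarize the two values this procedure depends on.
# Defaults to the path used in this guide; pass another path to override.
report_grastate() {
  local file=${1:-/var/lib/docker/volumes/mariadb-data/_data/grastate.dat}
  printf 'safe_to_bootstrap=%s seqno=%s\n' \
    "$(grastate_value safe_to_bootstrap "$file")" \
    "$(grastate_value seqno "$file")"
}
```

Run report_grastate with no argument on each node and compare the results; the node that reports safe_to_bootstrap=1, if any, is the one to start first.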

Restarting from the "safe to bootstrap" node

  1. On the node where the grastate.dat file has safe_to_bootstrap equal to 1, start the container with the standard docker start command:

    docker start <Node container name>

    Where <Node container name> is the name of the node's container.

    Note: Wait for the node to finish starting up completely before you continue.
  2. After the first node starts successfully, start the other nodes one by one with the same start command.

    Make sure you wait for each node to fully start and sync with the rest of the cluster before starting another node.
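The one-at-a-time start order above can be scripted. This is a minimal sketch, not a BlueCat-provided tool. It assumes that "fully started and synced" corresponds to the Galera status variable wsrep_local_state_comment reporting Synced, that the mysql client inside each container can log in without extra options (add credentials if yours cannot), and that you pass real container names in order, beginning with the safe_to_bootstrap node. The function names are hypothetical.

```shell
# Succeed if the container reports the Galera state "Synced".
# Assumes the mysql client in the container can log in without extra options.
node_is_synced() {
  local state
  state=$(docker exec "$1" mysql -N -B \
    -e "SHOW STATUS LIKE 'wsrep_local_state_comment'" 2>/dev/null | awk '{ print $2 }')
  [ "$state" = "Synced" ]
}

# Poll every 5 seconds until the node is synced; give up after a timeout
# (in seconds, default 600) and return a non-zero status.
wait_until_synced() {
  local node=$1 timeout=${2:-600} waited=0
  until node_is_synced "$node"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $node to sync" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}

# Start each container in the order given, waiting for it to sync
# before starting the next one. Stops at the first failure.
start_cluster_in_order() {
  local node
  for node in "$@"; do
    docker start "$node" >/dev/null || return 1
    wait_until_synced "$node" || return 1
  done
}
```

For example, with hypothetical container names: `start_cluster_in_order ddns-node-1 ddns-node-2 ddns-node-3`. Polling the status variable, rather than sleeping for a fixed time, avoids starting the next node while the previous one is still joining the cluster.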

Restarting from the node with the most advanced state

Use the following procedure when all nodes have the following values in their grastate.dat files:

  • safe_to_bootstrap is 0
  • seqno is -1

To restart the cluster in this case:

  1. You will need to run commands within each database node's container, so start any stopped nodes with the standard docker start command:

    docker start <Node container name>

    Where <Node container name> is the name of the node's container.

    The order in which you start the nodes does not matter.

  2. After all nodes have started, make sure the MariaDB process on each node is stopped. To do so, run the following command on every node:

    docker exec -it <Node container name> supervisorctl stop mariadb
  3. After stopping the MariaDB process on all nodes in the cluster, determine the recovered position of each node and make note of it. To do so, run the following command on each node:

    docker exec -it <Node container name> mysqld --defaults-extra-file=/etc/mysql/custom/my.cnf -u mysql --wsrep-recover

    In the output, look for a log entry like the following and write down the number after the last colon, which is the node's sequence number (seqno).

    Recovered position: 00000000-0000-0000-0000-000000000000:36
  4. Review your list to determine the node with the highest number. On that node, do the following:
    1. Manually edit the file /var/lib/docker/volumes/mariadb-data/_data/grastate.dat to set the value of safe_to_bootstrap to 1.

      Save the file when you're done.

    2. Manually edit the file /var/lib/docker/volumes/mariadb-config/_data/custom/my.cnf to set the value of wsrep_cluster_address to gcomm://. That is, remove all IP addresses from the entry, leaving only the gcomm:// prefix.

      Save the file when you're done.

    3. Restart the database container on this node with the standard docker restart command:

      docker restart <Node container name>

      After this first node finishes its startup process, you have a new one-node cluster.

  5. After the first node starts successfully, restart the other nodes one by one with the restart command:

    docker restart <Node container name>

    Make sure you wait for each node to fully start and sync with the rest of the cluster before restarting another node.

    Note: You must use the restart command (not the start command) so that nodes first completely stop before starting again.
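Steps 3 and 4 of this procedure involve comparing recovered positions across nodes. The following is a minimal sketch that automates the comparison, not a BlueCat-provided tool. It assumes the --wsrep-recover output, including the Recovered position line, reaches the command's standard output or standard error, as the step above implies. It drops -it from docker exec because the output is piped rather than read interactively. The container names and function names are hypothetical.

```shell
# Read `mysqld --wsrep-recover` output on stdin and print the number after
# the last colon on the "Recovered position:" line (the seqno).
recovered_seqno() {
  awk -F: '/Recovered position:/ { seq = $NF + 0; found = 1 } END { if (found) print seq }'
}

# Read "node seqno" pairs on stdin and print the node with the highest seqno.
# On a tie, the first node listed wins.
pick_bootstrap_node() {
  awk 'NR == 1 || $2 + 0 > max { max = $2 + 0; node = $1 } END { print node }'
}

# Run the recovery in each container and print "node seqno" pairs.
# The MariaDB process must already be stopped on every node (step 2).
collect_positions() {
  local node seq
  for node in "$@"; do
    seq=$(docker exec "$node" mysqld --defaults-extra-file=/etc/mysql/custom/my.cnf \
      -u mysql --wsrep-recover 2>&1 | recovered_seqno)
    if [ -z "$seq" ]; then
      echo "no recovered position found for $node" >&2
      continue
    fi
    echo "$node $seq"
  done
}
```

For example: `collect_positions ddns-node-1 ddns-node-2 ddns-node-3 | pick_bootstrap_node` prints the name of the container to edit in step 4. Review the full list of positions as well, because a tie or a missing position may need a manual decision.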
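The two manual edits in step 4 can also be made with sed. This is a minimal sketch, assuming GNU or BSD sed (both accept -i.bak), that grastate.dat stores the flag as a line beginning with safe_to_bootstrap:, and that my.cnf spells the option wsrep_cluster_address as in this guide. Each function keeps a .bak copy of the file it changes so that the original settings can be restored later. Run it as root, and only on the node with the highest recovered position. The function names are hypothetical.

```shell
# Paths from step 4 of this procedure.
GRASTATE_FILE=/var/lib/docker/volumes/mariadb-data/_data/grastate.dat
MY_CNF=/var/lib/docker/volumes/mariadb-config/_data/custom/my.cnf

# Set safe_to_bootstrap to 1, keeping a .bak copy of the original file.
set_safe_to_bootstrap() {
  sed -i.bak 's/^safe_to_bootstrap:.*/safe_to_bootstrap: 1/' "${1:-$GRASTATE_FILE}"
}

# Replace the wsrep_cluster_address value with the bare gcomm:// prefix,
# preserving the key and spacing, and keeping a .bak copy of the original file.
clear_cluster_address() {
  sed -i.bak 's|^\([[:space:]]*wsrep_cluster_address[[:space:]]*=[[:space:]]*\).*|\1gcomm://|' \
    "${1:-$MY_CNF}"
}
```

After `set_safe_to_bootstrap && clear_cluster_address` succeeds, restart the container with `docker restart <Node container name>` as described in step 4.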