In the event of a major outage, such as a power outage, in which all nodes within the Distributed DDNS Data Node database cluster crash or go offline, you must recreate the cluster.
Recovering a single-node database cluster
- Log in to the console of the BDDS of the Distributed DDNS Data Node.
- If the container did not start when the BDDS was started, bring up the container using the following command:
docker start <data_node_container_name>
- Check the log file to confirm whether the node is capable of synchronizing with the cluster using the following command:
docker exec <node_name> tail -f /var/log/mysql/mariadb.err
If the node synchronizes with the cluster, no further steps are required. If the node does not synchronize with the cluster, proceed to the next step.
- Stop the database service process by executing the following command:
docker exec <data_node_container_name> supervisorctl stop mariadb
- Check the recovery position by executing the following command:
docker exec <data_node_container_name> mysqld --defaults-extra-file=/etc/mysql/custom/my.cnf -u mysql --wsrep_recover
- Restart the database service process by executing the following command:
docker exec <data_node_container_name> supervisorctl start mariadb
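The stop, recover, and start steps above can be sketched as a small shell helper. This is only an illustrative sketch: the function name recover_single_node and the DOCKER override variable are hypothetical and are not part of the product. The docker commands themselves are the ones given in the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): chains the stop / recover /
# start commands from the single-node steps above.
# DOCKER can be overridden, for example DOCKER="echo docker", to preview the
# commands without running them.
DOCKER="${DOCKER:-docker}"

recover_single_node() {
    container="$1"   # placeholder: the data node container name
    if [ -z "$container" ]; then
        echo "usage: recover_single_node <data_node_container_name>" >&2
        return 1
    fi
    # Stop the database service process.
    $DOCKER exec "$container" supervisorctl stop mariadb || return 1
    # Check the recovery position.
    $DOCKER exec "$container" mysqld --defaults-extra-file=/etc/mysql/custom/my.cnf -u mysql --wsrep_recover || return 1
    # Restart the database service process.
    $DOCKER exec "$container" supervisorctl start mariadb
}
```

Setting DOCKER="echo docker" before calling recover_single_node with a container name prints the three commands instead of executing them, which may be useful for reviewing the sequence first.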
Recovering a multi-node database cluster
- Log in to the console on each node and review the contents of the /var/lib/docker/volumes/mariadb-data/_data/grastate.dat file to determine which node has the safe_to_bootstrap: 1 value. The following shows the contents of the grastate.dat file for each node.
node1
cat /var/lib/docker/volumes/mariadb-data/_data/grastate.dat
# GALERA saved state
version: 2.1
uuid: 09c0543a-5b03-11eb-a3a4-47839b49567b
seqno: -1
safe_to_bootstrap: 0
node2
cat /var/lib/docker/volumes/mariadb-data/_data/grastate.dat
# GALERA saved state
version: 2.1
uuid: 09c0543a-5b03-11eb-a3a4-47839b49567b
seqno: -1
safe_to_bootstrap: 0
node3
cat /var/lib/docker/volumes/mariadb-data/_data/grastate.dat
# GALERA saved state
version: 2.1
uuid: 09c0543a-5b03-11eb-a3a4-47839b49567b
seqno: -1
safe_to_bootstrap: 1
In this example, node3 contains the safe_to_bootstrap: 1 value and is the first node of the database cluster to be recovered.
Note: If all of the nodes contain a safe_to_bootstrap value of 0, run the following command on each node to determine the recovery position of each node:
docker exec <node_name> mysqld --defaults-extra-file=/etc/mysql/custom/my.cnf -u mysql --wsrep-recover
To verify the output of the commands, execute the following command on each node:
docker exec <node_name> tail /var/log/mysql/mariadb.err
The following shows example output from this command:
2021-01-20 10:12:45 0 [Note] WSREP: Recovered position: 09c0543a-5b03-11eb-a3a4-47839b49567b:7255
In this output, the recovery position is 7255. The node with the highest recovery position value is the node that should be recovered first. If all nodes contain the same recovery position value, you can select any of the nodes to recover first.
Once you have determined the node to recover first, stop all nodes using the docker stop <node_name> command and, on the node to recover first, modify the grastate.dat file to set the safe_to_bootstrap value to 1. You can do so using the following command:
sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/docker/volumes/mariadb-data/_data/grastate.dat
- Log in to the console of node1 and stop the node by running the following command:
docker stop node1
- Log in to the console of node2 and stop the node by running the following command:
docker stop node2
- Log in to the console of node3 and stop the node by running the following command:
docker stop node3
- On node3 of the cluster, edit the wsrep_cluster_address entry of the /var/lib/docker/volumes/mariadb-config/_data/custom/my.cnf file so that it does not include the IP addresses of the nodes. The following shows an example of how the wsrep_cluster_address entry should appear:
wsrep_cluster_address=gcomm://
- Restart node3 using the following command:
docker start node3
- Once you have verified that node3 has successfully started, log in to the console of node1 and start the node by running the following command:
docker start node1
- Once you have verified that node1 has successfully started, log in to the console of node2 and start the node by running the following command:
docker start node2
- When all nodes of the database cluster are successfully running, log in to node3 and verify whether the wsrep_cluster_address entry of the /var/lib/docker/volumes/mariadb-config/_data/custom/my.cnf file contains the IP addresses of all nodes within the cluster. If the cluster successfully synchronized, the entry appears as follows:
wsrep_cluster_address=gcomm://<node1_IP>,<node2_IP>,<node3_IP>
If the IP addresses of the other nodes do not appear, edit the wsrep_cluster_address entry so that it includes the IP addresses of the nodes.
- Restart node3 using the following command:
docker restart node3
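The rule in the first step of this procedure for choosing which node to recover first (a safe_to_bootstrap value of 1, otherwise the highest recovered position) can be sketched with a few shell helpers. This is a minimal sketch, not a supported tool: the function names safe_to_bootstrap_value, recovered_seqno, and pick_bootstrap_node are hypothetical, and the parsing assumes the grastate.dat and mariadb.err formats shown in the examples above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the product) for choosing the first node
# to recover. They only parse text; they do not change any files.

# Print the safe_to_bootstrap value stored in the grastate.dat file given as $1.
safe_to_bootstrap_value() {
    awk '$1 == "safe_to_bootstrap:" { print $2 }' "$1"
}

# Read mariadb.err text on stdin and print the sequence number from the last
# "WSREP: Recovered position: <uuid>:<seqno>" line, if there is one.
recovered_seqno() {
    awk -F: '/WSREP: Recovered position:/ { seq = $NF } END { if (seq != "") print seq }'
}

# Read "<node> <seqno>" pairs on stdin, one per line, and print the node with
# the highest seqno. On a tie, the first node listed is printed.
pick_bootstrap_node() {
    awk 'NR == 1 || $2 + 0 > max { max = $2 + 0; node = $1 } END { if (NR > 0) print node }'
}
```

For example, piping the output of docker exec <node_name> tail /var/log/mysql/mariadb.err into recovered_seqno on each node gives the values to compare, and feeding the resulting "<node> <seqno>" pairs to pick_bootstrap_node prints the node to recover first.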