In the event of a major outage, such as a power failure, in which all nodes within the Distributed DDNS Data Node database cluster crash or go offline, you must recreate the cluster.
Recovering a single-node database cluster
- Log in to the console of the BDDS of the Distributed DDNS Data Node.
- If the container did not start when the BDDS was started, bring up the
container using the following
command:
docker start distributed_ddns
- Stop the database service process by executing the following
command:
docker exec distributed_ddns supervisorctl stop mariadb
- Recover the database by running the following
command:
docker exec distributed_ddns mysqld -u mysql --wsrep_recover
- Restart the database service process by executing the following
command:
docker exec distributed_ddns supervisorctl start mariadb
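The single-node steps above can be collected into a short script. The following is a minimal sketch, not a supported tool. The function name recover_single_node and the DOCKER variable are assumptions introduced here for illustration; the container name and the commands are the ones listed above. The sketch stops at the first command that fails so that the failure can be inspected manually.

```shell
#!/bin/sh
# Sketch only: runs the single-node recovery steps in order.
# DOCKER is a hypothetical override (not part of the product) so the
# sequence can be previewed without touching a container, e.g. DOCKER=echo.
DOCKER="${DOCKER:-docker}"
CONTAINER="distributed_ddns"

recover_single_node() {
    # Bring up the container (needed only if it did not start with the BDDS).
    $DOCKER start "$CONTAINER" || return 1
    # Stop the database service process.
    $DOCKER exec "$CONTAINER" supervisorctl stop mariadb || return 1
    # Recover the database.
    $DOCKER exec "$CONTAINER" mysqld -u mysql --wsrep_recover || return 1
    # Restart the database service process.
    $DOCKER exec "$CONTAINER" supervisorctl start mariadb || return 1
}
```

To preview the commands without executing them, set DOCKER=echo before calling recover_single_node; each step is then printed instead of being run.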
Recovering a multi-node database cluster
- Log in to the console on each node and review the contents of the
/var/lib/docker/volumes/mariadb-data/_data/grastate.dat
file to determine which node has the safe_to_bootstrap: 1
value. The following shows the contents of the grastate.dat
file for each node.
node1
cat /var/lib/docker/volumes/mariadb-data/_data/grastate.dat
# GALERA saved state
version: 2.1
uuid: 09c0543a-5b03-11eb-a3a4-47839b49567b
seqno: -1
safe_to_bootstrap: 0
node2
cat /var/lib/docker/volumes/mariadb-data/_data/grastate.dat
# GALERA saved state
version: 2.1
uuid: 09c0543a-5b03-11eb-a3a4-47839b49567b
seqno: -1
safe_to_bootstrap: 0
node3
cat /var/lib/docker/volumes/mariadb-data/_data/grastate.dat
# GALERA saved state
version: 2.1
uuid: 09c0543a-5b03-11eb-a3a4-47839b49567b
seqno: -1
safe_to_bootstrap: 1
In this example, node3 contains the safe_to_bootstrap: 1 value and is the first node of the database cluster to be recovered.
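Instead of reading the whole file on each node, the safe_to_bootstrap value can be extracted directly. The following is a minimal sketch that assumes the grastate.dat layout shown in this procedure (one "key: value" pair per line). The function name safe_to_bootstrap and the GRASTATE variable are illustrative names, not part of the product.

```shell
#!/bin/sh
# Sketch only: prints the safe_to_bootstrap value (0 or 1) from grastate.dat.
# GRASTATE is a hypothetical override for the default path used in this procedure.
GRASTATE="${GRASTATE:-/var/lib/docker/volumes/mariadb-data/_data/grastate.dat}"

safe_to_bootstrap() {
    # $1: optional path to a grastate.dat file; defaults to $GRASTATE.
    # Split each line on the colon (plus any following spaces) and print
    # the value of the safe_to_bootstrap key.
    awk -F': *' '$1 == "safe_to_bootstrap" { print $2 }' "${1:-$GRASTATE}"
}
```

Running safe_to_bootstrap on each node prints a single digit; the node that prints 1 is the one to recover first.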
Note: If all of the nodes contain a safe_to_bootstrap value of 0, run the following command on each node to determine the recovery position of each node:
docker exec <node_name> mysqld -u mysql --wsrep-recover
The following shows example output from this command:
2021-01-20 10:12:45 0 [Note] WSREP: Recovered position: 09c0543a-5b03-11eb-a3a4-47839b49567b:7255
In this output, the recovery position is 7255. The node with the highest recovery position value is the node that should be recovered first. If all nodes have the same recovery position value, you can select any of the nodes to recover first.
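Comparing recovery positions by eye is error-prone when several nodes are involved. The sketch below assumes that you have saved the output of the recovery command from each node into a file named after the node (for example node1.log). That file naming and the function names recovered_seqno and pick_bootstrap_node are assumptions made for illustration. The sketch extracts the number that follows the UUID in the "Recovered position" line and prints the node with the highest value; when values are equal, the first node listed is kept, since any of them can be chosen.

```shell
#!/bin/sh
# Sketch only: choose the node with the highest recovered position.

# Reads recovery output on standard input and prints the sequence number
# from the last "WSREP: Recovered position: <uuid>:<seqno>" line.
recovered_seqno() {
    sed -n 's/.*WSREP: Recovered position: .*:\(-\{0,1\}[0-9][0-9]*\)[[:space:]]*$/\1/p' | tail -n 1
}

# Arguments: one saved output file per node, named <node>.log (hypothetical layout).
# Prints "<node> <seqno>" for the node with the highest recovered position.
pick_bootstrap_node() {
    best_node=""
    best_seqno=""
    for log in "$@"; do
        seqno=$(recovered_seqno < "$log")
        # Skip files that contain no recovered position.
        [ -n "$seqno" ] || continue
        if [ -z "$best_seqno" ] || [ "$seqno" -gt "$best_seqno" ]; then
            best_seqno=$seqno
            best_node=$(basename "$log" .log)
        fi
    done
    if [ -z "$best_node" ]; then
        return 1
    fi
    printf '%s %s\n' "$best_node" "$best_seqno"
}
```

For example, after saving each node's output, calling pick_bootstrap_node node1.log node2.log node3.log prints the name of the node to recover first, followed by its recovery position.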
- Log in to the console of node1 and stop the node by running the
following command:
docker stop node1
- Log in to the console of node2 and stop the node by running the
following command:
docker stop node2
- Log in to the console of node3 and stop the node by running the
following command:
docker stop node3
- On node3 of the cluster, edit the
wsrep_cluster_address section of the
/var/lib/docker/volumes/mariadb-config/_data/my.cnf
file so that it does not include the IP addresses of the nodes. The
following shows how the wsrep_cluster_address section should appear:
wsrep_cluster_address=gcomm://
- Start node3 again using the following
command:
docker start node3
- Once you have verified that node3 has successfully started, log in to
the console of node1 and start the node by running the following
command:
docker start node1
- Once you have verified that node1 has successfully started, log in to
the console of node2 and start the node by running the following
command:
docker start node2
- When all nodes of the database cluster are successfully running, log in to
node3 and edit the wsrep_cluster_address section
of the
/var/lib/docker/volumes/mariadb-config/_data/my.cnf
file so that it includes the IP addresses of the nodes. The following shows
how the wsrep_cluster_address section should appear:
wsrep_cluster_address=gcomm://<node1_IP>,<node2_IP>,<node3_IP>
- Restart node3 using the following
command:
docker restart node3
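The two edits to wsrep_cluster_address in this procedure (clearing the address list before starting node3, then restoring it once all nodes are running) can be scripted to avoid typing mistakes. The following is a minimal sketch that assumes the setting appears on a single line of the my.cnf file. The function name set_cluster_address, the MY_CNF variable, and the .bak backup copy are assumptions introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: rewrites the wsrep_cluster_address line of a my.cnf file.
# MY_CNF is a hypothetical override for the default path used in this procedure.
MY_CNF="${MY_CNF:-/var/lib/docker/volumes/mariadb-config/_data/my.cnf}"

set_cluster_address() {
    # $1: comma-separated node IP addresses, or an empty string to
    #     produce the empty address "gcomm://".
    # $2: optional path to my.cnf; defaults to $MY_CNF.
    cnf="${2:-$MY_CNF}"
    # Keep a copy of the current file, then rewrite the original in place.
    cp "$cnf" "$cnf.bak" || return 1
    sed "s|^[[:space:]]*wsrep_cluster_address[[:space:]]*=.*|wsrep_cluster_address=gcomm://$1|" "$cnf.bak" > "$cnf"
}
```

Before starting node3 as the first node, call set_cluster_address "" to clear the address list. Once every node is running, call it again with the comma-separated IP addresses of the three nodes, then restart node3 as described above.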