How Availability groups work - Platform - BlueCat Gateway - 22.11.1

Gateway Administration Guide


An Availability group has a fully-qualified domain name (FQDN) and requires a DNS server to work. When using Availability groups, all Gateway user requests must be sent through this FQDN (that is, not to a specific IP address). Initially, the DNS host record for that FQDN points to the Primary instance of Gateway. (Gateway does this for you when you set up the Availability group.) All network requests for that FQDN are passed to and handled by the Primary instance.

The Primary and Secondary nodes in the Availability group communicate with each other so that the Secondary node knows whether the Primary node is still active. The process works as follows:

  • At a defined heartbeat interval, the Primary Gateway node sends a Health Report (or "heartbeat") to the Secondary instance, directly via HTTP/HTTPS. The Primary node also stores a timestamp for the most recent heartbeat in a TXT record for the FQDN. (A TXT record is a DNS resource record that holds free-form text for a domain name; it can contain any string data.)

    By default, the Heartbeat interval is 20 seconds.

  • At a separate check interval, the Secondary (or standby) node checks whether it has received a heartbeat within the past interval period.

    If it has not received a heartbeat, the Secondary node checks the FQDN's TXT record to read the timestamp of the last heartbeat. (This addresses cases where HTTP/HTTPS communications between the Primary and Secondary nodes don't work, but the nodes can still access the DNS Server.)

    By default, the Standby check interval is 30 seconds.

  • If the amount of time since the last heartbeat was received exceeds a separately defined failover period, the Secondary node considers the Primary node to have failed.

    The Secondary node then updates the host record for the FQDN to point to itself, designating itself as the Primary and the original Primary as the Secondary. From this point, the former standby instance handles DNS and network requests for that FQDN in place of the original Primary node.

    If the original Primary instance recovers, it checks the TXT record and sees that it is no longer the Primary node. It then assigns itself the role of Secondary node.

    Note: There is no automated mechanism to restore the original Primary and Secondary roles. If you want to restore them, you can manually reassign the roles on the Availability Groups configuration page.

    By default, the Failover period is one minute (60 seconds).
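
The decision the Secondary node makes at each check can be sketched roughly as follows. This is an illustrative Python sketch, not Gateway's actual implementation: the function name should_fail_over, its parameters, and the read_txt_timestamp callback are hypothetical, and it assumes that "the past interval period" means the standby check interval. Timestamps are plain seconds (for example, values from time.time()).

```python
from typing import Callable, Optional

# Defaults quoted in the text above (seconds).
CHECK_INTERVAL = 30
FAILOVER_PERIOD = 60


def should_fail_over(
    now: float,
    last_http_heartbeat: float,
    read_txt_timestamp: Callable[[], Optional[float]],
    check_interval: float = CHECK_INTERVAL,
    failover_period: float = FAILOVER_PERIOD,
) -> bool:
    """Decide, at one standby check, whether the Primary looks failed.

    last_http_heartbeat is the time of the last heartbeat received
    directly over HTTP/HTTPS. read_txt_timestamp is a hypothetical
    callback that returns the timestamp stored in the FQDN's TXT
    record, or None if it cannot be read.
    """
    last_seen = last_http_heartbeat
    # Consult DNS only when no direct heartbeat arrived recently
    # (assumption: "recently" means within one check interval).
    if now - last_http_heartbeat > check_interval:
        txt_timestamp = read_txt_timestamp()
        if txt_timestamp is not None:
            last_seen = max(last_seen, txt_timestamp)
    # Fail over only once the silence exceeds the failover period.
    return now - last_seen > failover_period
```

In this sketch, a True result corresponds to the point where the Secondary node would update the host record to point to itself. Passing the TXT lookup as a callback keeps the DNS access out of the decision logic, so the timing rules can be exercised without a DNS server.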

During failover, Gateway remains inaccessible for 1-2 minutes, depending on the length of the failover period. After the Secondary node takes over, Gateway users might also notice a change in performance, depending on the capabilities of each instance.
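
As a rough way to reason about how the three intervals contribute to that delay, the following Python sketch estimates how long after a Primary failure the Secondary node would detect it. The function detection_window is hypothetical, and the model is a simplification: it assumes checks run exactly on the check interval, and it ignores the time needed to update the host record and any DNS caching on the client side, both of which add to the delay users experience.

```python
def detection_window(
    heartbeat_interval: float = 20,
    check_interval: float = 30,
    failover_period: float = 60,
) -> tuple:
    """Return (earliest, latest) seconds from Primary failure to detection.

    Earliest: the Primary fails just before its next heartbeat was due,
    and a standby check runs right as the failover period is exceeded.
    Latest: the Primary fails just after a heartbeat, and the failover
    period is exceeded just after a standby check, so detection waits
    for the next check.
    """
    earliest = max(failover_period - heartbeat_interval, 0)
    latest = failover_period + check_interval
    return earliest, latest
```

With the default intervals, this simplified model gives a detection window of roughly 40 to 90 seconds; lengthening the failover period shifts the whole window by the same amount.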