Service point health check for Anycast - BlueCat Edge - Service Point v3.x.x

BlueCat Edge Deployment Guide


This topic gives an overview of how BlueCat Edge service points configured for Anycast periodically self-assess their health and fitness for participating in Anycast. This topic assumes that at least one daemon is running.

When is the health check run?

The health check self-evaluation runs periodically, with the time interval calculated as the minimum of the following values:
OSPFD (Hello and Dead timers)
  • 2 x hello interval
  • hello interval + 2 x dead interval
  • 30 seconds
BGP (Keepalive and Hold timers)
  • keepalive interval
  • keepalive interval + 2 x hold interval
  • 30 seconds

The calculated health check period can be verified in the output of the spDiagnostics API: in the "routing-controller-service" section, it appears as the value, in ms, of the dnsHealthMonitorInterval field.
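As an illustration, the interval selection above can be sketched in Python. This is a minimal sketch, not the service point's actual implementation; the timer values in the demo are the standard OSPF defaults, used purely as an example:

```python
def health_check_interval(hello=None, dead=None, keepalive=None, hold=None):
    """Health check period in seconds: the minimum of the candidate
    values for whichever daemon timers are configured (sketch)."""
    candidates = [30]  # the 30-second value always applies
    if hello is not None and dead is not None:      # OSPFD Hello/Dead
        candidates += [2 * hello, hello + 2 * dead]
    if keepalive is not None and hold is not None:  # BGP Keepalive/Hold
        candidates += [keepalive, keepalive + 2 * hold]
    return min(candidates)

# Example: OSPF hello 10 s, dead 40 s -> min(30, 20, 90) = 20 s
print(health_check_interval(hello=10, dead=40))  # → 20
```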

For more information, see "Service Point diagnostic and health APIs" in the BlueCat Edge API Guide.

What does the health check evaluate?

The service point queries version.bind against all of its configured remote naming server (RNS) services, which are the services representing the DNS resolvers. There is one RNS per namespace, and a service point can have up to three running RNS services. When one of the configured RNS services doesn't respond to the probe query, up to four more rapid-fire health checks are issued.
  • If, in any of these subsequent attempts, all RNS services respond successfully, the service point maintains its Anycast configuration and resumes its regular health check period.
  • If, in each of the attempts, at least one of the configured RNS services fails to resolve the probe query, the service point removes itself from the Anycast pool.
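The probe-and-retry decision described above can be sketched as follows. This is a hedged sketch, not BlueCat code: probe_all is a hypothetical callable standing in for one round of version.bind queries against every configured RNS, returning True only when all of them respond.

```python
def should_stay_in_anycast(probe_all, max_retries=4):
    """Return True if the service point keeps its Anycast configuration.

    probe_all() represents one round of version.bind probes against
    every configured RNS; it returns True only if all RNS responded.
    """
    if probe_all():
        return True  # all RNS healthy on the regular check
    # At least one RNS failed: issue up to four rapid-fire re-checks.
    for _ in range(max_retries):
        if probe_all():
            return True  # an attempt in which all RNS responded
    return False  # every attempt had a failing RNS: leave the pool

# Demo with a scripted probe: the first check fails, a retry succeeds.
results = iter([False, False, True])
print(should_stay_in_anycast(lambda: next(results)))  # → True
```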

The removal is done by suspending the running daemons and sending a message to the neighboring devices that the service point is going down, so that they remove the Anycast route to it.

This is reflected in the "routing-controller-service" section of the spDiagnostics API output, which returns suspended as the status of any previously enabled daemons.

The suspended state is also reflected in the output of the Anycast configuration utility's show_daemons command, which shows no as the attached status of all previously enabled daemons.

In the suspended state, the service point continues to periodically self-assess its health, using the same health check period. When the health check passes, the daemons are re-enabled and the service point becomes reachable at the Anycast IP again.
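The suspend/resume behavior amounts to a simple two-state loop driven by the health check result. A minimal sketch, with the daemon suspend/re-enable side effects reduced to comments:

```python
def next_state(state, health_check_passed):
    """One step of the Anycast participation state machine (sketch).

    state is 'active' or 'suspended'; the transition follows the
    behavior described in this topic.
    """
    if state == "active" and not health_check_passed:
        return "suspended"  # suspend daemons, withdraw the Anycast route
    if state == "suspended" and health_check_passed:
        return "active"     # re-enable daemons, reachable at the Anycast IP
    return state            # otherwise, no change

print(next_state("active", health_check_passed=False))  # → suspended
```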

Verifying which service point is responding to queries

To verify which service point is responding when querying the Anycast IP, run a chaos query against the service IP of each service point that's configured to participate in Anycast, as well as against the Anycast IP itself.

Comparing the retrieved responses tells administrators which service point within the Anycast pool is answering.

For example, issuing multiple chaos queries against the Anycast IP shows that the query always reaches the same service point: the one identified by the "1aa87e837b20" response.

$ dig @169.254.239.9 hostname.bind TXT chaos
; <<>> DiG 9.11.3-1ubuntu1.1-Ubuntu <<>> @169.254.239.9 hostname.bind TXT chaos
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63098
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; ANSWER SECTION:
hostname.bind. 0 CH TXT "1aa87e837b20"
...output removed
In this example, there are two service points configured in Anycast, but the fact that the response to the chaos query is always the same indicates that one of the following situations is occurring:
  • One service point is preferred over the other: the routers along the path determine that the most efficient route always leads to the same service point.
  • The other service point is misconfigured for Anycast and isn't even an option.

To determine which service point isn't elected to resolve the query, run the same chaos query against the actual service IP of each service point in the pool, that is, the IP set on the ens192 interface.

$ dig @169.254.231.10 hostname.bind TXT chaos +short
"accbef482963"
$ dig @169.254.230.11 hostname.bind TXT chaos +short
"1aa87e837b20"

Since the Anycast IP always answers with "1aa87e837b20", the service point answering "accbef482963" isn't being elected. This indicates that you need to direct the Anycast configuration utility and the show commands against the service point with the 169.254.231.10 IP.
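The comparison above can also be expressed as a small sketch. The IPs and chaos answers are the example values from this topic; responding_service_point is a hypothetical helper for illustration, not a BlueCat tool:

```python
def responding_service_point(anycast_answer, per_ip_answers):
    """Given the chaos answer retrieved from the Anycast IP and a mapping
    of each service point's service IP to its own chaos answer, return
    the service IPs whose answer matches the Anycast response."""
    return [ip for ip, ans in per_ip_answers.items() if ans == anycast_answer]

# +short chaos answers collected per service IP (example values).
answers = {
    "169.254.231.10": "accbef482963",
    "169.254.230.11": "1aa87e837b20",
}
print(responding_service_point("1aa87e837b20", answers))  # → ['169.254.230.11']
```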