This topic gives an overview of how DNS Edge service points configured for Anycast periodically self-assess their health and fitness for participating in Anycast. This topic assumes that at least one daemon is running.
When's the health check run?
The health check period is calculated from the routing daemon's timers:
Daemon | Timers | Calculations |
---|---|---|
OSPFD | Hello and Dead | |
BGP | Keepalive and Hold | |
You can verify the calculated health check period in the output of the spDiagnostics API. In the "routing-controller-service" section, the dnsHealthMonitorInterval field gives the period in milliseconds.
For more information, see "Service Point diagnostic and health APIs" in the BlueCat DNS Edge API Guide.
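If you save the spDiagnostics output as JSON, a short script can extract the period. The sketch below assumes the response is JSON that contains a "routing-controller-service" object somewhere inside it, and that this object has a numeric dnsHealthMonitorInterval field. The exact nesting isn't described in this topic, so the script searches for both keys instead of hard-coding a path. The function names and the sample value are hypothetical.

```python
import json
from typing import Any, Optional


def find_key(node: Any, key: str) -> Optional[Any]:
    """Depth-first search for the first value stored under `key` (hypothetical helper)."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def health_check_period_seconds(diagnostics_json: str) -> float:
    """Return dnsHealthMonitorInterval (reported in ms) converted to seconds.

    Assumes the JSON layout described in the lead-in; the real layout may differ.
    """
    diagnostics = json.loads(diagnostics_json)
    section = find_key(diagnostics, "routing-controller-service")
    if section is None:
        raise KeyError("routing-controller-service section not found")
    interval_ms = find_key(section, "dnsHealthMonitorInterval")
    if interval_ms is None:
        raise KeyError("dnsHealthMonitorInterval not found")
    return float(interval_ms) / 1000.0


if __name__ == "__main__":
    # Hypothetical, trimmed-down response used only for illustration.
    sample = '{"routing-controller-service": {"dnsHealthMonitorInterval": 12000}}'
    print(health_check_period_seconds(sample))  # 12.0
```

A recursive search is more forgiving than a fixed path, but it returns the first match it finds. If the real output repeats these keys elsewhere, replace it with an explicit path.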
What does the health check evaluate?
The health check sends a probe query to each configured RNS service. If at least one RNS service fails to resolve the probe query, the service point makes further attempts:
- If, in any of these subsequent attempts, all RNS services respond successfully, the service point keeps its Anycast configuration and resumes its regular health check period.
- If, in every attempt, at least one of the configured RNS services fails to resolve the probe query, the service point removes itself from the Anycast pool.
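The retry rule above can be sketched as a small decision function. The sketch assumes that each attempt is recorded as a mapping from an RNS service name (hypothetical here) to whether that service resolved the probe query. This topic doesn't describe the number of attempts or how the probes are sent, so both are left to the caller.

```python
from typing import Dict, Iterable

# One attempt: RNS service name -> True if it resolved the probe query.
Attempt = Dict[str, bool]


def attempt_passed(attempt: Attempt) -> bool:
    """An attempt passes only if every configured RNS resolved the probe."""
    return all(attempt.values())


def evaluate_retries(attempts: Iterable[Attempt]) -> str:
    """Decide the outcome of the follow-up attempts after a failed check.

    Returns "maintain" as soon as one attempt has every RNS succeeding,
    or "remove" if every attempt had at least one failing RNS.
    """
    for attempt in attempts:
        if attempt_passed(attempt):
            return "maintain"
    return "remove"


if __name__ == "__main__":
    rounds = [{"rns-a": True, "rns-b": False}, {"rns-a": True, "rns-b": True}]
    print(evaluate_retries(rounds))  # maintain
```

`evaluate_retries` consumes an iterable and returns on the first passing attempt. If you pass it a generator that probes lazily, it stops probing as soon as one attempt succeeds, which matches the "any of these subsequent attempts" rule.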
The service point removes itself by suspending its running daemons and notifying neighboring devices that it's going down, so that they can remove their Anycast routes to it.
This is reflected in the "routing-controller-service" section of the spDiagnostics API output, which returns suspended as the status of each previously enabled daemon.
The suspended state is also reflected in the output of the Anycast configuration utility's show_daemons command, which shows no for each previously enabled daemon.
While suspended, the service point continues to self-assess its health on the same health check period. When a health check passes, the daemons are re-enabled and the service point becomes reachable through the Anycast IP again.
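The suspend and re-enable cycle can be modelled as a two-state toy. This is an illustration only. `AnycastServicePoint`, `on_health_check`, and the "enabled" status label are hypothetical. The `passed` argument stands for the overall outcome of a health check, including any retries.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnycastServicePoint:
    """Toy model of a service point's Anycast participation (hypothetical)."""
    daemons: List[str]
    suspended: bool = False
    log: List[str] = field(default_factory=list)

    def on_health_check(self, passed: bool) -> None:
        # The check runs on the same period whether or not the daemons
        # are currently suspended; only state changes are logged.
        if not passed and not self.suspended:
            # Suspend daemons; neighbors withdraw their Anycast routes.
            self.suspended = True
            self.log.append("suspended " + ",".join(self.daemons))
        elif passed and self.suspended:
            # Re-enable daemons so the Anycast IP routes here again.
            self.suspended = False
            self.log.append("re-enabled " + ",".join(self.daemons))

    def daemon_status(self) -> Dict[str, str]:
        status = "suspended" if self.suspended else "enabled"
        return {name: status for name in self.daemons}


if __name__ == "__main__":
    sp = AnycastServicePoint(daemons=["OSPFD", "BGP"])
    for result in (False, False, True):
        sp.on_health_check(result)
    print(sp.log)  # ['suspended OSPFD,BGP', 're-enabled OSPFD,BGP']
```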
Verifying which service point is responding to queries
To verify which service point responds when you query the Anycast IP, run a chaos query (hostname.bind TXT in the CHAOS class). Run it against the Anycast IP and against the service IP of each service point that's configured to participate in Anycast.
Comparing the responses tells administrators which service point within the Anycast pool the Anycast IP is reaching.
For example, issuing multiple chaos queries against the Anycast IP shows that the query always reaches the same service point: the one that responds with "1aa87e837b20".
$ dig @169.254.239.9 hostname.bind TXT chaos
; <<>> DiG 9.11.3-1ubuntu1.1-Ubuntu <<>> @169.254.239.9 hostname.bind TXT chaos
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63098
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; ANSWER SECTION:
hostname.bind. 0 CH TXT "1aa87e837b20"
...output removed
This can mean either of the following:
- One service point is preferred over the other: the routers along the way determine that the most efficient path always leads to the same service point.
- The other service point is misconfigured for Anycast and isn't a candidate at all.
To determine which service point isn't being selected to resolve the query, run the same chaos query against the actual service IP of each service point in the pool, that is, the IP set on the ens192 interface.
$ dig @169.254.231.10 hostname.bind TXT chaos +short
"accbef482963"
$ dig @169.254.230.11 hostname.bind TXT chaos +short
"1aa87e837b20"
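This indicates that queries to the Anycast IP only reach "1aa87e837b20" (169.254.230.11). To find out why the other service point isn't being selected, direct the tool and the show commands at the service point with the 169.254.231.10 IP.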
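The comparison above reduces to a set check. Any service point whose identifier never shows up in the Anycast answers is the one to investigate. This sketch takes the identifiers as inputs, collected with dig or similar. `unreached_service_points` is a hypothetical helper name.

```python
from typing import Dict, List


def unreached_service_points(
    anycast_answers: List[str], service_points: Dict[str, str]
) -> List[str]:
    """Return service IPs whose hostname.bind never appeared via the Anycast IP.

    anycast_answers: hostname.bind values from repeated queries to the Anycast IP.
    service_points:  service IP -> hostname.bind value from querying that IP directly.
    """
    seen = set(anycast_answers)
    return sorted(ip for ip, ident in service_points.items() if ident not in seen)


if __name__ == "__main__":
    points = {"169.254.231.10": "accbef482963", "169.254.230.11": "1aa87e837b20"}
    print(unreached_service_points(["1aa87e837b20"] * 3, points))  # ['169.254.231.10']
```

A service point that is missing from a handful of Anycast answers is only shown not to be selected. As described above, the cause can be either a routing preference or a misconfiguration.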