In 8.4, we introduced two new components:
- An anomaly detection engine
- An embedded long-term time-series database
The anomaly detection engine uses machine learning models to identify outliers and unusual behaviors for several metrics. Upon detection of an anomaly, a warning alert is generated.
Awareness of such anomalies surfaces early symptoms of emerging issues, allowing you to
address them before they become bigger problems. In our implementation, we’re using the
popular z-score method to detect anomalies. A z-score measures
exactly how many standard deviations above or below the mean a data point is. The system
evaluates several metrics based on a week's worth of data points, whereas for regular
issues, the system evaluates 90 minutes’ worth of data points. These data points are
stored in the newly embedded time-series database. When a new data point is collected, a
z-score is calculated. An alert is generated if the z-score of that data point is
greater than 3 (or less than –3). The alert will remain active for 10 minutes. During
that time, if no other anomalies are detected, the alert will resolve itself and go into
the cooldown state. If another data point has a z-score greater than 3 (or less than
–3), the alert will remain active for another 10 minutes.
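
The scoring and alert lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the product's actual implementation: the names `z_score`, `AnomalyAlert`, `Z_THRESHOLD`, and `ALERT_HOLD_SECONDS` are hypothetical, and the sketch assumes a population standard deviation, a baseline window that excludes the new data point, and a z-score of 0 when the baseline has no variance. The cooldown state is not modeled.

```python
from statistics import mean, pstdev

Z_THRESHOLD = 3.0          # from the text: alert when the z-score is > 3 or < -3
ALERT_HOLD_SECONDS = 600   # from the text: an alert stays active for 10 minutes


def z_score(value, history):
    """Return how many standard deviations `value` lies from the mean of `history`."""
    mu = mean(history)
    sigma = pstdev(history)  # assumption: population standard deviation
    if sigma == 0:
        return 0.0  # assumption: a perfectly flat baseline never raises an anomaly
    return (value - mu) / sigma


class AnomalyAlert:
    """Hypothetical tracker for one metric's anomaly alert."""

    def __init__(self):
        self.active_until = None  # timestamp (seconds) at which the alert resolves

    def observe(self, timestamp, value, history):
        """Score a new data point and return True if the alert is active afterwards."""
        if abs(z_score(value, history)) > Z_THRESHOLD:
            # A new anomaly starts the 10-minute window or extends it.
            self.active_until = timestamp + ALERT_HOLD_SECONDS
        return self.is_active(timestamp)

    def is_active(self, timestamp):
        """Return True while the current 10-minute window has not elapsed."""
        return self.active_until is not None and timestamp < self.active_until
```

In this sketch, `history` stands in for the week of stored data points. Each call to `observe` scores one newly collected point, and a second anomaly arriving before the window elapses pushes the resolve time out by another 10 minutes.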
Note: By default, the system
does not write metrics to the long-term time-series database. In other words, this
feature is disabled by default. To enable it, please create a ticket. Our support team will work
with you to ensure you have the resources to support anomaly detection before writing
metrics to the long-term time-series database.