Alert Management - LiveNX

LiveNX Operations Dashboard Admin Guide


Alert Management is where LiveNX’s Alerts can be enabled, thresholds configured, and sharing options defined. LiveNX’s alerting engine can track multiple KPIs, notify when thresholds have been crossed, and provide both in-app and external notifications.

LiveNX - Configure menu

The Alert Management page lists all available alerts and a summary of their configuration.


LiveNX - Alert Management screen

Single threshold Alerts can be enabled/disabled by selecting the alert and clicking Enable or Disable.


LiveNX - Enable and Disable alerts

Clicking an alert shows its detailed configuration settings.


LiveNX - Alert configuration

Each Alert’s details will have similar, yet distinct capabilities based on their respective use cases. For example, all alerts will provide the following general configuration settings:

  • Enable switch
  • Severity
  • Threshold
  • Sharing

However, the complexity of the options presented is driven by the needs of each use case.

Example of a simple, single threshold Alert:


Example of a simple, single threshold Alert

Example of a complex, multi-threshold/multi-Instance Alert:


Example of a complex, multi-instance Alert

Enable Switch

All alerts will have at least one enable switch:

Enable switch

Severity

LiveNX provides the following severity levels for all Alerts:

  • Critical
  • Warning
  • Info

Simple Alerts have just one severity level for the Alert's one threshold:


Simple Alerts - Severity

Other, more complex Alerts may provide unique severities per threshold level, as well as Time to Trigger and Automatic Resolution Time settings.


Complex Alerts - Severity levels

Thresholds

The threshold options available depend on the type of Alert.

The following are commonly seen across many Alert types:


Threshold settings

The Alert’s threshold must be crossed for at least this time period for the Alert to trigger. A value of 0 will immediately trigger the Alert as soon as the threshold is crossed.


Minimum time period for the Alert to trigger

Time to Trigger

The time that the threshold must remain crossed before the Alert triggers. This will help ensure an Alert is not “noisy” when the threshold is only briefly crossed. A value of 0 will immediately trigger the Alert as soon as its threshold is crossed.


Time to trigger

Automatic Resolution Time

This value controls the duration of time that a threshold must have returned to its normal state before an Alert is cleared. This will help ensure an Alert is not “noisy” when the threshold is frequently being crossed and resolved. A value of 0 will immediately clear the alert when the threshold is resolved.


Automatic Resolution Time

Example:

Threshold settings work in conjunction with one another to determine when a specific alert should trigger or be cleared. The following is a practical example of how a complex, multi-threshold alert operates in LiveNX, using this configuration for a High WAN Utilization alert:


Thresholds example

Time to Trigger >= 1 min

Automatic Resolution Time = 4 min

Critical >= 80%

Warning >= 60%

Info >= 40% (Disabled)

Next, consider the following time series graph representing a WAN interface’s utilization over time.


Graphic representation of WAN interfaces utilization over time
  • 10:00am - Utilization rises above the critical threshold
  • 10:01am - Time to Trigger exceeded, critical alert is opened
  • 10:05am - Utilization falls below all configured thresholds
  • 10:09am - Automatic Resolution Time exceeded, alert is resolved
  • 10:10am - Utilization rises above the critical threshold
  • 10:11am - Time to Trigger exceeded, critical alert is opened
  • 10:15am - Utilization falls below the critical threshold, but stays above the warning threshold
  • 10:19am - Automatic Resolution Time exceeded, and critical alert is resolved. Because Time to Trigger has also been exceeded for the warning threshold, a new warning alert is opened
  • 10:25am - Utilization falls below all configured thresholds
  • 10:29am - Automatic Resolution Time exceeded, and warning alert is resolved
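
The timeline above can be reproduced with a small simulation. This is only a sketch of the behavior described in this example, not LiveNX's actual implementation: the function names, the one-sample-per-minute model, and the utilization values (85, 70, 30) are assumptions chosen to fall inside each threshold band. Minutes are counted from 10:00am.

```python
TIME_TO_TRIGGER = 1   # minutes, from the example configuration
AUTO_RESOLUTION = 4   # minutes, from the example configuration

# (severity, threshold in percent, enabled), most severe first
THRESHOLDS = [("Critical", 80, True), ("Warning", 60, True), ("Info", 40, False)]


def classify(utilization):
    """Return the most severe enabled level that the value reaches, or None."""
    for severity, limit, enabled in THRESHOLDS:
        if enabled and utilization >= limit:
            return severity
    return None


def simulate(samples):
    """samples: (minute, utilization) pairs in time order. Returns (minute, event) pairs."""
    events = []
    open_alert = None      # severity of the currently open alert, if any
    level = None           # level that the latest sample falls into
    level_since = None     # minute at which `level` started
    mismatch_since = None  # minute at which `level` stopped matching the open alert

    for minute, utilization in samples:
        new_level = classify(utilization)
        if new_level != level:
            level, level_since = new_level, minute

        if open_alert is not None:
            if level == open_alert:
                mismatch_since = None
            else:
                # Assumption: any change of level (up or down) starts the
                # resolution timer; the source only describes the downward case.
                if mismatch_since is None:
                    mismatch_since = minute
                if minute - mismatch_since >= AUTO_RESOLUTION:
                    events.append((minute, f"{open_alert} resolved"))
                    open_alert = None
                    mismatch_since = None

        if (open_alert is None and level is not None
                and minute - level_since >= TIME_TO_TRIGGER):
            events.append((minute, f"{level} opened"))
            open_alert = level

    return events


# Hypothetical per-minute utilization matching the shape of the graph.
SAMPLES = (
    [(m, 85) for m in range(0, 5)]       # 10:00-10:04 above critical
    + [(m, 30) for m in range(5, 10)]    # 10:05-10:09 below all thresholds
    + [(m, 85) for m in range(10, 15)]   # 10:10-10:14 above critical
    + [(m, 70) for m in range(15, 25)]   # 10:15-10:24 between warning and critical
    + [(m, 30) for m in range(25, 30)]   # 10:25-10:29 below all thresholds
)

for minute, event in simulate(SAMPLES):
    print(f"10:{minute:02d}am {event}")
```

With these inputs, the critical alert opens at minutes 1 and 11, resolves at minutes 9 and 19, and the warning alert opens at minute 19 and resolves at minute 29, matching the list above.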

Sharing

Alerts can be shared when triggered via the following methods:

Email - Alerts can be forwarded to one or more email destinations.

ServiceNow - Via API integration, LiveNX can forward its Alerts as either Events or Incidents.

SNMP Trap - Alerts can be forwarded to an external SNMP server configured to receive traps.

WebUI - Alerts will be included in the LiveNX Operations Dashboard Notification Sidebar.

Syslog - Alerts can be forwarded to an external Syslog server.


Sharing options

Please see the Integration section of this document for the configuration prerequisites for Email, ServiceNow, SNMP Trap, and Syslog sharing.

Example of the LiveNX Operations Dashboard Notification Sidebar showing Alert notifications.


Operations Dashboard showing Alert notifications

There are two types of Alerts in LiveNX:
  • Single Instance Alert
  • Multi-Instance Alert

Single Instance Alerts

Single Instance Alerts are global in scope. All Sites/Devices/Interfaces will share the same threshold and sharing configuration.

Below is an example of a Single Threshold Alert:


Single Threshold Alert

One Single Instance Alert worth noting is the QoS Class Drop Alert. This alert is global in scope and applies to all devices, but unique thresholds can be configured for each class (queue) name.


QoS Class Drop settings

Multi-Instance Alerts

Multi-Instance Alerts help solve the following types of use cases:
  • Alert when Chicago’s WAN circuit is >85% utilized for the last 15 minutes and send a notification email to just the Chicago admin.
  • Alert when New York’s WAN circuit is >75% utilized for the last 10 minutes and send a notification email to both the New York and Chicago admins.
  • Alert when all other WAN circuits are >95% utilized for the last 15 minutes and send a notification email to all admins.

Multi-Instance Alerts can be thought of as an access list found in a router or firewall. They are an ordered list of thresholds that are matched in a top-down manner. Each Instance has an Alert Source Filter that defines the Sites/Devices/Interfaces/etc. that are matched by the Instance. Once a match is found, the associated Instance’s threshold will be considered for the KPI being measured and no additional Instances will be considered. If no specific Instance is matched, the KPI being measured will use the Default Instance, if it is enabled. If an Instance is not enabled, it will be ignored.
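
The top-down, first-match behavior described above can be sketched as follows. This is an illustration only, not LiveNX's implementation or API: the `Instance` class, the `match_instance` function, the dictionary-based filter, and the example.com addresses are all hypothetical, and the three use cases listed earlier are used as sample data.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    source_filter: dict   # hypothetical filter, e.g. {"site": "Chicago"}
    threshold_pct: int
    recipients: list
    enabled: bool = True


def match_instance(instances, default, target):
    """Return the first enabled Instance whose filter matches `target`.

    Falls back to the default Instance if it is enabled, otherwise None.
    """
    for instance in instances:          # evaluated top-down
        if not instance.enabled:
            continue                    # disabled Instances are ignored
        if all(target.get(key) == value
               for key, value in instance.source_filter.items()):
            return instance             # first match wins; stop searching
    if default is not None and default.enabled:
        return default
    return None


# Sample data modeled on the use cases above (addresses are placeholders).
INSTANCES = [
    Instance("Chicago WAN", {"site": "Chicago"}, 85,
             ["chicago-admin@example.com"]),
    Instance("New York WAN", {"site": "New York"}, 75,
             ["ny-admin@example.com", "chicago-admin@example.com"]),
]
DEFAULT = Instance("All other WAN circuits", {}, 95, ["all-admins@example.com"])

matched = match_instance(INSTANCES, DEFAULT, {"site": "New York"})
print(matched.name, matched.threshold_pct, matched.recipients)
```

A target that matches no specific Instance, such as a site that is not listed, falls through to the default Instance. If the default Instance is disabled, the function returns `None` and no alert would be generated for that target.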

Below is an example of a Multi-threshold Alert:


High WAN Interface Utilization settings

In this example of the High WAN Interface Utilization Alert, there are three instances enabled and the Default threshold (Instance) is disabled. This configuration ensures only interfaces matching the Alert Source Filter of these three instances can generate an alert.

The top Instance named LiveWire interface eth1 provides the following configuration:

  • The Alert Source filter matches Device: SE-LiveWire-NY AND Interface: SE-LiveWire-NY ->eth1. This means that this instance will only apply to the utilization of this specific interface.
  • The Threshold will monitor the utilization of this interface and can generate a Critical, Warning, and Info alert for it.
  • The Sharing settings will send an Email notification to test@test.com and also populate the LiveNX Notification sidebar.

By default, Multi-Instance Alerts only have their Default threshold configured. If enabled, all applicable Sites/Devices/Interfaces/Applications will match this instance.


Device CPU Utilization - List of Instances

When a new Instance is configured, the Alert Source filter must be configured. This will define which Sites/Devices/Interfaces/Applications will match this instance.


Device CPU Utilization - Alert Source

In this example, the new Instance’s name is “Austin Router” and the Alert Source has been configured to only match “Device: RTR_Austin.liveaction.com”. Since Instances are matched in a top-down order, the Austin router will be measured against this specific Instance’s Threshold settings and all other devices will use the Default threshold Instance.


"Austin Router" as the only Alert Source - example

Alert Status

Some Alerts will drive Site, Device, and Interface status on other pages in LiveNX. LiveNX status is the real-time performance state of a monitored object. The available status severities are:

  • Green / Good
  • Yellow / Warning
  • Red / Critical
  • Grey / Unknown

Below are example page views that use status driven from Alerts:


Geo Topology view

Alert Status Overview

Alert Status - Sites

Alerts that drive status will be designated with a badge as shown below:


Alerts with badges