Issues move through different stages:
- Early-Symptom
- Unresolved
- Cooldown
- Resolved
Issues move from Early-Symptom to Unresolved, from Unresolved to Cooldown, and from Cooldown to Resolved. The system updates an issue whenever its state changes.
Some issues have issue items. Think of issue items as sub-issues. For example, interfaces are the issue items for the “Network port(s) down” rule. The system also updates an issue whenever one of its issue items changes state.
Every time there is a state change, the system populates the NOTES section with the state change information. It also sends additional notifications such as email.
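The stage transitions and the NOTES entries described above can be sketched as a small state machine. This is only an illustration, not LiveAssurance's implementation: the `Stage`, `Issue`, and `move_to` names are hypothetical, the note text is made up, and only the three forward transitions listed above are modeled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Stage(Enum):
    EARLY_SYMPTOM = "Early-Symptom"
    UNRESOLVED = "Unresolved"
    COOLDOWN = "Cooldown"
    RESOLVED = "Resolved"


# Hypothetical transition table: only the three forward moves listed in the text.
ALLOWED = {
    Stage.EARLY_SYMPTOM: {Stage.UNRESOLVED},
    Stage.UNRESOLVED: {Stage.COOLDOWN},
    Stage.COOLDOWN: {Stage.RESOLVED},
    Stage.RESOLVED: set(),
}


@dataclass
class Issue:
    name: str
    stage: Stage = Stage.EARLY_SYMPTOM
    notes: List[str] = field(default_factory=list)

    def move_to(self, new_stage: Stage) -> None:
        """Change stage and record the change, as the NOTES section does."""
        if new_stage not in ALLOWED[self.stage]:
            raise ValueError(f"{self.stage.value} -> {new_stage.value} is not allowed")
        self.notes.append(f"{self.stage.value} -> {new_stage.value}")
        self.stage = new_stage


issue = Issue("Network port(s) down")
for stage in (Stage.UNRESOLVED, Stage.COOLDOWN, Stage.RESOLVED):
    issue.move_to(stage)
print(issue.notes)
```

Every successful call to `move_to` appends exactly one note, which mirrors the rule that each state change produces a NOTES entry.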
The following state changes are recorded under NOTES:
- When an early symptom is detected.
- When an issue is created.
- When an issue is resolved.
- When an issue item is identified.
- When an issue enters cooldown mode.
- When an issue is archived.
- When an issue is un-archived.
- When an issue item is archived.
- When an issue item enters cooldown mode.
- When an issue item exits cooldown mode.
- When an issue item changes state, e.g. a VPN tunnel goes down or comes up.
Cooldown
The cooldown period is designed to improve the signal-to-noise ratio for flapping issues. When LiveAssurance identifies an issue, we notify you. When the underlying problem clears, the issue moves to a “cooldown” phase instead of moving directly to Resolved.
During this cooldown period, if the problem recurs and clears again, we do not open a new issue, which reduces the number of active issues. Effectively, we are building a patience mechanism into our system: we do not resolve an issue right away, in case the same problem recurs several times.
A new note is added under NOTES when an issue moves to the cooldown period. The length of the period is determined from historical data that LiveAssurance collected from Insight.
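To illustrate how a cooldown window cuts down the number of issues for a flapping condition, here is a minimal sketch. It rests on assumptions: the `count_issues` function and the event format are hypothetical, times are plain minutes, and the cooldown timer is assumed to restart at each recovery. How LiveAssurance actually tracks this is not described here.

```python
def count_issues(events, cooldown):
    """Count how many issues are opened for one flapping condition.

    events:   (time, state) pairs in time order, state is "down" or "up".
    cooldown: minutes after a recovery during which a recurrence is
              folded into the existing issue instead of opening a new one.
    """
    opened = 0
    active = False
    last_up = None  # time of the most recent recovery, if any
    for t, state in events:
        if state == "down" and not active:
            # Open a new issue only if no cooldown window covers this recurrence.
            if last_up is None or t - last_up >= cooldown:
                opened += 1
            active = True
        elif state == "up" and active:
            active = False
            last_up = t
    return opened


# A port that flaps three times within an hour (times in minutes).
flaps = [(0, "down"), (5, "up"), (20, "down"), (25, "up"), (50, "down"), (55, "up")]
without_cooldown = count_issues(flaps, cooldown=0)
with_cooldown = count_issues(flaps, cooldown=120)
print(without_cooldown, with_cooldown)
```

With no cooldown, each recurrence opens its own issue. With a two-hour cooldown, all three flaps are folded into a single issue.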
In this example, LiveAssurance waited a few hours before resolving the issue. During this time, LiveAssurance does not create new issues for the same problem.
If an issue consists of multiple issue items, the cooldown period applies to each issue item individually.
Let us work through an example. An issue has two issue items: utilization is high on both cpu-0 and cpu-1. When cpu-1 utilization drops, the cpu-1 issue item moves to cooldown. In this case, the cooldown period is 2 hours. Two hours later, the cpu-1 issue item moves to the resolved state. At this point, the issue is still active, with one issue item resolved and the other still active. Now cpu-0 utilization drops, and the cpu-0 issue item moves to cooldown. Two hours later, we move that issue item to the resolved state as well. In this example, cooldown lasted four hours in total, two hours for each issue item.
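The timeline of this example can be checked with a few lines of arithmetic. The sketch below assumes that the issue as a whole is resolved once its last issue item is resolved. The variable names are illustrative only, and times are measured from the moment cpu-1 recovers.

```python
from datetime import timedelta

COOLDOWN = timedelta(hours=2)  # cooldown period used in the example

# When each issue item's condition cleared, relative to the first recovery.
cleared_at = {
    "cpu-1": timedelta(hours=0),
    "cpu-0": timedelta(hours=2),  # clears just as cpu-1 is resolved
}

# Each item is resolved one cooldown period after it cleared.
resolved_at = {item: t + COOLDOWN for item, t in cleared_at.items()}

# Assumption: the issue is resolved when its last item is resolved.
issue_resolved_at = max(resolved_at.values())

for item in sorted(resolved_at, key=resolved_at.get):
    print(f"{item}: cooldown starts at {cleared_at[item]}, resolved at {resolved_at[item]}")
print(f"issue resolved at {issue_resolved_at}")
```

Because the two cooldown periods run back to back rather than in parallel, the issue is resolved four hours after the first recovery.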
Early-Symptom
Early-Symptom is disabled by default. Its purpose is to suppress issues that are transient in nature and cause only temporary problems.
To configure Early-Symptom, navigate to the Application tab.
Enable Early-Symptom and specify the desired time window (e.g. 4 minutes).
When a problem is detected, LiveAssurance does not raise an issue immediately but waits for the defined time window. If the problem resolves itself within that window, LiveAssurance does not alert the user; it moves the issue straight to the Resolved state and archives it automatically. If the problem persists, LiveAssurance moves the issue to the Active state and sends notifications.
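The decision made at the end of the Early-Symptom window can be sketched as follows. The function name, the return values, and the treatment of a problem that clears exactly at the deadline are assumptions made for illustration. Times are plain minutes.

```python
from typing import Optional


def early_symptom_outcome(detected_at: int, cleared_at: Optional[int], window: int) -> str:
    """Decide what happens to an early symptom once the window has passed.

    detected_at: minute at which the problem was first detected.
    cleared_at:  minute at which it cleared, or None if it is still present.
    window:      configured Early-Symptom time window in minutes.
    """
    deadline = detected_at + window
    # Assumption: clearing exactly at the deadline still counts as transient.
    if cleared_at is not None and cleared_at <= deadline:
        return "resolved_and_archived"  # no notification is sent
    return "active"  # issue is raised and notifications are sent


# With a 4-minute window:
transient = early_symptom_outcome(detected_at=0, cleared_at=3, window=4)
persistent = early_symptom_outcome(detected_at=0, cleared_at=None, window=4)
print(transient, persistent)
```

A problem that clears after three minutes is archived silently, while one that is still present when the window ends becomes an active issue.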
Early-Symptom is considered a transient state. To see these transient issues at any point in time, go to the Issues tab and set the Status filter to Early-symptom.
LiveAssurance does not alert you about transient issues that resolve themselves within the window. If you are wondering why LiveAssurance did not send a notification, go to the Archived list and search for the device in question (or for the device and the issue, to narrow the search). You should see that the issue was resolved, and its notes indicate that LiveAssurance detected an early symptom but the issue resolved itself.