Manual Configuration of Alerts
To configure alerts manually, refer to the Grafana documentation for setting up alerts.
Configuration in Grafana Web UI
- Open the Grafana Web UI.
- Navigate to Alerts > Alert rules.
- In the top right corner, click New alert rule.
- In Step 1, enter the alert name.
- Select your data source.
- Configure the alert query and expression.
- In Step 4, create a folder or assign the alert to an existing folder.
- In Step 5, select an existing evaluation group or create a new one. The evaluation group defines the evaluation interval for all alerts in the group. Configure the applicable pending period.
- Give the alert a summary and attach it to your dashboard widgets if needed.
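As a rough illustration of how the evaluation group interval and the pending setting from Step 5 interact, the sketch below models a rule that is evaluated once per interval and only starts firing after its condition has held for the whole pending time. The function name `alert_states` and its parameters are hypothetical, and this is a simplified model, not Grafana's implementation.

```python
from typing import List, Optional


def alert_states(breaches: List[bool], interval_s: int, pending_s: int) -> List[str]:
    """Return the rule state after each evaluation.

    breaches   -- one boolean per evaluation: did the condition match?
    interval_s -- evaluation interval of the group, in seconds (assumed)
    pending_s  -- pending time before the alert fires, in seconds (assumed)
    """
    states: List[str] = []
    breached_for: Optional[int] = None  # seconds the condition has held so far
    for breached in breaches:
        if not breached:
            breached_for = None
            states.append("Normal")
            continue
        # The first breach starts the clock at 0; each later one adds an interval.
        breached_for = 0 if breached_for is None else breached_for + interval_s
        states.append("Firing" if breached_for >= pending_s else "Pending")
    return states


# Evaluated every 60 s with a 120 s pending time.
print(alert_states([False, True, True, True, False], 60, 120))
```

A longer pending time suppresses short spikes at the cost of a later notification, which is why alerts in the same group can share an interval but may need different pending values.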
Example Configuration Queries for Alerts
Query for alert - Security Gateway is "Down"
PromQL syntax for the query:
|
|
Expression:
Expression C:
- Type: Threshold
- Input: A
- Is above: 0
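The Threshold expression above compares the value of its input with a fixed limit. The following minimal sketch shows that comparison for the condition types that appear in this section. The names `CONDITIONS` and `threshold_matches` are hypothetical and only illustrate the logic; they are not part of Grafana.

```python
import operator

# Threshold condition types used by the example alerts in this section.
CONDITIONS = {
    "is above": operator.gt,
    "is above or equal to": operator.ge,
    "is below": operator.lt,
    "is equal to": operator.eq,
}


def threshold_matches(value: float, condition: str, limit: float) -> bool:
    """Return True when `value` satisfies `condition` against `limit`."""
    return CONDITIONS[condition.lower()](value, limit)


# "Is above: 0" matches any positive input value.
print(threshold_matches(1, "Is above", 0))
```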
Alert Summary:
The Security Gateway was up in the last 5 days but has not sent telemetry in the last 3 minutes.
The Security Gateway may be down or unreachable.
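The condition described in this summary can be sketched as a check on the time of the last received sample: the gateway reported within the last 5 days, but nothing arrived within the last 3 minutes. The function `gateway_looks_down` and its parameters are hypothetical; the sketch only illustrates the logic and is not the PromQL query used by the alert.

```python
from datetime import datetime, timedelta
from typing import Optional


def gateway_looks_down(
    last_sample: Optional[datetime],
    now: datetime,
    seen_within: timedelta = timedelta(days=5),
    silent_for: timedelta = timedelta(minutes=3),
) -> bool:
    """True if telemetry was seen recently enough, but not in the last few minutes."""
    if last_sample is None:
        return False  # never reported, so it does not count as "was up"
    age = now - last_sample
    return silent_for < age <= seen_within


now = datetime(2024, 1, 10, 12, 0)
print(gateway_looks_down(now - timedelta(minutes=10), now))
```

Bounding the look-back window keeps gateways that were decommissioned long ago from alerting forever.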
Remediation:
- Verify the Security Gateway is powered on.
- Check connectivity, modem status, and reachability with ping / SSH.
- Check connectivity and Skyline configuration between the Security Gateway and the monitoring system.
- Based on the results of the previous steps, contact the ISP Support or Check Point Support.
Labels:
- Importance: Critical
- Severity: Critical
Query for alert - ISP "Down"
PromQL syntax for the query:
|
|
Expression:
Is above or equal to: 1
Alert Summary:
SD-WAN detected that an ISP is unreachable based on next-hop probes.
This may be caused by a physical link failure, upstream outage, or unresponsive next hop.
Remediation:
- Verify interface and modem status on the gateway.
- Review recent changes in the ISP link configuration (speed/duplex, VLANs, etc.).
- Follow the ISP troubleshooting steps in the Administration Guide.
- If the link remains down, contact the ISP Support or Check Point Support.
Labels:
- Importance: Critical
- Severity: Critical
Query for alert - Link QOE is low
PromQL syntax for the query:
|
|
Expression:
Is below: 3.8
Alert Summary:
The WAN link QOE score is lower than 3.8, which indicates degraded performance.
Remediation:
- Check for high traffic volume, latency, jitter, or packet loss on the interface.
- If degradation persists, open a ticket with the ISP and provide the time/metrics.
Labels:
- Importance: High
- Severity: High
Query for alert - VPN tunnel is "Down" while ISP link is "UP"
PromQL syntax for the query:
|
|
Expression:
Is above: 1
Alert Summary:
SD-WAN detected that the ISP is operational, but the Overlay VPN tunnel endpoint is not responding to SD-WAN overlay probing.
This can be a result of configuration mismatches, peer-side issues, or encryption failures.
Remediation:
- Check interface / VPN configuration changes.
- Verify that there is no Underlay link down on a remote Security Gateway.
- Follow the troubleshooting steps in the Administration Guide.
- If the tunnel remains down, contact Check Point Support.
Labels:
- Importance: Low
- Severity: Low
Query for alert - All public ISPs exceeded thresholds of one steering object
PromQL syntax for the query:
|
|
Expression:
Is above: 0
Alert Summary:
SD-WAN detected that every public ISP link assigned to this steering object violated one or more thresholds (latency, packet loss, or jitter).
No healthy public ISP path was available during this time.
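The condition in this summary, that every public ISP link of a steering object is in violation, can be sketched as follows. The data layout and the name `violated_steering_objects` are hypothetical and chosen only for illustration. The alert's threshold ("Is above: 0") then corresponds to at least one steering object being counted.

```python
from typing import Dict


def violated_steering_objects(violations: Dict[str, Dict[str, bool]]) -> int:
    """Count steering objects whose public ISP links all violated a threshold.

    violations maps a steering object name to {public ISP name: violated?}.
    """
    return sum(
        1
        for isps in violations.values()
        if isps and all(isps.values())  # ignore objects with no public ISP
    )


sample = {
    "voice": {"isp1": True, "isp2": True},   # no healthy public path
    "web": {"isp1": True, "isp2": False},    # isp2 is still healthy
}
count = violated_steering_objects(sample)
print(count, count > 0)
```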
Remediation:
- Investigate local network metrics and utilization, resource utilization on the Security Gateway, and last-mile connectivity.
- Check the probing target / Application.
- Contact the ISP Support if degradation persists.
Labels:
- Importance: Medium
- Severity: Medium
Query for alert - All public ISPs exceeded thresholds for multiple steering objects
PromQL syntax for the query:
|
|
Expression:
Is above: 1
Alert Summary:
SD-WAN detected simultaneous threshold violations on all public ISP links across more than one steering object.
This suggests a gateway-level, access-network, or regional ISP issue.
Remediation:
- Review the Security Gateway health (CPU, drops, link utilization, shaping) and local network conditions.
- Review ISP metrics and recent changes.
- Contact the ISP Support if degradation persists.
Labels:
- Importance: High
- Severity: High
Query for alert - ISP exceeded threshold on one steering object
PromQL syntax for the query:
|
|
Expression:
Is equal to: 1
Alert Summary:
SD-WAN continuously monitors ISP performance. This alert was generated because one of the ISP's metrics crossed the configured threshold for the specified steering object.
Remediation:
- Review current and historical ISP performance metrics.
- Verify whether the issue is temporary or persistent.
- Check application or probing target for issues.
- If degradation continues beyond a short time window, contact the ISP Support.
Labels:
- Importance: Low
- Severity: Low
Query for alert - ISP exceeded thresholds for multiple steering objects
PromQL syntax for the query:
Query A:
|
|
Query B:
|
|
Expression:
- Expression C.
  Type: Threshold
  Input: D
  Is equal to: 1
- Expression D.
  Type: Math
  ${B} > 1 && ${A} > 0
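To make the combination of the two expressions concrete, the sketch below mirrors them in Python. It assumes that the Math expression represents a true comparison as 1 and a false one as 0, and that `a` and `b` are the single values produced by Query A and Query B. What those queries measure is not shown here, so the function names and sample values are purely illustrative.

```python
def expression_d(a: float, b: float) -> int:
    """Math expression ${B} > 1 && ${A} > 0, returning 1 for true and 0 for false."""
    return int(b > 1 and a > 0)


def expression_c(d: int) -> bool:
    """Threshold on input D: the alert condition matches when D is equal to 1."""
    return d == 1


# Both parts of the Math expression must hold for the alert condition to match.
for a, b in [(1, 2), (0, 2), (1, 1)]:
    print(a, b, expression_c(expression_d(a, b)))
```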
Alert Summary:
SD-WAN detected threshold violations for the same ISP across multiple steering objects within a short time window.
This pattern indicates a shared ISP issue.
Remediation:
- Review ISP metrics, link utilization, and recent changes.
- Contact the ISP Support if degradation persists.
Labels:
- Importance: Medium
- Severity: Medium
Query for alert - ISP state changes frequently
Scenario:
When the status of an ISP link changes from Down to UP, or from UP to Down, more than 5 times in the last hour.
PromQL syntax for the query:
|
|
Expression:
Is above: 5
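The flapping condition can be sketched as counting how often consecutive link-state samples differ and comparing that count with the threshold. The sample encoding (1 for UP, 0 for Down) and the function names are assumptions made for illustration; the actual PromQL query is not reproduced here.

```python
from typing import Sequence


def count_state_changes(samples: Sequence[int]) -> int:
    """Count transitions between consecutive samples (1 = UP, 0 = Down)."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def is_flapping(samples: Sequence[int], limit: int = 5) -> bool:
    """True when the link changed state more than `limit` times in the window."""
    return count_state_changes(samples) > limit


last_hour = [1, 0, 1, 0, 1, 0, 1]  # hypothetical samples from the last hour
print(count_state_changes(last_hour), is_flapping(last_hour))
```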
Alert Summary:
SD-WAN detected that the state of an ISP link changed more than 5 times in the last hour, based on next-hop probes.
This may be caused by a physical link failure, upstream outage, or unresponsive next hop.
Remediation:
- Review link flap timing and correlate with environmental or power outages.
- Verify cabling, modem health, and local infrastructure.
- If instability continues, contact the ISP Support.
Labels:
- Importance: High
- Severity: High
Query for alert - Overlay VPN tunnel is down (ISP down)
PromQL syntax for the query:
|
|
Expression:
Is above: 0
Alert Summary:
SD-WAN detected that an ISP is down, causing all Overlay VPN tunnels routed through it to go down.
Remediation:
- Restore ISP connectivity.
- Follow troubleshooting steps in the Administration Guide.
- If the tunnels remain down after recovery, contact Check Point Support.
Labels:
- Importance: Medium
- Severity: Medium
Query for alert - All Overlay VPN tunnels to a peer are down
PromQL syntax for the query:
|
|
Expression:
Is above: 0
Alert Summary:
SD-WAN detected that all Overlay VPN tunnels to a peer are down.
Remediation:
- Check remote gateway status.
- Check VPN configuration.
- Follow troubleshooting steps in the Administration Guide.
- If the tunnel remains down, contact Check Point Support.
Labels:
- Importance: Critical
- Severity: Critical
Query for alert - Overlay VPN tunnel state changes frequently
PromQL syntax for the query:
|
|
Expression:
Is above: 5
Alert Summary:
The status of the Overlay VPN tunnel from the local ISP to the peer on the Security Gateway changed at least 6 times over the last day.
This indicates unstable connectivity.
Remediation:
- Review timing of the flapping events, verify link statuses, cabling and modem health, and assess environmental or power-related factors.
- Follow the troubleshooting steps in the Administration Guide.
- If the issue continues, contact Check Point Support.
Labels:
- Importance: High
- Severity: High
Query for alert - High CPU utilization
PromQL syntax for the query:
|
|
Expression:
Is above: 75
Alert Summary:
CPU utilization on the Security Gateway is greater than 75%.
Remediation:
- Check for spikes in traffic volume or new large flows.
- Review CPU utilization by processes on the Security Gateway.
- Consider optimizing security policies, inspection profiles, or logging if relevant.
- If the issue persists, evaluate the Security Gateway capacity (scale-up / scale-out).
Labels:
- Importance: High
- Severity: High
Query for alert - High memory utilization
PromQL syntax for the query:
|
|
Expression:
Is above: 75
Alert Summary:
Memory utilization on the Security Gateway is greater than 75%.
Remediation:
- Review memory utilization by processes on the Security Gateway.
- Review logs for memory-related errors or crashes.
- Restart non-critical services during maintenance windows if needed.
- If the issue persists, plan for software upgrade or hardware resource increase.
Labels:
- Importance: High
- Severity: High


