System Under Load

System Under Load (SUL) prevents failover to standby SGMs when an SGM experiences high CPU load. It also does not allow remote SGMs to be set to DOWN when they cannot send Cluster Control Protocol (CCP) packets for a specified timeout. The timeout is 3 seconds by default. You can configure each SGM individually.

The system is considered Under Load (SUL state ON) when at least one SGM has kernel CPU usage above the CPU threshold. The threshold is 80% by default.

SUL calculates the CPU usage based on 5 samples by default. It takes samples every 0.2 seconds.

Each SGM calculates its own Kernel High CPU Usage. It uses CCP packets to talk to remote SGMs.

Local and remote CPU load are handled in these ways:

  • If the Local Kernel CPU load is high or the Remote Kernel CPU load is high, the SUL state is set to ON.
  • If both the Local User Space CPU load and the Remote User Space CPU load are high, SUL does not allow Critical Devices that monitor User Space processes (like fwd) to send PNOTE notifications on the local SGM.
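The two rules above can be read as plain boolean logic. The sketch below is only an illustration of that reading; the function names are hypothetical and do not correspond to any Check Point API.

```python
def sul_state_on(local_kernel_high: bool, remote_kernel_high: bool) -> bool:
    """SUL is ON if the local OR the remote kernel CPU load is high."""
    return local_kernel_high or remote_kernel_high


def user_space_pnotes_held(local_user_high: bool, remote_user_high: bool) -> bool:
    """PNOTEs from user-space Critical Devices (like fwd) are held back on the
    local SGM only if BOTH the local and the remote user-space load are high."""
    return local_user_high and remote_user_high
```

Note the asymmetry: one loaded SGM is enough to set the SUL state, but PNOTE suppression requires high user-space load on both sides.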

SUL state change

SUL Feature flow

  • SUL is set to ON if there is reported high CPU usage.
  • SUL is set to OFF if no high CPU report is received for at least 10 seconds (by default) after the last report (Short timeout).

    If the system is continually under load (the gap between high CPU reports is less than the Short timeout), SUL stays ON for up to 3 minutes by default (Long interval).

SUL is turned ON as follows:

  • Every SGM calculates CPU usage on all CPU cores. It picks the highest value and stores it in memory.
  • On every CPU state check (called periodically), the system takes the average of the 5 most recent highest samples (Number of samples) and publishes it through CCP.
  • When a CCP packet with the CPU usage of an SGM is received:

    If the value > threshold (CPU threshold) --> toggle SUL ON

  • When the CPU usage is calculated locally:
    1. If the value > threshold (CPU threshold) --> toggle SUL ON
    2. If the value > threshold (CPU threshold) --> local load is ON (for local user-space PNOTEs)
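The sampling steps above can be sketched as a small rolling window. This is a minimal model, not Check Point code: the class name `CpuMonitor` is hypothetical, and averaging over fewer samples before the window fills is an assumption the source does not address.

```python
from collections import deque

CPU_THRESHOLD = 80  # percent, the default CPU threshold
NUM_SAMPLES = 5     # default number of samples


class CpuMonitor:
    """Hypothetical per-SGM monitor that models the sampling steps."""

    def __init__(self, num_samples=NUM_SAMPLES, threshold=CPU_THRESHOLD):
        self.samples = deque(maxlen=num_samples)  # oldest sample drops out
        self.threshold = threshold

    def add_sample(self, per_core_usage):
        # Keep only the busiest core of this sample.
        self.samples.append(max(per_core_usage))

    def average(self):
        # Average of the stored per-sample maxima (the value published via CCP).
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def is_high(self):
        # Strictly above the threshold, as in "If the value > threshold".
        return self.average() > self.threshold


monitor = CpuMonitor()
for cores in [[10, 95], [20, 90], [85, 30], [70, 88], [92, 15]]:
    monitor.add_sample(cores)
print(monitor.average(), monitor.is_high())  # 90.0 True
```

Because only the busiest core of each sample is kept, a single saturated core is enough to push the average over the threshold, even when the other cores are idle.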

If at least one SGM continually reports high CPU for longer than the Long interval (3 minutes by default), and SUL was set to OFF because the Long interval expired, the next SUL ON state is delayed by a fixed timeout (Start timeout, default = 0).

SUL is turned OFF when:

  • The system is idle: no SGM reported high CPU usage for at least 10 seconds (the default Short timeout).
  • The system is Under Load for longer than the Long interval (3 minutes by default). SUL is forced to toggle OFF, even if SGMs are still reporting high CPU. If the SGMs keep reporting high CPU, SUL is set to ON again after the Start timeout period (0 by default).
  • You manually disable the feature while SUL is ON.
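The ON/OFF timing rules can be modeled as a small state machine. The sketch below is an interpretation under stated assumptions: the class `SulTimer` and its method names are hypothetical, time is counted in HTUs (0.1 s), and the boundary comparisons (`>=`) are a guess at how "at least" and "up to" are applied.

```python
class SulTimer:
    """Hypothetical model of the SUL ON/OFF timing; all times are in HTUs."""

    def __init__(self, short_timeout=100, long_interval=1800, start_timeout=0):
        self.short_timeout = short_timeout  # idle time that sets SUL OFF
        self.long_interval = long_interval  # maximum continuous ON time
        self.start_timeout = start_timeout  # delay after a forced OFF
        self.on = False
        self.on_since = None
        self.last_report = None
        self.blocked_until = 0  # earliest time SUL may turn ON again

    def report_high_cpu(self, now):
        """A local calculation or a CCP packet reported CPU above the threshold."""
        self.last_report = now
        if not self.on and now >= self.blocked_until:
            self.on = True
            self.on_since = now

    def tick(self, now):
        """Periodic check that applies the Short timeout and the Long interval."""
        if not self.on:
            return
        if now - self.last_report >= self.short_timeout:
            self.on = False  # system idle
        elif now - self.on_since >= self.long_interval:
            self.on = False  # forced OFF, even though reports continue
            self.blocked_until = now + self.start_timeout
```

With the default Start timeout of 0, a forced OFF is followed by ON again at the next high CPU report. A non-zero Start timeout keeps SUL OFF for that period regardless of reports.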

Syntax

# fw ctl set int fwha_pnote_timeout_mechanism_monitor_cpu <on_off>

Parameter    Description

<on_off>     Turns SUL on or off

             Valid values:

               • 0 - Turns on SUL
               • 1 - Turns off SUL

Example

# fw ctl set int fwha_pnote_timeout_mechanism_monitor_cpu 1

Logs

Every state change (ON/OFF) is logged in SmartView Tracker and in /var/log/messages (dmesg). Only the SMO sends the SmartView Tracker (SVT) messages.

Log Example in SmartView Tracker:

Configuring SUL

You can configure SUL to meet specific needs.

Syntax

# fw ctl set int <parameter> <value>

Parameter
    Description

fwha_pnote_timeout_mechanism_cpu_load_limit <value>
    CPU threshold
    Highest average CPU usage of a single core
    Default = 80 (%)

fwha_sul_num_sample_cpu_check <value>
    Number of samples
    Number of samples the CPU average is based on
    A sample is taken every 2 HTUs
    Default = 5

fwha_pnote_timeout_mechanism_disable_feature_timeout <value>
    Long Interval
    Maximum time allowed for the SUL ON state
    Default = 1800 HTU (3 minutes)

fwha_system_under_load_short_timeout <value>
    Short Timeout
    Low CPU usage period after which SUL is set to OFF
    Default = 100 HTU (10 seconds)

fwha_system_under_load_start_timeout <value>
    Start Timeout
    Delay until the next SUL ON, if the last ON period was ended by the Long Interval
    Default = 0 HTU (0 seconds)

HTU - HA Time Unit (0.1s)
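Because the timeout parameters are given in HTUs rather than seconds, a small conversion helper can prevent off-by-ten mistakes. The sketch below is illustrative only: the helper names are hypothetical, and `set_command` only builds the text of the `fw ctl set int <parameter> <value>` line shown in the syntax above; it does not run anything.

```python
HTU_PER_SECOND = 10  # 1 HTU = 0.1 s


def seconds_to_htu(seconds):
    """Convert seconds to whole HA Time Units."""
    return round(seconds * HTU_PER_SECOND)


def htu_to_seconds(htu):
    """Convert HA Time Units to seconds."""
    return htu / HTU_PER_SECOND


def set_command(parameter, value):
    """Build the command text only; it is not executed here."""
    return f"fw ctl set int {parameter} {value}"


print(seconds_to_htu(180))   # 1800, the default Long Interval
print(htu_to_seconds(100))   # 10.0, the default Short Timeout
print(set_command("fwha_system_under_load_short_timeout", seconds_to_htu(15)))
# fw ctl set int fwha_system_under_load_short_timeout 150
```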

Note - To make sure that SUL parameters, including the state (ON/OFF), survive reboot, add them to fwkern.conf with the g_update_conf_file utility.

 
©2014 Check Point Software Technologies Ltd. All rights reserved.