System Under Load

Description

The System Under Load (SUL) feature enables the Gateway to monitor high CPU load. While the SUL state is ON, the Gateway also suspends setting remote SGMs to the DOWN state when it cannot receive CCP packets from them for the BLADE_DEAD_INTERVAL timeout (default is 3 seconds).

It enables every SGM to act differently when it, or another SGM, is under load.
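The effect of SUL on the DOWN decision can be illustrated with a minimal sketch. This is not Check Point code: the function name `should_mark_down` and its arguments are hypothetical, and only the 3-second default of BLADE_DEAD_INTERVAL is taken from the description.

```python
BLADE_DEAD_INTERVAL_SEC = 3.0  # default value quoted in the description


def should_mark_down(seconds_since_last_ccp: float, sul_on: bool,
                     dead_interval: float = BLADE_DEAD_INTERVAL_SEC) -> bool:
    """Return True if a remote SGM would be set to DOWN (hypothetical helper)."""
    if seconds_since_last_ccp < dead_interval:
        return False  # CCP packets are still arriving in time
    # The timeout expired: while SUL is ON, setting DOWN is suspended.
    return not sul_on
```

Treating exactly 3 seconds of silence as expired is an assumption of the sketch; the text does not state how the boundary is handled.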

Being under load (SUL state ON) means that at least one SGM has reported Kernel CPU usage above the CPU threshold (80% by default).

Each SGM calculates locally the highest average Kernel CPU usage of a single core and publishes it to remote SGMs via CCP packets.

The average is based on 5 samples by default (Number of sample); a sample is taken every 2 HA Time Units (HTU = 0.1s).
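The averaging can be sketched as a fixed-size window over the busiest core of each sample. The class and method names below are hypothetical; only the constants (5 samples, 2 HTUs per sample, HTU = 0.1s) come from the text, and the actual kernel implementation may differ.

```python
from collections import deque

HTU_SEC = 0.1          # HA Time Unit
SAMPLE_PERIOD_HTU = 2  # a sample is taken every 2 HTUs
NUM_SAMPLES = 5        # default "Number of sample"


class KernelCpuAverager:
    """Keeps the most recent per-sample maxima and averages them (illustrative)."""

    def __init__(self, num_samples: int = NUM_SAMPLES) -> None:
        # deque(maxlen=...) drops the oldest sample automatically.
        self.samples = deque(maxlen=num_samples)

    def add_sample(self, per_core_kernel_usage: list[float]) -> None:
        # Store only the busiest core of this sample.
        self.samples.append(max(per_core_kernel_usage))

    def average(self) -> float:
        # Average of the stored maxima; 0.0 before the first sample.
        return sum(self.samples) / len(self.samples) if self.samples else 0.0
```

With the defaults, the window covers 5 x 2 x 0.1s = 1 second of samples.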

Every SGM calculates its own Kernel High CPU.

Local and remote Kernel High CPU reports are handled almost identically, with minor differences:

  • Local or Remote Kernel High CPU sets the SUL state to ON
  • Local User space + Kernel High CPU triggers the PNOTE timeout postponer for all user-space PNOTEs (e.g. fwd) on the local SGM

SUL state change

SUL Feature flow

  • SUL is set to ON if high CPU is reported
  • SUL is set to OFF if no report has been received for at least 10 seconds (by default) since the last report (Short timeout)

    If the system is continually under load (the gap between high CPU reports is less than the Short timeout), SUL stays ON for up to 3 minutes by default (Long interval)

When / why SUL is ON?

  • Every SGM calculates the CPU usage of all cores, picks the highest value, and stores it in memory.
  • On every CPU state check (called periodically), the SGM takes the average of the 5 most recent highest samples (Number of sample) and publishes it via CCP.
  • On receiving a CCP packet with an SGM's CPU usage:

    If > threshold (CPU threshold) --> toggle SUL ON

  • On calculating locally, if > threshold (CPU threshold):
    1. --> toggle SUL ON
    2. --> local load is ON (for local user-space PNOTEs)
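The handling of local and remote reports can be sketched as follows. The names `SulFlags` and `handle_cpu_report` are hypothetical, and the sketch simplifies the local case by using a single averaged value rather than separate User space and Kernel figures.

```python
from dataclasses import dataclass

CPU_THRESHOLD = 80  # default CPU threshold, in percent


@dataclass
class SulFlags:
    sul_on: bool = False      # System Under Load state
    local_load: bool = False  # postpones local user-space PNOTE timeouts


def handle_cpu_report(flags: SulFlags, avg_cpu: float, is_local: bool,
                      threshold: float = CPU_THRESHOLD) -> SulFlags:
    """Apply one averaged CPU report, local or received via CCP (illustrative)."""
    if avg_cpu > threshold:
        flags.sul_on = True          # local or remote high CPU toggles SUL ON
        if is_local:
            flags.local_load = True  # only a local report sets local load
    return flags
```

A remote report above the threshold sets only `sul_on`, while a local one sets both flags. A value exactly equal to the threshold changes nothing, because the comparison is strict.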

SUL ON mode is delayed for a fixed timeout (Start timeout, default = 0) if at least one SGM continually reports high CPU for more than 3 minutes (Long interval) and the reason SUL was set to OFF in the first place was the expiration of the Long interval.

When / why SUL is OFF?

SUL can be toggled OFF after one of the following scenarios:

  • System is idle - no SGM has reported High CPU usage for at least 10 seconds (default value of Short timeout)
  • System is Under Load for too long - once SUL has been ON for a fixed watermark of 3 minutes (Long interval), it is forced to toggle OFF, even if SGMs are still reporting High CPU. SUL is set to ON again if they keep reporting high CPU after the shutdown, but only after a fixed timeout (Start timeout, 0 by default) is over
  • User decided to manually disable the feature while SUL was ON
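The ON/OFF timing described in this section can be sketched as a small state machine. This is an illustration built from the defaults in the text, not the actual implementation: the class, its method names, and the way time is passed in (as a plain number of HTUs) are all assumptions.

```python
class SulStateMachine:
    """Illustrative SUL ON/OFF timing; all times are in HTU (0.1s)."""

    def __init__(self, short_timeout=100, long_interval=1800, start_timeout=0):
        self.short_timeout = short_timeout
        self.long_interval = long_interval
        self.start_timeout = start_timeout
        self.on = False
        self.on_since = None       # when the current ON period began
        self.last_report = None    # time of the latest high-CPU report
        self.blocked_until = None  # set after a forced OFF (Long interval)

    def tick(self, now):
        """Apply whichever OFF condition is due first, if any."""
        if not self.on:
            return
        idle_off = self.last_report + self.short_timeout
        forced_off = self.on_since + self.long_interval
        if now < min(idle_off, forced_off):
            return
        self.on = False
        if forced_off <= idle_off:
            # Under load for too long: arm the Start timeout.
            self.blocked_until = forced_off + self.start_timeout

    def report_high_cpu(self, now):
        """Handle a local or remote high-CPU report."""
        self.tick(now)  # settle pending timeouts before using the report
        self.last_report = now
        if self.on:
            return
        if self.blocked_until is not None and now < self.blocked_until:
            return  # Start timeout after a forced OFF has not elapsed yet
        self.on, self.on_since, self.blocked_until = True, now, None
```

With the default Start timeout of 0, a report that arrives right after a forced OFF turns SUL back ON immediately. Manual disabling of the feature is not modeled here.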

Syntax

fw ctl set int fwha_pnote_timeout_mechanism_monitor_cpu <value>

 

Value   Description
0       Turns SUL mechanism OFF
1       Turns SUL mechanism ON

 

Example

Enabling the SUL feature (SUL is enabled by default):

fw ctl set int fwha_pnote_timeout_mechanism_monitor_cpu 1

Output

Every state change (ON/OFF) is logged in SmartView Tracker and in /var/log/messages (dmesg). Only the SMO sends the SmartView Tracker messages.

Log Example in SmartView Tracker:

Tuning feature Parameters

The SUL feature can be tuned to meet user-specific needs.

Syntax

fw ctl set int <parameter> <numerical value>

 

Parameter
    Description

fwha_pnote_timeout_mechanism_cpu_load_limit (CPU threshold)
    Highest average CPU usage of a single core, in percent.
    Default = 80

fwha_sul_num_sample_cpu_check (Number of sample)
    Number of samples on which the CPU average is based; a sample is taken every 2 HTUs.
    Default = 5

fwha_pnote_timeout_mechanism_disable_feature_timeout (Long interval)
    Maximum continuous time allowed for the SUL ON state.
    Default = 1800 HTU (3 minutes)

fwha_system_under_load_short_timeout (Short timeout)
    Low CPU usage period after which SUL is set to OFF.
    Default = 100 HTU (10 seconds)

fwha_system_under_load_start_timeout (Start timeout)
    Delay before the next SUL ON, if the last ON period was interrupted by the Long interval.
    Default = 0 HTU (0 seconds)

HTU - HA Time Unit (0.1s)
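Because the timeout parameters are expressed in HTUs, a small helper can avoid unit mistakes when preparing a command line. The sketch below only builds the command string in the form given under Syntax and does not run anything on the Gateway. The helper names are hypothetical, and the 2-minute value is an arbitrary example.

```python
HTU_PER_SECOND = 10  # 1 HTU = 0.1s


def seconds_to_htu(seconds: float) -> int:
    """Convert seconds to whole HA Time Units."""
    return round(seconds * HTU_PER_SECOND)


def set_int_command(parameter: str, value: int) -> str:
    """Build a command line in the form: fw ctl set int <parameter> <value>."""
    return f"fw ctl set int {parameter} {value}"


# Example: a Long interval of 2 minutes instead of the default 3 minutes.
print(set_int_command("fwha_pnote_timeout_mechanism_disable_feature_timeout",
                      seconds_to_htu(120)))
```

The documented defaults convert as expected: 180 seconds is 1800 HTU and 10 seconds is 100 HTU.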

 

Notes

For modified SUL parameters, including the state (ON/OFF), to survive a reboot, add them to the fwkern.conf file with the g_update_conf_file utility.

 
©2014 Check Point Software Technologies Ltd. All rights reserved.