System Under Load
System Under Load (SUL) prevents failover to standby SGMs when an SGM experiences high CPU load. It also prevents remote SGMs from being set to DOWN when they fail to send Cluster Control Protocol (CCP) packets within a specified timeout. The timeout is 3 seconds by default. You can configure each SGM individually.
The system is considered Under Load (SUL state ON) when at least one SGM has kernel CPU usage above the CPU threshold. The threshold is 80% by default.
SUL calculates the CPU usage based on 5 samples by default. It takes samples every 0.2 seconds.
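The sampling described above can be sketched as a sliding window. This is a minimal illustration, not the actual kernel logic: the class name `CpuLoadWindow`, its methods, and the choice to report "not high" until the window is full are assumptions made for the example. With the defaults, 5 samples taken every 0.2 seconds cover the last 1 second of CPU activity.

```python
from collections import deque


class CpuLoadWindow:
    """Hypothetical sliding average over the most recent kernel CPU samples."""

    def __init__(self, num_samples=5, threshold_pct=80):
        # deque with maxlen drops the oldest sample automatically
        self.samples = deque(maxlen=num_samples)
        self.threshold_pct = threshold_pct

    def add_sample(self, cpu_pct):
        """Record one sample (in the document, taken every 0.2 seconds)."""
        self.samples.append(cpu_pct)

    def average(self):
        return sum(self.samples) / len(self.samples)

    def is_high(self):
        # Assumption: no verdict until the window holds a full set of samples
        if len(self.samples) < self.samples.maxlen:
            return False
        # "Above the CPU threshold" is read here as strictly greater than
        return self.average() > self.threshold_pct


window = CpuLoadWindow()
for sample in [70, 85, 90, 95, 88]:
    window.add_sample(sample)
high = window.is_high()
```

A single spike does not trip the threshold on its own, because it is averaged with the other samples in the window.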
Each SGM calculates its own Kernel High CPU Usage. It uses CCP packets to talk to remote SGMs.
Local and remote CPU load affect SUL in these ways:
- If the Local Kernel CPU load is high or the Remote Kernel CPU load is high, the SUL state is set to ON.
- If both the Local User Space CPU load and the Remote User Space CPU load are high, SUL does not allow Critical Devices that monitor User Space processes (like fwd) to send PNOTE notifications on the local SGM.
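The two rules above differ in how they combine local and remote load: the kernel rule is an OR, the User Space rule is an AND. A minimal sketch of that difference, with hypothetical function names that do not correspond to any product API:

```python
def sul_state_on(local_kernel_high, remote_kernel_high):
    """SUL is ON if either the local or the remote kernel CPU load is high."""
    return local_kernel_high or remote_kernel_high


def user_space_pnotes_blocked(local_user_high, remote_user_high):
    """PNOTEs from User Space monitors (like fwd) are blocked on the local
    SGM only if both local and remote User Space CPU load are high."""
    return local_user_high and remote_user_high
```

For example, a high remote kernel load alone is enough to set SUL to ON, while a high remote User Space load alone does not block PNOTE notifications on the local SGM.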
SUL state change
SUL Feature flow
SUL is turned ON when:
- At least one SGM reports kernel CPU usage above the CPU threshold. If the previous SUL ON period was ended by the Long Interval expiration (at least one SGM continuously reported high CPU for more than 3 minutes), SUL ON is delayed by the Start timeout (default 0).
SUL is turned OFF when:
- The system is idle: no SGM reported High CPU usage for at least 10 seconds (the default Short timeout).
- The system is Under Load for longer than the Long Interval (by default 3 minutes). SUL is forced OFF, even if SGMs are still reporting High CPU. If they keep reporting High CPU, SUL turns ON again after the Start timeout period (by default 0).
- You manually disable the feature while SUL is ON.
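The ON and OFF rules above can be read as a small state machine. The sketch below is one interpretation under stated assumptions, not the product's implementation: the class `SulStateMachine`, its attributes, and the exact boundary comparisons (`>` for the Long Interval, `>=` for the Short timeout) are hypothetical. Times are in HTU (0.1 seconds), matching the configuration defaults.

```python
class SulStateMachine:
    """Hypothetical model of SUL ON/OFF transitions. Times are in HTU."""

    def __init__(self, short_timeout=100, long_interval=1800, start_timeout=0):
        self.short_timeout = short_timeout    # idle period that sets SUL OFF
        self.long_interval = long_interval    # maximum time in SUL ON
        self.start_timeout = start_timeout    # delay after a forced OFF
        self.enabled = True                   # False models a manual disable
        self.on = False
        self.on_since = None
        self.last_high = None
        self.forced_off_at = None

    def update(self, now, any_sgm_high):
        """Advance to time `now`; return True if SUL is ON."""
        if not self.enabled:
            self.on = False
            return self.on
        if any_sgm_high:
            self.last_high = now
        if self.on:
            if now - self.on_since > self.long_interval:
                # Under Load for longer than the Long Interval: forced OFF
                self.on = False
                self.forced_off_at = now
            elif now - self.last_high >= self.short_timeout:
                # No SGM reported high CPU for the Short timeout: idle OFF
                self.on = False
        elif any_sgm_high:
            delayed = (self.forced_off_at is not None
                       and now - self.forced_off_at < self.start_timeout)
            if not delayed:
                self.on = True
                self.on_since = now
                self.forced_off_at = None
        return self.on
```

With the default Start timeout of 0, a forced OFF is followed by ON at the next high CPU report, so the toggle is brief. A non-zero Start timeout keeps SUL OFF for that period even while SGMs report high CPU.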
Syntax
# fw ctl set int fwha_pnote_timeout_mechanism_monitor_cpu <on_off>
| Parameter | Description |
|---|---|
| <on_off> | Turns SUL on or off. Valid values: 0 - Turns on SUL; 1 - Turns off SUL |
Example
# fw ctl set int fwha_pnote_timeout_mechanism_monitor_cpu 1
Logs
Every state change (ON/OFF) is logged in SmartView Tracker and in /var/log/messages (dmesg). Only the SMO sends the SmartView Tracker (SVT) messages.
Log Example in SmartView Tracker:
Configuring SUL
You can configure SUL to meet specific needs.
Syntax
# fw ctl set int <parameter> <value>
| Parameter | Description |
|---|---|
| fwha_pnote_timeout_mechanism_cpu_load_limit <value> | CPU threshold. Highest average CPU usage of a single core. Default - 80 |
| fwha_sul_num_sample_cpu_check <value> | Number of samples. Number of samples the CPU average is based on. Sample is taken every 2 HTUs. Default - 5 |
| fwha_pnote_timeout_mechanism_disable_feature_timeout <value> | Long Interval. Maximum time allowed for SUL ON state. Default - 1800 HTU (3 minutes) |
| fwha_system_under_load_short_timeout <value> | Short Timeout. Low CPU usage period to set SUL off. Default - 100 HTU (10 seconds) |
| fwha_system_under_load_start_timeout <value> | Start Timeout. Delay time until next SUL ON, if last ON period interrupted by a long interval. Default - 0 HTU (0 seconds) |

HTU - HA Time Unit (0.1s)
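Because the timeouts are set in HTU rather than seconds, a small conversion helper avoids off-by-ten mistakes. The sketch below only builds the command string in the `fw ctl set int <parameter> <value>` form shown in the Syntax above; it does not run anything. The helper names `seconds_to_htu` and `fw_ctl_set_int` are hypothetical.

```python
HTU_PER_SECOND = 10  # 1 HTU (HA Time Unit) = 0.1 seconds


def seconds_to_htu(seconds):
    """Convert seconds to whole HTUs."""
    return round(seconds * HTU_PER_SECOND)


def fw_ctl_set_int(parameter, value):
    """Build the command string; the caller decides where to run it."""
    return f"fw ctl set int {parameter} {int(value)}"


# Example: build the command for a 20-second Short Timeout
cmd = fw_ctl_set_int("fwha_system_under_load_short_timeout",
                     seconds_to_htu(20))
```

The same conversion confirms the documented defaults: 3 minutes is 1800 HTU and 10 seconds is 100 HTU.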
Note - To make sure that SUL parameters, including the state (ON/OFF), survive reboot, add them to fwkern.conf with the g_update_conf_file utility.