Cluster Failover

What is Failover?

Failover is a cluster redundancy operation that occurs automatically if a Cluster Member is not functional. When this occurs, other Cluster Members take over for the failed Cluster Member.

In the High Availability mode:

In Load Sharing modes:

  • If a Cluster Member detects that it cannot function as a Cluster Member, it notifies the peer Cluster Members that it must go down. Traffic load will be redistributed between the working Cluster Members.

  • If the Cluster Members stop receiving Cluster Control Protocol (CCP) packets from one of their peer Cluster Members, the working Cluster Members assume that this peer Cluster Member failed. As a result, traffic load is redistributed between the working Cluster Members.

  • Because all Cluster Members are always synchronized by design, current connections are not interrupted when cluster failover occurs.
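
The redistribution described above can be illustrated with a minimal sketch. It is not ClusterXL's actual decision function: the function name assign_member, the connection keys, and the hash-modulo scheme are assumptions chosen only to show that, after a failure, every connection is owned by one of the remaining working Cluster Members.

```python
import zlib


def assign_member(connection_key: str, working_members: list) -> str:
    """Pick an owner for a connection among the members that still work.

    Hypothetical helper for illustration; not a ClusterXL API.
    """
    if not working_members:
        raise ValueError("no working Cluster Members")
    ordered = sorted(working_members)
    # crc32 is deterministic across processes, unlike Python's built-in hash().
    index = zlib.crc32(connection_key.encode("utf-8")) % len(ordered)
    return ordered[index]


connections = ["10.0.0.1:5000", "10.0.0.2:5001", "10.0.0.3:5002"]
before = {c: assign_member(c, ["m1", "m2", "m3"]) for c in connections}
# m2 fails: the same connections are now spread over the remaining members only.
after = {c: assign_member(c, ["m1", "m3"]) for c in connections}
```

Because every member sorts the list and applies the same hash, each one would compute the same owner for a given connection without extra coordination. That property is a design choice of this sketch, not a statement about the product.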

To tell each Cluster Member that the other Cluster Members are alive and functioning, the ClusterXL Cluster Control Protocol (CCP) maintains a heartbeat between Cluster Members. If no CCP packets are received from a Cluster Member within a predefined time, that Cluster Member is assumed to be down. As a result, cluster failover can occur.
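
A heartbeat timeout of this kind can be sketched as follows. This is a minimal illustration, not the CCP implementation: the class HeartbeatMonitor, its method names, and the 2-second timeout are hypothetical, and the source does not state the actual predefined time.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last time a heartbeat was seen from each peer (illustrative only)."""
    timeout: float  # hypothetical "predefined time", in seconds
    last_seen: dict = field(default_factory=dict)

    def record(self, member: str, now: float) -> None:
        # Called whenever a heartbeat packet arrives from a peer.
        self.last_seen[member] = now

    def down_members(self, now: float) -> list:
        # A peer is presumed down once it has been silent longer than the timeout.
        return sorted(m for m, t in self.last_seen.items() if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=2.0)
monitor.record("member_a", now=0.0)
monitor.record("member_b", now=0.0)
monitor.record("member_a", now=1.5)    # member_a keeps sending heartbeats
silent = monitor.down_members(now=3.0)  # member_b has been silent for 3.0 s
print(silent)  # ['member_b']
```

Timestamps are passed in explicitly rather than read from the system clock so that the example is deterministic.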

More than one Cluster Member may encounter a problem that results in a cluster failover event. If all Cluster Members encounter such problems, ClusterXL tries to choose a single Cluster Member to continue operating. The state of the chosen Cluster Member is reported as Active(!). This situation lasts until another Cluster Member fully recovers. For example, if a cross cable connecting the sync interfaces on Cluster Members malfunctions, both Cluster Members detect an interface problem. One of them changes to the Down state, and the other to the Active(!) state.

What Happens When a Cluster Member Recovers?

In the High Availability mode:

  • If the cluster object is configured as Maintain current active Cluster Member, any Cluster Member that becomes Active remains Active.

    If the Cluster Member with the highest priority fails, cluster failover occurs. A Cluster Member with the next highest priority becomes Active.

    If the Cluster Member with the highest priority recovers, cluster failover does not occur again, and that Cluster Member becomes Standby.

  • If the cluster object is configured as Switch to higher priority Cluster Member, the Cluster Member with the highest priority must always be Active.

    The Cluster Member with the highest priority is the one that appears at the top of the list in the Cluster object > Cluster Members pane.

    If the Cluster Member with the highest priority fails, cluster failover occurs. A peer Cluster Member in Standby state, with the next highest priority, becomes Active.

    If the Cluster Member with the highest priority recovers, cluster failover occurs again. The Cluster Member with the highest priority becomes Active again. The Cluster Member with the next highest priority that was Active, returns to the Standby state.
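
The difference between the two High Availability recovery settings can be sketched as a small selection function. The function name choose_active and the mode labels "maintain" and "switch" are hypothetical shorthand for the two settings above, not product identifiers.

```python
def choose_active(priorities, healthy, current_active, mode):
    """Return the member that should be Active, or None if none is healthy.

    priorities: member names ordered from highest to lowest priority.
    healthy: set of members that can currently function.
    current_active: the member that is Active now, or None.
    mode: "maintain" or "switch" (hypothetical labels for the two settings).
    """
    candidates = [m for m in priorities if m in healthy]
    if not candidates:
        return None
    # "maintain": whoever is Active stays Active as long as it is healthy.
    if mode == "maintain" and current_active in healthy:
        return current_active
    # Otherwise the healthy member with the highest priority becomes Active.
    return candidates[0]


priorities = ["m1", "m2"]  # m1 is at the top of the list

# m1 fails: m2 takes over in both modes.
after_failure = choose_active(priorities, {"m2"}, "m1", "switch")

# m1 recovers while m2 is Active.
maintain_result = choose_active(priorities, {"m1", "m2"}, "m2", "maintain")
switch_result = choose_active(priorities, {"m1", "m2"}, "m2", "switch")
print(after_failure, maintain_result, switch_result)  # m2 m2 m1
```

In "maintain" mode the recovered m1 stays Standby, so no second failover happens. In "switch" mode m1 takes the Active role back.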

In the Load Sharing modes:

  • When the failed Cluster Member recovers, all connections are redistributed between all Active Cluster Members.

How a Recovered Cluster Member Obtains the Security Policy

The Administrator installs the Security Policy on the cluster object, rather than separately on individual Cluster Members. The policy is automatically installed on all Cluster Members. The policy is sent to the IP addresses defined in the General Properties page of each Cluster Member object.

When a failed Cluster Member recovers, it first tries to fetch a policy from one of the peer Active Cluster Members. The assumption is that the other Cluster Members have a more up-to-date policy. If fetching a policy from a peer Cluster Member fails, the recovered Cluster Member compares its own local policy to the policy on its Management Server. If the policy on the Management Server is more up to date than the one on the recovered Cluster Member, the policy is fetched from the Management Server. If the Cluster Member does not have a local policy, it retrieves one from the Management Server. This ensures that all Cluster Members use the same policy at any given moment.
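
The order of these checks can be summarized in a short sketch. The function name choose_policy_source and the use of integer version numbers to mean "more up to date" are assumptions for illustration. The sketch also assumes that the peer fetch is tried first even when no local policy exists, which the text above does not state explicitly.

```python
def choose_policy_source(peer_fetch_ok, local_version, management_version):
    """Return where a recovered member takes its policy from.

    peer_fetch_ok: True if fetching from an Active peer succeeded.
    local_version: version of the local policy, or None if there is none.
    management_version: version of the policy on the Management Server.
    Versions are hypothetical integers; a larger number means more up to date.
    """
    if peer_fetch_ok:
        return "peer"
    if local_version is None:
        # No local policy at all: retrieve one from the Management Server.
        return "management"
    if management_version > local_version:
        return "management"
    # The local policy is at least as recent as the one on the Management Server.
    return "local"


print(choose_policy_source(True, 3, 5))    # peer
print(choose_policy_source(False, 3, 5))   # management
print(choose_policy_source(False, 5, 5))   # local
print(choose_policy_source(False, None, 5))  # management
```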

General Failover Limitations

Some connections may not survive cluster failover: