Synchronizing Connection Information Across the Cluster
The Check Point State Synchronization Solution
A failure of a firewall results in an immediate loss of active connections in and out of the organization. Many of these connections, such as financial transactions, may be mission critical, and losing them will result in the loss of critical data. ClusterXL supplies an infrastructure that ensures that no data is lost in case of a failure, by making sure each gateway cluster member is aware of the connections going through the other members. Passing information about connections and other Security Gateway states between the cluster members is called State Synchronization.
Every IP based service (including TCP and UDP) recognized by the Security Gateway is synchronized.
State Synchronization is used both by ClusterXL and by third-party OPSEC-certified clustering products.
Machines in a ClusterXL Load Sharing configuration must be synchronized. Machines in a ClusterXL High Availability configuration do not have to be synchronized, though if they are not, connections will be lost upon failover.
The Synchronization Network
The Synchronization Network is used to transfer synchronization information about connections and other Security Gateway states between cluster members.
Since the synchronization network carries the most sensitive Security Policy information in the organization, it is critical that you protect it against both malicious and unintentional threats. We recommend that you secure the synchronization interfaces using one of the following methods:
- Using a dedicated synchronization network
- Connecting the physical network interfaces of the cluster members directly using a cross-cable. In a cluster with three or more members, use a dedicated hub or switch.
Following these recommendations guarantees the safety of the synchronization network because no other networks carry synchronization information.
It is possible to define more than one synchronization network for backup purposes. It is recommended that the backup be a dedicated network.
In ClusterXL, the synchronization network is supported on the lowest VLAN tag of a VLAN interface. For example, if three VLANs with tags 10, 20 and 30 are configured on interface eth1, interface eth1.10 may be used for synchronization.
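As a small illustration of the lowest-tag rule, the following Python sketch picks the VLAN interface that would be eligible for synchronization. The function name `sync_vlan_interface` is hypothetical, and the `interface.tag` naming simply follows the eth1.10 example above. It is not part of any ClusterXL tool or API.

```python
def sync_vlan_interface(physical, vlan_tags):
    """Return the VLAN interface name with the lowest tag on a physical interface.

    Hypothetical helper for illustration only; the naming scheme
    '<interface>.<tag>' is taken from the eth1.10 example in the text.
    """
    if not vlan_tags:
        raise ValueError("no VLAN tags configured on " + physical)
    return f"{physical}.{min(vlan_tags)}"


print(sync_vlan_interface("eth1", [10, 20, 30]))  # eth1.10
```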
How State Synchronization Works
Synchronization works in two modes:
- Full sync transfers all Security Gateway kernel table information from one cluster member to another. It is handled by the fwd daemon using an encrypted TCP connection.
- Delta sync transfers changes in the kernel tables between cluster members. Delta sync is handled by the Security Gateway kernel using UDP multicast or broadcast on port 8116.
Full sync is used for initial transfers of state information, for many thousands of connections. If a cluster member is brought up after being down, it will perform full sync. After all members are synchronized, only updates are transferred via delta sync. Delta sync is quicker than full sync.
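The difference between the two modes can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions: the `Member` class, its dictionary "table", and the method names are all hypothetical, and the real mechanism (the fwd daemon over encrypted TCP for full sync, kernel-level UDP on port 8116 for delta sync) is not modeled.

```python
class Member:
    """Toy cluster member; the dict stands in for the kernel tables (assumption)."""

    def __init__(self, name):
        self.name = name
        self.table = {}    # connection id -> state
        self.pending = []  # changes not yet sent to peers

    def record(self, conn_id, state):
        self.table[conn_id] = state
        self.pending.append((conn_id, state))

    def delete(self, conn_id):
        self.table.pop(conn_id, None)
        self.pending.append((conn_id, None))  # None marks a deletion

    def full_sync_from(self, peer):
        # Full sync: copy the peer's entire table, as a member does when it comes up.
        self.table = dict(peer.table)

    def send_delta(self, peers):
        # Delta sync: send only the changes accumulated since the last update.
        changes, self.pending = self.pending, []
        for peer in peers:
            for conn_id, state in changes:
                if state is None:
                    peer.table.pop(conn_id, None)
                else:
                    peer.table[conn_id] = state


a, b = Member("A"), Member("B")
a.record("c1", "ESTABLISHED")
a.record("c2", "SYN_SENT")
b.full_sync_from(a)            # B comes up and copies everything once
a.record("c3", "ESTABLISHED")
a.delete("c2")
a.send_delta([b])              # afterwards only changes are transferred
print(b.table == a.table)
```

In this sketch the delta carries a handful of changes while the full sync copies the whole table, which mirrors why delta sync is the quicker of the two.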
State Synchronization traffic typically makes up around 90% of all Cluster Control Protocol (CCP) traffic. State Synchronization packets are distinguished from the rest of CCP traffic via an opcode in the UDP data header.
In a gateway cluster, all connections on all cluster members are normally synchronized across the cluster. However, not every service that crosses a gateway cluster must be synchronized:
- You can decide not to synchronize TCP, UDP and other service types. By default, all these services are synchronized.
- The VRRP and IP Clustering control protocols, the IGMP protocol, and some other OPSEC cluster control protocols are not synchronized by default, although you can choose to turn on synchronization for them. Protocols that run solely between cluster members need not be synchronized. You can synchronize them, but the synchronization information does not help a failover, so no benefit is gained.
- Broadcasts and multicasts are not synchronized, and cannot be synchronized.
You can have a synchronized service and a non-synchronized definition of a service, and use them selectively in the Rule Base.
Configuring Services not to Synchronize
Synchronization incurs a performance cost. You may choose not to synchronize a service if these conditions are true:
- A significant amount of traffic crosses the cluster through a particular service. Not synchronizing the service reduces the amount of synchronization traffic and so enhances cluster performance.
- The service usually opens short connections, whose loss may not be noticed. DNS (over UDP) and HTTP are typically responsible for most connections and frequently have short life and inherent recoverability in the application level. Services which typically open long connections, such as FTP, should always be synchronized.
- Configurations that ensure bi-directional stickiness for all connections do not require synchronization to operate (only to maintain High Availability). Such configurations include:
  - Any cluster in High Availability mode (for example, ClusterXL New HA or IPSO VRRP).
  - ClusterXL in a Load Sharing mode with clear connections (no VPN or static NAT).
  - OPSEC clusters that guarantee full stickiness (refer to the OPSEC cluster's documentation).
- VPN and Static NAT connections passing through a ClusterXL cluster in a Load Sharing mode (either multicast or unicast) may not maintain bi-directional stickiness. State Synchronization must be turned on for such environments.
Duration Limited Synchronization
Some TCP services (HTTP for example) are characterized by connections with a very short duration. There is no point in synchronizing these connections because every synchronized connection consumes gateway resources, and the connection is likely to have finished by the time a failover occurs.
For all TCP services whose Protocol Type (as defined in the GUI) is HTTP or None, you can use this option to delay notifying the Security Gateway about a connection, so that the connection is synchronized only if it still exists x seconds after it is initiated. This feature requires a SecureXL device that supports both "Delayed Notifications" and the current cluster configuration (such as Performance Pack with ClusterXL LS Multicast).
This capability is only available if a SecureXL-enabled device is installed on the Security Gateway through which the connection passes.
The setting is ignored if connection templates are not offloaded from the ClusterXL-enabled device. See the SecureXL documentation for additional information.
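The effect of the delay can be sketched in a few lines of Python. This is a simplified model, not the SecureXL implementation: the function name and the sample lifetimes are hypothetical, and a connection is treated as "still existing" when its lifetime is strictly longer than the configured delay.

```python
def synchronized_connections(lifetimes, delay_s):
    """Return the ids of connections that outlive the delay and so get synchronized.

    lifetimes: dict of connection id -> lifetime in seconds (illustrative data).
    delay_s:   the configured 'x seconds after connection initiation'.
    """
    return sorted(cid for cid, lifetime in lifetimes.items() if lifetime > delay_s)


sample = {"http-1": 0.4, "http-2": 2.0, "download": 95.0}  # made-up lifetimes
print(synchronized_connections(sample, delay_s=30))  # ['download']
```

With a 30-second delay, the two short connections finish before they would ever be synchronized, so only the long-lived one consumes synchronization resources.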
A connection is called sticky if all packets are handled by a single cluster member. In a non-sticky connection, the reply packet returns via a different gateway than the original packet.
The synchronization mechanism knows how to properly handle non-sticky connections. In a non-sticky connection, a cluster member can receive an out-of-state packet, which the Security Gateway normally drops because it poses a security risk.
In Load Sharing configurations, all cluster members are active, and in Static NAT and encrypted connections, the source and destination IP addresses change. Therefore, Static NAT and encrypted connections through a Load Sharing cluster may be non-sticky. Non-stickiness may also occur with Hide NAT, but ClusterXL has a mechanism to make it sticky.
In High Availability configurations, all packets reach the Active machine, so all connections are sticky. If failover occurs during connection establishment, the connection is lost, but synchronization can be performed later.
If the other members do not know about a non-sticky connection, the packet will be out-of-state, and the connection will be dropped for security reasons. However, the Synchronization mechanism informs the other members of the connection. It thereby prevents out-of-state packets in valid but non-sticky connections, so that these connections are allowed.
Non-sticky connections will also occur if the network administrator has configured asymmetric routing, where a reply packet returns through a different gateway than the original packet.
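The definition of stickiness can be expressed as a simple check: a connection is sticky when every packet was handled by the same member. The Python sketch below assumes a hypothetical record of `(packet, member)` pairs; the member names are placeholders.

```python
def is_sticky(packet_path):
    """Return True if all packets of a connection were handled by one member.

    packet_path: list of (packet_label, member_name) pairs; illustrative data only.
    """
    members = {member for _, member in packet_path}
    return len(members) <= 1


# High Availability: every packet reaches the Active member.
print(is_sticky([("SYN", "member1"), ("SYN/ACK", "member1"), ("ACK", "member1")]))  # True

# Asymmetric routing: the reply returns through a different member.
print(is_sticky([("SYN", "member1"), ("SYN/ACK", "member2"), ("ACK", "member1")]))  # False
```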
Non-Sticky Connection Example: TCP 3-Way Handshake
The 3-way handshake that initiates all TCP connections can very commonly lead to a non-sticky connection (a situation often described as asymmetric routing). The following situation may arise:
Client A initiates a connection by sending a SYN packet to server B. The SYN passes through Gateway C, but the SYN/ACK reply returns through Gateway D. This is a non-sticky connection, because the reply packet returns through a different gateway than the original packet.
The synchronization network notifies Gateway D. If Gateway D is updated before the SYN/ACK packet sent by server B reaches it, the connection is handled normally. If, however, synchronization is delayed and the SYN/ACK packet reaches Gateway D before Gateway D has been updated about the SYN, the gateway treats the SYN/ACK packet as out-of-state and drops the connection.
You can configure enhanced 3-Way TCP Handshake enforcement to address this issue.
Synchronizing Non-Sticky Connections
The synchronization mechanism prevents out-of-state packets in valid, but non-sticky connections. The way it does this is best illustrated with reference to the 3-way handshake that initiates all TCP data connections. The 3-way handshake proceeds as follows:
- SYN (client to server)
- SYN/ACK (server to client)
- ACK (client to server)
- Data (client to server)
To prevent out-of-state packets, the following sequence (called "Flush and Ack") occurs:
- A cluster member receives the first packet (SYN) of a connection.
- The member suspects that the connection is non-sticky.
- The member holds the SYN packet.
- The member sends the pending synchronization updates to all cluster members (including all changes relating to this packet).
- The member waits for all the other cluster members to acknowledge the information in the sync packet.
- The member releases the held SYN packet.
- All cluster members are now ready for the SYN/ACK.
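The Flush and Ack sequence above can be sketched as a toy simulation in Python. Everything here is an assumption made for illustration: the `ClusterMember` class, its method names, and the synchronous in-process "acknowledgement" are hypothetical and do not reflect how the Security Gateway kernel implements the mechanism.

```python
class ClusterMember:
    """Toy model of a cluster member taking part in Flush and Ack (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.known = set()   # connections this member knows about
        self.pending = []    # sync updates not yet sent to peers
        self.peers = []

    def receive_sync(self, updates):
        self.known.update(updates)
        return True          # acknowledgement back to the sender

    def handle_syn(self, conn, suspect_non_sticky):
        self.known.add(conn)
        self.pending.append(conn)
        if suspect_non_sticky:
            # The SYN is held here: flush pending updates and wait for every ack.
            updates, self.pending = self.pending, []
            acks = [peer.receive_sync(updates) for peer in self.peers]
            if not all(acks):
                raise RuntimeError("a peer did not acknowledge the sync update")
        return "SYN released"

    def handle_syn_ack(self, conn):
        return "accepted" if conn in self.known else "dropped (out of state)"


c, d = ClusterMember("C"), ClusterMember("D")
c.peers, d.peers = [d], [c]

c.handle_syn("A->B", suspect_non_sticky=True)
print(d.handle_syn_ack("A->B"))   # accepted

c.handle_syn("A->X", suspect_non_sticky=False)
print(d.handle_syn_ack("A->X"))   # dropped (out of state)
```

The second case shows what the sequence is designed to avoid: without the flush, the peer receives a SYN/ACK for a connection it has not yet heard about.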
Synchronizing Clusters on a Wide Area Network
Organizations are sometimes faced with the need to locate cluster members in geographical locations that are distant from each other. A typical example is a replicated data center whose locations are widely separated for disaster recovery purposes. In such a configuration it is clearly impractical to use a cross-cable as the synchronization network.
The synchronization network can be spread over remote sites, which makes it easier to deploy geographically distributed clustering. There are two limitations to this capability:
- The synchronization network must guarantee no more than 100ms latency and no more than 5% packet loss.
- The synchronization network may only include switches and hubs. No routers are allowed on the synchronization network, because routers drop Cluster Control Protocol packets.
You can monitor and troubleshoot geographically distributed clusters using the command line interface.
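The two limitations listed above lend themselves to a simple pre-deployment check. The Python sketch below assumes you have already measured latency and packet loss on the candidate link by your own means; the function name and parameters are hypothetical and are not part of the Check Point command line interface.

```python
MAX_LATENCY_MS = 100.0    # limit stated in the text
MAX_LOSS_PERCENT = 5.0    # limit stated in the text


def check_sync_link(latency_ms, loss_percent, router_hops=0):
    """Return a list of violated requirements; an empty list means the link qualifies.

    The inputs are measurements supplied by the caller (assumption).
    """
    problems = []
    if latency_ms > MAX_LATENCY_MS:
        problems.append(f"latency {latency_ms} ms exceeds {MAX_LATENCY_MS} ms")
    if loss_percent > MAX_LOSS_PERCENT:
        problems.append(f"packet loss {loss_percent}% exceeds {MAX_LOSS_PERCENT}%")
    if router_hops > 0:
        problems.append("routers are not allowed on the synchronization network")
    return problems


print(check_sync_link(latency_ms=40, loss_percent=1))                   # []
print(len(check_sync_link(latency_ms=150, loss_percent=7, router_hops=1)))  # 3
```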
Synchronized Cluster Restrictions
The following restrictions apply to synchronizing cluster members:
- Only cluster members running on the identical platform can be synchronized.
- All cluster members must use the same Check Point software version.
- A user-authenticated connection through a cluster member will be lost if the cluster member goes down. Other synchronized cluster members will be unable to resume the connection.
- However, a client-authenticated connection or session-authenticated connection will not be lost.
- The reason for these restrictions is that user authentication state is maintained on Security Servers, which are processes, and thus cannot be synchronized on different machines in the way that kernel data can be synchronized. However, the state of session authentication and client authentication is stored in kernel tables, and thus can be synchronized.
- The state of connections using resources is maintained in a Security Server, so these connections cannot be synchronized for the same reason that user-authenticated connections cannot be synchronized.
- Accounting information is accumulated in each cluster member and reported separately to the Security Management server, where the information is aggregated. In case of a failover, accounting information that was accumulated on the failed member but not yet reported to the Security Management server is lost. To minimize the problem, you can reduce the period after which accounting information is "flushed". To do this, in the cluster object's Logs and Masters > Additional Logging page, configure the Update Account Log every attribute.
Configuring State Synchronization
Configure State synchronization as part of the process of configuring ClusterXL and OPSEC certified clustering products. Configuring State synchronization involves the following steps:
- Setting up a synchronization network for the gateway cluster
- Installing a Security Gateway and enabling synchronization during the configuration process
- Enabling State Synchronization on the ClusterXL page for the cluster object
For configuration procedures, refer to the sections for configuring ClusterXL and OPSEC certified cluster products.
Configuring a Service Not to Synchronize
To set a service not to synchronize:
- In the Services branch of the objects tree, double click the TCP, UDP or Other type service that you do not wish to synchronize.
- In the Service Properties window, click Advanced to display the Advanced Services Properties window.
- Clear Synchronize connections on the cluster.
Creating Synchronized and Non-Synchronized Versions
It is possible to have both a synchronized and a non-synchronized definition of the service, and to use them selectively in the Security Rule Base.
- Define a new TCP, UDP or Other type service. Give it a name that distinguishes it from the existing service.
- Copy all the definitions from the existing service into the Service Properties window of the new service.
- In the new service, click Advanced to display the Advanced Services Properties window.
- Copy all the definitions from the existing service into the Advanced Service Properties window of the new service.
- Set Synchronize connections on the cluster in the new service, so that it is different from the setting in the existing service.
Configuring Duration Limited Synchronization
Before you start this procedure, become familiar with the concept of Duration Limited Synchronization.
Note - This feature is limited to HTTP-based services. The Start synchronizing option is not displayed for other services.
To configure duration limited synchronization:
- In the Services branch of the objects tree, double click the TCP, UDP or Other type service that you wish to synchronize.
- In the Service Properties window, click Advanced to display the Advanced Services Properties window.
- Select Start synchronizing x seconds after connection initiation.
- In the seconds field, enter or select the number of seconds by which synchronization is to be delayed after connection initiation.