Connection Draining in Cloud Firewall for GCP Network Security Integration

Connection draining (also referred to as "instance draining" or "graceful termination") is a mechanism used to gracefully remove compute instances from service while maintaining application availability.

It ensures that when a virtual machine (VM) is removed from an instance group due to scale-in events or rolling updates, existing connections are allowed to complete before the instance is terminated.

Connection Draining Overview

Connection draining is a key capability for gracefully shutting down or decommissioning compute instances, such as Managed Instance Group (MIG) instances behind a Load Balancer. It allows in-flight requests to complete before the instance is removed from service, which minimizes service disruption during scale-in operations and rolling updates.

A connection draining mechanism typically involves:

  • Preventing new incoming connections to the instance.

  • Allowing existing connections to continue for a defined period.

  • Graceful removal of the instance from the backend pool after ongoing connections are completed.

Connection Draining Use Cases

  • Rolling updates

    Connection draining helps avoid dropped connections during version or template upgrades.

  • Auto-scaling operations

    Connection draining allows smooth removal of VMs in a scale-in event.

Connection Draining Setup

In Google Cloud Platform (GCP), connection draining is configured on Load Balancers to ensure graceful handling of in-flight connections during instance termination. Many Load Balancers have connection draining enabled by default, with a timeout typically set to 300 seconds.

Best Practice - Always make sure that connection draining is enabled and that the configured timeout value aligns with your application requirements.

To configure connection draining on an existing GCP Load Balancer, follow these steps:

  1. Go to the Google Cloud Console.

  2. Navigate to Network Services > Load balancing.

  3. On the Load balancing page, click on the name of your TCP/UDP network Load Balancer.

  4. On the Load Balancer configuration page, scroll to the Backend section and select Advanced configurations. Check if the Connection draining timeout parameter is enabled and has a value.

  5. If the timeout value is not specified, click Edit in the top menu.

  6. Scroll down to the Advanced configurations section.

  7. Set the Connection draining timeout duration (in seconds).

  8. Click Update.

For more information on enabling connection draining in GCP, see this guide: https://cloud.google.com/load-balancing/docs/enabling-connection-draining.
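
The same setting can usually be applied from the command line. The following is a minimal sketch, assuming that the gcloud CLI is installed and authenticated and that the Load Balancer uses a regional backend service. The function name set_draining_timeout, the APPLY variable, and the example names fw-backend and us-central1 are illustrative and do not come from this guide. By default, the function only prints the command so that it can be reviewed before it is run.

```shell
# Sketch: build the gcloud command that sets the connection draining timeout
# on a regional backend service. All names are placeholders (assumptions).
set_draining_timeout() {
    backend="$1"   # backend service name, for example fw-backend
    region="$2"    # region of the backend service
    timeout="$3"   # draining timeout in seconds

    # Reject anything that is not a whole number.
    case "$timeout" in
        ''|*[!0-9]*) echo "timeout must be a whole number of seconds" >&2; return 1 ;;
    esac
    # GCP accepts draining timeouts from 0 to 3600 seconds.
    if [ "$timeout" -gt 3600 ]; then
        echo "timeout must be between 0 and 3600 seconds" >&2
        return 1
    fi

    set -- gcloud compute backend-services update "$backend" \
        --region="$region" --connection-draining-timeout="$timeout"

    if [ "${APPLY:-0}" = "1" ]; then
        "$@"        # run the command
    else
        echo "$*"   # dry run: print the command only
    fi
}
```

For example, set_draining_timeout fw-backend us-central1 300 prints the command for a 300-second timeout, and setting APPLY=1 runs it. For a global backend service, --global would replace --region.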

Connection Draining During Scale-In

When a scale-in event is triggered and a Cloud Firewall Gateway instance is marked for termination, connection draining enables the following behavior:

  1. The External Load Balancer stops routing new requests to the marked Cloud Firewall Gateway instance.

  2. The existing connections are completed within the specified timeout period.

  3. GCP Autoscale terminates the marked Cloud Firewall Gateway instance only after all active sessions have completed or the timeout has expired.

This approach minimizes user disruption and ensures in-flight sessions are preserved wherever possible.

Connection Draining During Instance Reboot

Before you reboot a Cloud Firewall Gateway instance in a GCP Managed Instance Group (MIG), make sure that new connections are no longer forwarded to this instance and that existing connections complete gracefully.

To achieve this, configure the Cloud Firewall MIG instance to stop responding to health check probe requests from the GCP Load Balancer. To do this, disable the health probe port (8117 by default) by setting the kernel variable cloud_balancer_port to 0.

When the Cloud Firewall MIG instance stops responding to health probes, the Load Balancer marks the instance as unhealthy and stops routing new connections to it.

Existing TCP connections remain active and continue to be processed by the Cloud Firewall Gateway until they either complete naturally or are terminated due to session timeouts.

This allows in-flight sessions to complete without interruption, even though the instance is no longer considered healthy.

Note - The Connection draining timeout setting on the Load Balancer does not apply in this scenario. When an instance is marked unhealthy, it is removed from serving new traffic based on the health check status, not through the connection draining mechanism. As a result, whether existing connections are dropped or allowed to persist depends on the application behavior and broader network conditions.

Follow these steps for a graceful reboot:

  1. Connect through SSH to the Cloud Firewall MIG instance that needs maintenance or reboot.

  2. Switch to Expert mode.

  3. To view the current number of active connections, run:

    fw tab -t connections -s

  4. Check the current Load Balancer health probe port number with this command:

    fw ctl get int cloud_balancer_port

    The default port number is 8117.

  5. Stop the Cloud Firewall MIG instance from responding to Load Balancer health probes, so that the Load Balancer stops sending new connections to it. Run:

    fw ctl set int cloud_balancer_port 0

  6. Monitor traffic drain with this command:

    fw tab -t connections -s

    Wait until the number of active connections drops to a stable minimum.

    Note - System connections usually persist, so the value will never be 0.

  7. When the connection count has dropped to a minimum, proceed with the reboot or other planned maintenance on the instance.

  8. To return the instance to service after maintenance, re-enable health probe responses with this command:

    fw ctl set int cloud_balancer_port 8117

Note - After the instance is rebooted, the health probe port automatically starts responding again. To prevent automatic probe recovery, set cloud_balancer_port=0 in the kernel parameters configuration file (kern.conf).

Refer to sk26202 for instructions on modifying the kernel configuration file.
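
Steps 3 and 6 can be scripted by extracting the connection count from the command output. The sketch below assumes that the summary output consists of a header line that contains a #VALS column, followed by a data line for the connections table; verify this against the output on your gateway. The function name connection_count is illustrative.

```shell
# Sketch: extract the current connection count (#VALS column) from the
# summary printed by "fw tab -t connections -s" (read from standard input).
# Assumption: a header line with a #VALS column precedes the data line.
connection_count() {
    awk '
        # Until the header is found, look for the column labelled #VALS.
        col == 0 { for (i = 1; i <= NF; i++) if ($i == "#VALS") col = i; next }
        # First data line after the header: print that column and stop.
        col > 0 && NF >= col { print $col; exit }
    '
}
```

On the gateway, the function would be used as: fw tab -t connections -s | connection_count. Locating the column by its header name, rather than by a fixed position, keeps the sketch tolerant of extra columns.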

Connection Draining During Instance Termination

When a Cloud Firewall Gateway instance in a GCP Managed Instance Group (MIG) terminates abruptly, it is immediately removed from service. As a result:

  • The GCP Load Balancer does not have the opportunity to mark the Cloud Firewall MIG instance as unhealthy.

  • New connections are no longer routed to the Cloud Firewall MIG instance after it is removed from the backend service pool.

  • All existing TCP/UDP connections are dropped immediately, regardless of their state or how long they have been idle.

Graceful Termination

To minimize service disruption, we recommend that you drain connections from the Cloud Firewall MIG instance before you terminate it. To do this:

  • Manually disable the health probe responses on the Cloud Firewall MIG instance.

  • Wait for connections to drain based on the session timeout.

  • Only proceed with termination when the number of active connections is minimal.

Follow these steps:

  1. Connect to the Cloud Firewall MIG instance via SSH.

  2. Switch to Expert mode.

  3. To view the current number of active connections, run:

    fw tab -t connections -s

  4. Check the current Load Balancer health probe port number with this command:

    fw ctl get int cloud_balancer_port

    The default port number is 8117.

  5. Stop the Cloud Firewall MIG instance from responding to Load Balancer health probes, so that the Load Balancer stops sending new connections to it. Run:

    fw ctl set int cloud_balancer_port 0

  6. Monitor traffic drain with this command:

    fw tab -t connections -s

    Wait until the number of active connections drops to a stable minimum.

    Note - System connections usually persist, so the value will never be 0.

  7. In the Google Cloud Console, go to Instance groups > target MIG, and select the Cloud Firewall MIG instance you want to terminate. Click Delete.
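
The waiting in step 6 can be automated with a polling loop. The following is a minimal sketch, not a supported tool: the function name wait_for_drain and its parameters are hypothetical, and the counter command is any command that prints the current connection count as a single integer (for example, a wrapper around fw tab -t connections -s).

```shell
# Sketch: poll a connection counter until it drops to a threshold or the
# polling budget runs out.
# Usage: wait_for_drain THRESHOLD MAX_POLLS INTERVAL_SECONDS COUNT_CMD [ARGS...]
wait_for_drain() {
    threshold="$1"; max_polls="$2"; interval="$3"
    shift 3   # the remaining arguments form the counter command

    polls=0
    count="unknown"
    while [ "$polls" -lt "$max_polls" ]; do
        count="$("$@")" || return 2   # the counter command failed
        if [ "$count" -le "$threshold" ]; then
            echo "drained: $count connections remain"
            return 0
        fi
        polls=$((polls + 1))
        sleep "$interval"
    done
    echo "gave up: $count connections still open" >&2
    return 1
}
```

For example, wait_for_drain 20 60 10 my_counter would poll every 10 seconds, up to 60 times, until 20 or fewer connections remain, where my_counter is a placeholder for your counter command. Because system connections persist, the threshold should be greater than zero. Assuming the gcloud CLI is available, the deletion in step 7 could also be done with gcloud compute instance-groups managed delete-instances MIG_NAME --instances=INSTANCE_NAME --zone=ZONE, where the names are placeholders.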