Load Balancer Memory Leak – A Postmortem

Some applications deployed with Nanobox have been affected by a memory leak in their load balancer platform component. The cause of the leak has been identified and patched, and you may need to take a few steps to apply the update. For information about the project behind your load balancer and the cause of the leak, read on. If you're just looking to apply the patch, skip to the update instructions.

Portal

Every app deployed with Nanobox is deployed with a load balancer platform component. The project under the hood of your app's load balancer is Portal, an open source, API-driven, in-kernel, layer 2/3 load balancer maintained under the Nanopack organization.

Portal handles all requests on ports 80 and 443, terminates SSL, and proxies those requests to your web components on port 8080. If your app uses multi-node web components, Portal distributes requests across the nodes in the cluster in round-robin order.
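As a rough illustration of round-robin selection, here is a minimal Go sketch. The roundRobin type, its pick method, and the backend addresses are hypothetical names chosen for this example. Portal balances in-kernel, so this is not how Portal itself is implemented; it only shows the rotation order.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin hands out backend addresses in rotation.
// Hypothetical type for illustration only; not taken from Portal.
type roundRobin struct {
	backends []string
	next     uint64
}

// pick returns the next backend and wraps around at the end of the list.
// The atomic counter keeps it safe to call from multiple goroutines.
func (rr *roundRobin) pick() string {
	n := atomic.AddUint64(&rr.next, 1) - 1
	return rr.backends[n%uint64(len(rr.backends))]
}

func main() {
	// Example addresses are assumptions; port 8080 matches the web component port.
	rr := &roundRobin{backends: []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.pick()) // the fourth pick wraps back to the first backend
	}
}
```

With three nodes, the fourth request lands on the first node again, so each node receives an equal share of requests over time.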

The Memory Leak

Portal is written in Go. Requests passing through Portal are processed using goroutines, lightweight threads of execution. Each goroutine receives a request, appends X-Forwarded-For and X-Forwarded-Proto headers, then sends the request on its way. In some instances, the client would disconnect partway through, either because of a network interruption or because the request was terminated before it completed. When this happened, the goroutine would hold the connection open, waiting for the request to finish. The result was a growing number of stale, unused connections consuming memory unnecessarily.

The memory leak has been resolved by adding timeouts to Portal that automatically close stale connections:

  • 5 second read-header timeout
  • 2 minute idle timeout

Updating Portal

If you have updated your load balancer or your app was created after January 12, 2018, you're already using the new version of Portal. If you're unsure, you can console into your Portal container and check the version:

# Console into your app's portal container
nanobox console mesh.data.portal

# Check the Portal version
portal -v

As of today, February 1st, 2018, the most recent version of Portal is:

portal 0.1.1 (c72f282)

Important: The Portal update process requires downtime while data is migrated between the old and new Portal nodes. This typically takes only a few seconds.

To update to the most recent version of Portal that includes the memory leak fix, click on "Platform" in your application dashboard, then "Manage" under your Load Balancer component.

(Screenshot: Manage Load Balancer)

Click "Admin" under the mesh.data.portal component and select the "Update" option.

(Screenshot: Update mesh.data.portal)

Nanobox will rebuild your load balancer using the latest version of Portal.
