Failures for application rolling restarts
Incident Report for cloud.gov
Resolved
On January 9 around 11 AM ET, the cloud.gov team began deploying an update to CloudFoundry, which is the underlying technology for our platform. About an hour later, at 12 PM ET, some customers who were attempting rolling restarts of their applications began reporting failures.

After some investigation, the cloud.gov team determined that the part of the CloudFoundry release that updates database schemas contained a bug, which in turn was causing application rolling restarts to fail.

Once the team recognized the issue, they rolled back our deployment of CloudFoundry to the previous stable version. Customers reported that application restarts had stabilized around 2 PM ET.

As an aside, while cloud.gov does have development and staging environments where platform updates are tested before they are pushed to production, this bug was not caught there because no applications in those environments perform rolling restarts.

As a follow-up, the cloud.gov team reported this deployment bug to the CloudFoundry team, and an issue has been opened to resolve it: https://github.com/cloudfoundry/cloud_controller_ng/issues/3592. The cloud.gov team will keep our deployment of CloudFoundry on the latest known stable version until this bug is fixed.

As always, we apologize for the inconvenience, and we will work to ensure that a similar incident does not recur.

Thanks for being a cloud.gov customer!
Posted Jan 09, 2024 - 11:00 EST