cloud.gov Status - Incident History

GSA SSO Maintenance

2024-11-13T10:19:46-05:00

Nov 13, 10:19 EST
Completed - The scheduled maintenance on the GSA IDP has been completed.

Nov 13, 09:30 EST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.

Nov 6, 10:36 EST
Scheduled - This scheduled maintenance affects GSA Customers only.

For GSA users of the cloud.gov platform, cloud.gov is migrating the GSA IDP used for cloud.gov login from GSA SecureAuth to GSA Auth.

What does this mean for you:

GSA Customers who continue to use the cloud.gov platform after this maintenance will need to log in using GSA Auth going forward. GSA Auth is replacing the soon-to-be-retired GSA SecureAuth platform.

What do GSA Customers need to do:

GSA Customers need to sign up for their Production GSA Auth accounts at https://auth.gsa.gov if they have not done so already. If you already have an account with GSA Auth, this transition will be relatively seamless; the only change will be the GSA login experience once cloud.gov makes this change.

If GSA Customers on the platform are using cloud.gov UAA services to provide GSA IDP authentication for their apps, these customers should communicate this change to their end users and have those users sign up for a GSA Auth account before this change.

All Other Customers:

This only affects GSA users of the platform. All other customers using their agency IDP or the cloud.gov IDP are not affected by this maintenance.

Customer GitHub actions referencing cloud-gov/cg-cli-tools failing

2024-10-03T17:00:00-04:00

Oct 3, 17:00 EDT
Resolved - As part of a broader project, Cloud.gov renamed the GitHub repository cloud-gov/cg-cli-tools, which hosts a GitHub Action that customers can use to deploy applications to Cloud.gov. When a repository is renamed, GitHub redirects most requests from the old repository name to the new one. However, GitHub Actions does not automatically redirect calls to an action hosted by a renamed repository. This caused Actions referencing this repository to fail. The original repository name has been restored and use of the Action should return to normal.
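
Because Actions calls are resolved by repository name, it can help to know which workflows reference a given action. The following is a minimal sketch, not a cloud.gov tool: the function name, the regular expression, and the sample workflow (including the "@main" ref) are hypothetical and only illustrate scanning workflow text for "uses:" references to cloud-gov/cg-cli-tools.

```python
import re

# Matches steps of the form "uses: owner/repo@ref" or "uses: owner/repo/path@ref".
USES_PATTERN = re.compile(
    r"^\s*(?:-\s*)?uses:\s*['\"]?([\w.-]+/[\w.-]+)(?:/[^@\s]*)?@([^\s'\"#]+)",
    re.MULTILINE,
)


def find_action_refs(workflow_text: str, repo: str) -> list[str]:
    """Return the refs (tags, branches, or SHAs) used for `repo` in a workflow."""
    return [
        ref
        for name, ref in USES_PATTERN.findall(workflow_text)
        if name.lower() == repo.lower()
    ]


# Hypothetical workflow content, for illustration only.
sample_workflow = """\
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to cloud.gov
        uses: cloud-gov/cg-cli-tools@main
"""

refs = find_action_refs(sample_workflow, "cloud-gov/cg-cli-tools")
print(refs)
```

A non-empty result means the workflow depends on the action being reachable under that exact repository name.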

cloud.gov IDP maintenance

2024-10-03T09:00:02-04:00

Oct 3, 09:00 EDT
Completed - The scheduled maintenance has been completed.

Oct 3, 08:30 EDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.

Oct 1, 11:16 EDT
Scheduled - cloud.gov will be doing maintenance on the internal IDP (login) system on Thursday, October 3, 2024, at 8:30 AM EDT.

Why is this maintenance being done?

To keep the platform secure, SSL certificates are used to encrypt communications between different pieces of the platform. Certificates have known start and expiration dates and need to be rotated on a regular basis.
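
As a rough illustration of why rotation has to be scheduled ahead of time, the sketch below computes how many days remain before a certificate's end date. The function name and the sample dates are hypothetical, and this is not the tooling cloud.gov uses.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format found in certificates returned by
    ssl.SSLSocket.getpeercert(), e.g. "Oct  3 12:30:00 2024 GMT".
    A negative result means the certificate has already expired.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    return (expires_at - now.timestamp()) / 86400


# Hypothetical values, not taken from any real certificate.
sample_now = datetime(2024, 10, 1, 12, 30, tzinfo=timezone.utc)
remaining = days_until_expiry("Oct  3 12:30:00 2024 GMT", sample_now)
print(f"{remaining:.1f} days until expiry")
```

A check like this could run on a schedule and alert when the remaining time drops below a chosen threshold.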

Who is affected by this maintenance?

This maintenance will ONLY affect those customers that log in using the cloud.gov IDP option.

Customers using the census.gov, DOJ.gov, ed.gov, EPA.gov, FDIC.gov, fed.gov, frtib.gov, GSA.gov, mcc.gov, nih.gov, OMB.gov, onrr.gov, usaide.gov, and SSA.gov IDPs are NOT affected during this window.

Please note that cloud.gov IDP customers who log in to the system earlier in the day will not be affected. Only customers trying to log in during the window will be affected.

Questions? Contact support@cloud.gov

GSA SecureAuth outage

2024-09-24T14:42:37-04:00

Sep 24, 14:42 EDT
Resolved - SecureAuth availability has been restored. GSA users should be able to sign into cloud.gov. The cloud.gov team will continue to monitor for any further incidents.

Sep 24, 14:28 EDT
Investigating - GSA SecureAuth is currently experiencing an outage. GSA customers who log into cloud.gov via SecureAuth may see the error "Invalid User" when attempting to sign in. We are monitoring the outage and will post updates as available.

Cloud Foundry cf CLI GPG error

2024-09-11T14:51:25-04:00

Sep 11, 14:51 EDT
Resolved - The Cloud Foundry team has resolved the issue and apt installations are now working normally.

https://github.com/cloudfoundry/cli/issues/3194#issuecomment-2344076985

Sep 11, 10:41 EDT
Investigating - Customers who install the Cloud Foundry (cf) CLI using the apt package manager are encountering the following error:

W: GPG error: https://cf-cli-debian-repo.s3.amazonaws.com/ stable InRelease: The following signatures were invalid: EXPKEYSIG 172B5989FCD21EF8 CF CLI Team <cf-cli-eng@pivotal.io>
E: The repository 'https://cf-cli-debian-repo.s3.amazonaws.com/ stable InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

This issue is being tracked upstream at: https://github.com/cloudfoundry/cli/issues/3194

The cloud.gov team recommends installing the CLI using an alternate installation method while the Cloud Foundry team works to resolve the issue. The latest releases of the CLI are available on GitHub: https://github.com/cloudfoundry/cli/releases

Intermittent issues connecting to S3 buckets

2024-09-03T10:27:45-04:00

Sep 3, 10:27 EDT
Resolved - After monitoring this incident over the weekend, we feel confident that the issues are now resolved. If you continue to experience issues, please contact us at support@cloud.gov.

As with all incidents, the cloud.gov team will be conducting a post-mortem analysis of this incident and publishing our findings in the coming days.

Thank you for being a cloud.gov customer!

Aug 30, 18:52 EDT
Monitoring - We believe we have identified the source of the problem. Our "trusted_local_networks_egress" security group, which allows applications to connect to S3, was not allowing egress to all of the possible IP ranges for S3.

We have updated the "trusted_local_networks_egress" to allow egress to all of the IP ranges for S3 published by AWS.

In our testing, it seems that the updated egress IP ranges have resolved the issue, but we will continue to monitor and update this page as necessary.
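
The idea behind the fix can be sketched as follows. AWS publishes its address ranges as a JSON document (ip-ranges.json) whose "prefixes" entries carry "ip_prefix", "region", and "service" fields. The sketch filters the S3 entries for one region and turns them into egress rules shaped like Cloud Foundry security group rules. The function name, the region, the port, and the sample data are assumptions for illustration; this is not the actual cloud.gov configuration.

```python
import json


def s3_egress_rules(ip_ranges: dict, region: str) -> list[dict]:
    """Build one egress rule per IPv4 S3 prefix published for `region`."""
    prefixes = sorted(
        {
            entry["ip_prefix"]
            for entry in ip_ranges.get("prefixes", [])
            if entry["service"] == "S3" and entry["region"] == region
        }
    )
    # Rule shape modeled on Cloud Foundry security group rules;
    # port 443 (HTTPS) is an assumption.
    return [
        {"protocol": "tcp", "destination": cidr, "ports": "443"}
        for cidr in prefixes
    ]


# Hand-written sample in the shape of ip-ranges.json. The CIDR blocks are
# documentation addresses (RFC 5737), not real S3 ranges.
sample = {
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "us-gov-west-1", "service": "S3"},
        {"ip_prefix": "198.51.100.0/24", "region": "us-gov-west-1", "service": "EC2"},
        {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "S3"},
    ]
}

rules = s3_egress_rules(sample, region="us-gov-west-1")
print(json.dumps(rules, indent=2))
```

Generating the rules from the published list, rather than maintaining them by hand, is one way to avoid missing a range when AWS adds new ones.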

Aug 30, 18:04 EDT
Investigating - Some customers are experiencing intermittent "connection refused" errors when their apps are connecting to S3 buckets.

We are investigating the issue and coordinating with AWS support. We will update this incident as we discover further information.

Cloud Foundry Database Upgrade

2024-08-07T09:33:31-04:00

Aug 7, 09:33 EDT
Completed - The scheduled maintenance has been completed.

Aug 7, 09:00 EDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.

Jul 24, 12:48 EDT
Scheduled - We’re planning routine maintenance for the cloud.gov API to upgrade the underlying database.

This DOES NOT impact your running user-facing applications. All applications and their databases will continue to run as normal.

During this maintenance window, any developer requests that use the cloud.gov API may not work, including:

* CF command-line interface (CLI) commands
* cloud.gov dashboard actions
* cloud.gov API requests

We will send out a notice once the upgrade is complete and developer requests are functional again.

If you have any questions or concerns, please contact us at support@cloud.gov

Outage for online CF buildpacks for new application pushes

2024-07-10T10:03:42-04:00

Jul 10, 10:03 EDT
Resolved - The incident has been resolved. Applications which leverage online buildpacks at buildpacks.cloudfoundry.org will now build. Again, thank you for your patience as we worked this problem to resolution.

Jul 10, 09:45 EDT
Investigating - Details: An issue with an upstream dependency affecting new Cloud Foundry (CF) applications was identified around 9 AM ET on July 10, 2024. This issue is causing failures when attempting to push new applications that utilize online buildpacks pointing to ‘buildpacks.cloudfoundry.org’.

Current Status: We are actively monitoring the situation and are in communication with the group responsible for the CDN and this URL.

Impact: Currently running applications are not impacted. Customers may experience failures when pushing new applications that depend on the affected buildpacks.

Next steps: We will provide updates as soon as more information becomes available and as the situation progresses towards resolution. We apologize for any inconvenience caused and appreciate your understanding and patience as we work to resolve this issue.

Some logins to cloud.gov slow or failing

2024-06-06T08:30:00-04:00

Jun 6, 08:30 EDT
Resolved - Some attempts to log in to cloud.gov took longer than usual or failed due to timeouts during this time. The cloud.gov team became aware of the issue at 11:45 AM ET and implemented a fix at 12:10 PM ET. We will continue to monitor the system for further issues.

Ongoing DDoS attacks

2024-04-23T14:13:44-04:00

Apr 23, 14:13 EDT
Resolved - While we have detected additional DDoS attacks against the platform over the last week, there have been no additional platform outages, so we are resolving this incident.

As per our usual process, in the next few days the cloud.gov team plans to hold a retrospective on all of the DDoS incidents for the platform over the past two weeks. Once the retrospective is complete, we will publish our post-mortem analysis of the incidents, including lessons learned and planned improvements to the platform.

As always, thank you for being a cloud.gov customer!

Apr 15, 16:22 EDT
Update - From 3:51 PM ET to 3:56 PM ET, we detected another large-scale DDoS attack against the platform. Thanks to the currently deployed mitigations, the platform did not experience a full outage, but customers may have experienced elevated error rates from their applications.

Apr 15, 11:45 EDT
Monitoring - Incident Summary

Throughout the past week, our platform has been subject to ongoing Distributed Denial of Service (DDoS) attacks, as evidenced by the previous StatusPage updates.

Thanks to our security measures and platform automation, the platform was able to recover from those attacks in under five minutes, so we immediately marked all of those previous incidents as “Resolved”. To be clear, even though we considered those incidents resolved, our investigation into their causes and how to mitigate them more effectively remains ongoing.

Since these attacks are still ongoing and varying in scale, it is possible there could be further disruptions to our platform. To centralize and to improve our communications on these incidents, we will leave this particular incident open and will update it with announcements of any further outages or implemented mitigations.

Incident Details - DDoS attack

Time Detected: 4/15/2024 8:45 AM ET
Duration: Around 2 minutes
Impact: Users may have experienced slow response times or elevated rates of 502 error responses during the attack.
Resolution: Our automated DDoS protection systems quickly identified and mitigated the attack, restoring normal service operations without significant impact.

Actions Taken

In response to these ongoing attacks, we have implemented changes to the scaling of our platform infrastructure and the way that malicious traffic is intercepted.

Since these measures are being deployed actively in response to ongoing attacks, we cannot specify exactly what they are, but hopefully once these attacks subside we can provide further clarity.

Next Steps

We will continue to monitor our systems closely and adjust our security measures as needed. We will keep our users updated on any relevant developments or preventive measures being implemented.

Once the attacks have subsided or have been sufficiently mitigated, our team will conduct a post-mortem analysis of these incidents in order to identify any potential improvements to our security posture or our incident response techniques and processes.

We will publish a summary of our post-mortem with the findings of our investigation once it is complete.

Acknowledgment

We appreciate your understanding and patience during this incident. The swift resolution of this DDoS attack underscores our commitment to providing a secure and reliable platform. If you have any concerns or questions, please do not hesitate to contact our support team at support@cloud.gov.

Thank you for your continued trust in cloud.gov.

DDoS outage and CDN-based traffic outage

2024-04-12T15:12:04-04:00

Apr 12, 15:12 EDT
Resolved - Incident Summary

Throughout this week, our platform has been subject to massive, coordinated Distributed Denial of Service (DDoS) attacks.

Today, on 4/12/2024, our platform experienced another DDoS attack that took our platform down for around 5 minutes.

Thanks to our security measures and platform automation, we were able to fully recover and mitigate the effects of the DDoS attack in under five minutes.

Unfortunately, while deploying additional mitigations for the underlying source of the DDoS attacks, there was an interruption to all traffic coming into our platform from a CDN, including traffic for cloud.gov Pages customers.

Incident Details - DDoS outage

Time Detected: 4/12/2024 1:54 PM ET
Duration: Around 5 minutes
Impact: Users may have experienced slow response times or temporary inability to access our services during the attack.
Resolution: Our automated DDoS protection systems quickly identified and mitigated the attack, restoring normal service operations without significant impact.

Incident Details - CDN-based traffic outage

Time Detected: 4/12/2024 2:25 PM ET
Duration: Around 5 minutes
Impact: All customers whose traffic passes through a CDN, including cloud.gov Pages customers or users of brokered CDN services, experienced a full outage of their services.
Resolution: We manually reverted the change which caused CDN-based traffic to be rejected and also reverted the change in the infrastructure source code, so that the change will not be deployed again.

Actions Taken

Our DDoS mitigation tools were activated to rate limit malicious traffic, allowing the platform to recover from the initial DDoS attack.

Our security team is conducting a thorough investigation into the attack to understand its origins and to prevent similar incidents in the future.

Next Steps

We will continue to monitor our systems closely and adjust our security measures as needed. An in-depth review of this incident is being conducted to identify any potential improvements to our security posture. We will keep our users updated on any relevant developments or preventive measures being implemented.

We will publish a post-mortem with the findings of our investigation in the coming days.

Acknowledgment

We appreciate your understanding and patience during this incident. The swift resolution of this DDoS attack underscores our commitment to providing a secure and reliable platform. If you have any concerns or questions, please do not hesitate to contact our support team.

Thank you for your continued trust in cloud.gov.

DDoS incident and platform outage

2024-04-11T18:13:12-04:00

Apr 11, 18:13 EDT
Resolved - Incident Summary
On 4/11/2024, our platform experienced a Distributed Denial of Service (DDoS) attack that briefly impacted our services. We want to assure our users that the security and reliability of our platform are of utmost importance. Thanks to our robust security measures and platform automation, we were able to fully recover and mitigate the effects of the DDoS attack in under two minutes.

Incident Details
Time Detected: 4/11/2024 4:08 PM ET
Duration: Less than 2 minutes
Impact: Users may have experienced slow response times or temporary inability to access our services during the attack.
Resolution: Our automated DDoS protection systems quickly identified and mitigated the attack, restoring normal service operations without significant impact.

Actions Taken
Immediate mitigation: Our DDoS mitigation tools were activated to rate limit malicious traffic, allowing the platform to recover.
Investigation: Our security team is conducting a thorough investigation into the attack to understand its origins and to prevent similar incidents in the future.

Next Steps
We will continue to monitor our systems closely and adjust our security measures as needed. An in-depth review of this incident is being conducted to identify any potential improvements to our security posture. We will keep our users updated on any relevant developments or preventive measures being implemented.

We will publish a post-mortem with the findings of our investigation in the coming days.

Acknowledgment
We appreciate your understanding and patience during this incident. The swift resolution of this DDoS attack underscores our commitment to providing a secure and reliable platform. If you have any concerns or questions, please do not hesitate to contact our support team.

Thank you for your continued trust in cloud.gov.

Partial Platform Outage

2024-04-08T20:00:00-04:00

Apr 8, 20:00 EDT
Resolved -

Incident Summary:

On 4/7/2024 and 4/8/2024, our platform experienced a Distributed Denial of Service (DDoS) attack that briefly impacted our services. We want to assure our users that the security and reliability of our platform are of utmost importance. Thanks to our robust security measures and platform automation, we were able to fully recover and mitigate the effects of the DDoS attack in under two minutes.

Incident Details:

Time Detected: 4/7/2024 6:30 PM & 4/8/2024 7:45/8:10 PM ET
Duration: Each of the 3 events lasted less than 2 minutes
Impact: Users may have experienced slow response times when accessing our services during the attack.
Resolution: Our automated DDoS protection systems quickly identified and mitigated the attack, restoring normal service operations without significant impact.

Actions Taken:

Immediate Mitigation: Our DDoS mitigation tools were activated to filter out malicious traffic, allowing legitimate user traffic to continue unaffected.
Investigation: Our security team is conducting a thorough investigation into the attack to understand its origins and to prevent similar incidents in the future.

Next Steps:

We will continue to monitor our systems closely and adjust our security measures as needed. An in-depth review of this incident is being conducted to identify any potential improvements to our security posture. We will keep our users updated on any relevant developments or preventive measures being implemented.

Acknowledgment:

We appreciate your understanding and patience during this incident. The swift resolution of this DDoS attack underscores our commitment to providing a secure and reliable platform. If you have any concerns or questions, please do not hesitate to contact our support team.

Thank you for your continued trust in cloud.gov.

4/8/2024 DDoS

2024-04-08T10:30:00-04:00

Apr 8, 10:30 EDT
Resolved -

Incident Summary:

On 4/8/2024, our platform experienced a Distributed Denial of Service (DDoS) attack that briefly impacted our services. We want to assure our users that the security and reliability of our platform are of utmost importance. Thanks to our robust security measures and platform automation, we were able to fully recover and mitigate the effects of the DDoS attack in under two minutes.

Incident Details:

Time Detected: 4/8/2024 10:15 AM ET
Duration: Less than 2 minutes
Impact: Users may have experienced slow response times or temporary inability to access our services during the attack.
Resolution: Our automated DDoS protection systems quickly identified and mitigated the attack, restoring normal service operations without significant impact.

Actions Taken:

Next Steps:

Acknowledgment:

Thank you for your continued trust in cloud.gov.

Out of memory issues in running and staging apps

2024-03-26T15:59:52-04:00

Mar 26, 15:59 EDT
Resolved - Since rolling out stemcell version 1.404 to the platform last week, we have received no further reports of out of memory issues and our own internal metrics show a decline in these errors, so this incident is resolved.

If you are still experiencing issues with your applications, please contact support@cloud.gov.

Mar 19, 10:42 EDT
Update - After further testing and debugging, members of the CloudFoundry community were able to isolate the cause of the "out of memory" issues to an incompatibility between Linux cgroups v1 and version 6.5 of the Linux kernel, both of which were used by the latest stemcells. cgroups are a process isolation mechanism often used to manage container processes, including customer applications running on cloud.gov.
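
For readers who want to see which combination a Linux host is running, here is a small sketch. It relies on the fact that a unified cgroups v2 hierarchy exposes a "cgroup.controllers" file at its root, and it parses the major and minor numbers out of a kernel release string. The function names and the sample release strings are hypothetical, and hybrid cgroup setups are not handled.

```python
import re
from pathlib import Path
from typing import Optional


def cgroup_version(root: Path = Path("/sys/fs/cgroup")) -> Optional[int]:
    """Return 2 for a unified cgroups v2 hierarchy, 1 otherwise.

    Returns None when `root` does not exist (for example, on non-Linux hosts).
    """
    if not root.is_dir():
        return None
    return 2 if (root / "cgroup.controllers").is_file() else 1


def kernel_major_minor(release: str) -> tuple[int, int]:
    """Parse a release string such as "6.5.0-1014-aws" into (6, 5)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


# Hypothetical release strings; platform.release() returns the real one.
for release in ("6.5.0-1014-aws", "5.15.0-101-generic"):
    print(release, "->", kernel_major_minor(release))

print("cgroup version on this host:", cgroup_version())
```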

To fix the out of memory issues, the CloudFoundry community has released a new stemcell version, 1.404, which uses version 5.15 of the Ubuntu Jammy kernel. Version 5.15 is the long-term supported release of Ubuntu Jammy (https://ubuntu.com/about/release-cycle#ubuntu-kernel-release-cycle), so this release will continue to receive security patches and other fixes.

We are rolling out stemcell version 1.404 to the platform today and expect to see a reduction in memory use across our platform, including the possible resolution of all "out of memory" issues for customer applications.

Even though we only expect these changes to benefit our platform and our customers, we will still be closely monitoring our platform for stability as we roll out these changes. If you experience any issues, don't hesitate to contact us at support@cloud.gov.

Feb 20, 12:04 EST
Update - Since the changes that were deployed last week to increase the number of VMs available to host customer applications and to double the amount of memory available for staging operations, customers have reported a reduction in the frequency of "out of memory" issues, but are still experiencing them.

The cloud.gov team has continued to investigate the cause of these issues. After consulting with the CloudFoundry community, we believe that these issues may be caused by faulty memory allocation in the Linux kernel which is built into the stemcells for CloudFoundry VMs. This GitHub issue is being used to track investigation and resolution of the stemcell memory issues: https://github.com/cloudfoundry/bosh-linux-stemcell-builder/issues/318

One of the recommendations from the community to resolve the "out of memory" issues was to roll back to stemcell version 1.340. However, every stemcell release includes fixes for a number of CVEs (https://github.com/cloudfoundry/bosh-linux-stemcell-builder/releases), so rolling back would expose the platform and our customers to CVEs that are patched in the current stemcell version. At this time, cloud.gov does not plan to roll back our stemcell version given the potential security risk.

Another recommendation from the community was to increase the amount of memory available for staging, which we did last week when we increased the value from 1024 MB to 2048 MB, but customers continue to experience issues.

At this point, the plan for mitigation is to pursue ad hoc memory increases for applications that are still experiencing issues until a fix for the kernel/stemcell issue is released from upstream and can be deployed to our platform.

If your applications are still experiencing issues, please contact us at support@cloud.gov so we can work to resolve them for you.

Thank you for being a cloud.gov customer!

Feb 13, 14:37 EST
Update - The cloud.gov team is continuing to investigate the causes of "out of memory" errors that are being seen for some customer applications.

In order to address these errors, at approximately 10:48 AM ET, the cloud.gov team deployed two changes to our production environment:

- Increased the number of VMs available to host customer applications
- Doubled the amount of memory available for staging applications from 1024 MB to 2048 MB

Customers experiencing "out of memory" errors for their applications should try restaging their applications via 'cf restage' or 'cf restage --strategy rolling' to see if the issue is resolved.

Please contact support@cloud.gov if you have further questions or concerns.

Feb 12, 13:40 EST
Update - Due to an ongoing security incident, we have temporarily paused internal cloud.gov platform deployments. This pause will continue to impact the time to resolution for the Out Of Memory (OOM) issue that we are still addressing. We are continuing to mitigate the OOM issue in the meantime.

Please reach out to support@cloud.gov if you are experiencing issues with your applications and we will assist you with mitigations while we work to resolve these incidents.

Feb 8, 18:26 EST
Monitoring - The cloud.gov team has deployed a fix and is monitoring the result. Customers whose applications are failing with memory-related errors should restage their applications with `cf restage` or `cf restage --strategy rolling` and reach out to cloud.gov support via support@cloud.gov if they continue to experience errors.

Feb 8, 12:25 EST
Update - We believe the OOM errors may be caused by a bug in the latest stemcell version pushed to production on January 30. We are deploying an updated version which contains a fix. Deployment is expected to complete after east coast close of business. We will monitor the rollout and post updates as we have them.

Feb 8, 10:54 EST
Investigating - Some applications on cloud.gov have been experiencing intermittent out-of-memory errors while staging or running on the platform starting on January 30. The cloud.gov team is investigating the issue. For apps experiencing OOM errors, 'cf restage' or 'cf restage --strategy rolling' may temporarily resolve the issue.

Log cache component not returning logs

2024-03-08T19:35:28-05:00

Mar 8, 19:35 EST
Resolved - The log cache system has been updated with the renewed certificate. Our testing indicates that real-time logs can now be successfully retrieved using the "cf logs" CLI commands.

As with all incidents, the cloud.gov team will conduct a post-mortem analysis of this incident in the coming days and post our findings here as an update.

Thank you for being a cloud.gov customer!

Mar 8, 17:23 EST
Update - We have renewed the certificate for the log cache component and we have started a full redeployment of our production system to apply the renewed certificates to the log cache.

It may take several hours for the renewed certificate to roll out through the system, but we will post an update once we can confirm the updated certificate has been applied.

Mar 8, 17:01 EST
Identified - We have received reports from customers that using the "cf logs" CLI command to retrieve logs from their applications is either not working or not showing recent logs.

Customers have confirmed that real-time logs are still being received in the customer logs Elasticsearch/Kibana instance at https://logs.fr.cloud.gov and are being sent correctly through log drains.

Our team has already identified the possible cause of this issue as an expired certificate for the Log Cache component, which is the component that the "cf logs" CLI command uses to retrieve logs. The certificate expired at approximately 1:18 PM ET. We are working to remediate the issue.

AWS RDS deprecating MySQL 5.7.43 - 5.7.44 - Upgrade your databases

2024-03-01T00:00:56-05:00

Mar 1, 00:00 EST
Completed - The scheduled maintenance has been completed.

Feb 29, 00:00 EST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.

Nov 27, 09:40 EST
Scheduled - AWS RDS is deprecating support for MySQL 5.7, versions 5.7.43 - 5.7.44: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.Concepts.VersionMgmt.html.

Please see our website for guidance on how to upgrade your brokered MySQL 5.7 databases to a supported version: https://cloud.gov/2023/06/05/aws-ending-support-mysql-57/.

Any databases that you do not upgrade will get auto-upgraded by AWS in the next maintenance window.

If you have any questions or concerns, please contact support@cloud.gov.

Cloud Foundry Database Upgrade

2024-02-29T09:43:55-05:00

Feb 29, 09:43 EST
Completed - The scheduled maintenance has been completed.

Feb 29, 09:00 EST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.

Feb 21, 14:37 EST
Scheduled - We’re planning routine maintenance for the cloud.gov API to upgrade the underlying database.

This DOES NOT impact your user-facing applications. All applications and their databases will continue to run as normal.

During this maintenance window, any developer requests that use the cloud.gov API will not work, including:

* CF command-line interface (CLI) commands
* cloud.gov dashboard actions
* cloud.gov API requests

We will send out a notice once the upgrade is complete and developer requests are functional again.

If you have any questions or concerns, please contact us at support@cloud.gov.

cloud.gov Pages builds caching error for Jekyll sites

2024-02-27T13:00:34-05:00

Feb 27, 13:00 EST
Resolved - Builds are proceeding as normal. If you experience issues building Jekyll with the error "libffi.so.7: cannot open shared object file", please contact pages-support@cloud.gov for resolution.

Feb 20, 14:25 EST
Monitoring - cloud.gov Pages customers using Jekyll may be experiencing failing builds due to a caching error in our application. We have identified the fix and are resetting the cache for affected sites. Please contact us if you are experiencing an unexpected Jekyll build error.

Temporary pause for internal cloud.gov platform deployments due to security incident

2024-02-21T09:26:56-05:00

Feb 21, 09:26 EST
Resolved - This incident has been resolved and all impacted internal development tools are operating as expected.

Feb 12, 13:23 EST
Update - cloud.gov is continuing to manage a security incident. This incident does not impact platform availability or customer access to cloud.gov. We do not believe at this time that this security incident impacts customer data or application security posture.

We are working to resume normal internal pipeline activities. Until fully resolved, this will continue to impact the time to resolution for the Out Of Memory (OOM) issue that we are still addressing. We are continuing to work to fix the OOM issue in the meantime.

Please reach out to support@cloud.gov if you are experiencing issues with your applications and we will assist you with mitigations while we work to resolve these incidents.

Feb 9, 13:47 EST
Monitoring - Cloud.gov has become aware of and is managing a security incident which is not impacting platform availability or customer access to cloud.gov. We do not believe at this time that this security incident impacts customer data or application security posture.

As a security measure, we have temporarily paused internal cloud.gov platform deployments. This pause will impact the time to resolution for the Out Of Memory (OOM) issue that we are still addressing. We are investigating additional mitigations for the OOM issue in the meantime.

Please reach out to support@cloud.gov if you are experiencing issues with your applications and we will assist you with mitigations while we work to resolve these incidents.

AWS RDS deprecating MySQL 5.7.37 - 5.7.42 - Upgrade your databases

2024-01-17T00:00:08-05:00

Jan 17, 00:00 EST
Completed - The scheduled maintenance has been completed.

Jan 16, 00:01 EST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.

Nov 27, 09:39 EST
Scheduled - AWS RDS is deprecating support for MySQL 5.7, versions 5.7.37 - 5.7.42: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.Concepts.VersionMgmt.html.

Please see our website for guidance on how to upgrade your brokered MySQL 5.7 databases to a supported version: https://cloud.gov/2023/06/05/aws-ending-support-mysql-57/.

Any databases that you do not upgrade will get auto-upgraded by AWS in the next maintenance window.

If you have any questions or concerns, please contact support@cloud.gov.

Limited support for cloud.gov

2024-01-11T10:29:54-05:00

Jan 11, 10:29 EST
Completed - The scheduled maintenance has been completed.

Nov 20, 10:07 EST
Scheduled - With many cloud.gov team members taking time off for the holiday season between December 25, 2023 and January 1, 2024, customers may experience slower response times when contacting support@cloud.gov. Customers should expect responses to their emails within 1 business day.

Happy holidays!

Failures for application rolling restarts

2024-01-09T11:00:00-05:00

Jan 9, 11:00 EST
Resolved - On January 9 around 11 AM ET, the cloud.gov team began deploying an update to CloudFoundry, which is the underlying technology for our platform. About an hour later at 12 PM ET, some customers who were attempting to do a rolling restart of their applications began reporting failures.

After some investigation, the cloud.gov team determined that there was a bug in part of the CloudFoundry release which was updating database schemas, which in turn was causing application rolling restarts to fail.

Once the team recognized the issue, they rolled back our deployment of CloudFoundry to the previous stable version. Customers reported that application restarts had stabilized around 2 PM ET.

As an aside, while cloud.gov does have development and staging environments where platform updates are tested before pushing to production, this bug was not caught in those environments because there are no applications that do rolling restarts in those environments.

As a follow up, the cloud.gov team did report this deployment bug to the CloudFoundry team and they have opened an issue to resolve it: https://github.com/cloudfoundry/cloud_controller_ng/issues/3592. The cloud.gov team will keep our deployment of CloudFoundry using the latest known stable version until this bug is fixed.

As always, we apologize for the inconvenience and we will work to ensure a similar incident does not recur in the future.

Thanks for being a cloud.gov customer!

Database upgrades for cloud.gov Pages

2024-01-09T07:30:25-05:00

Jan 9, 07:30 EST
Completed - The scheduled maintenance has been completed.

Jan 9, 07:00 EST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.

Dec 21, 13:13 EST
Scheduled - We will be upgrading the cloud.gov Pages database for general maintenance to the platform. There will be some downtime while the database goes through the upgrade process. We anticipate roughly 5 to 10 minutes and it will only affect the platform application and any running site builds.

This will NOT affect any live customer sites.

Deprecation of node v16 for Pages site builds

2023-12-19T10:00:15-05:00

Dec 19, 10:00 EST
Completed - The scheduled maintenance has been completed.

Dec 19, 09:00 EST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.

Dec 6, 18:32 EST
Scheduled - As part of our ongoing reliability and stability enhancements to cloud.gov Pages, use of node v16 will be deprecated as of December 19, 2023. Builds that specify an unsupported version in .nvmrc will fail with the following message in the build logs:

"Unsupported node major version specified in .nvmrc. Please upgrade to LTS major version 18 or 20, see https://nodejs.org/en/about/releases/ for details."

In addition, the default node version, when one is not specified in .nvmrc, will be v18.
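
The version rules described above can be sketched roughly as follows. This is an illustration only, not the Pages build code: the function and constant names are hypothetical, and the sketch accepts only numeric versions such as "18" or "v20.10.0", treating aliases like "lts/*" as unsupported.

```python
import re
from typing import Optional

SUPPORTED_MAJORS = {18, 20}  # LTS majors named in the build error message
DEFAULT_MAJOR = 18           # used when no version file is present


def resolve_node_major(nvmrc_text: Optional[str]) -> int:
    """Return the node major version to build with, or raise ValueError.

    `nvmrc_text` is the content of the version file, or None if the file
    does not exist.
    """
    if nvmrc_text is None:
        return DEFAULT_MAJOR
    match = re.match(r"v?(\d+)", nvmrc_text.strip())
    if match is None or int(match.group(1)) not in SUPPORTED_MAJORS:
        raise ValueError("Unsupported node major version specified in .nvmrc.")
    return int(match.group(1))


print(resolve_node_major(None))
print(resolve_node_major("v20.10.0\n"))
```

Under these assumptions, a file containing "16" raises the error, while a missing file falls back to the default.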

More information is available in our documentation at https://cloud.gov/pages/documentation/node-on-pages/.

If you have any questions, please contact pages-support@cloud.gov.