Completed -
The scheduled maintenance has been completed. The rotated SSO root certificate is now the primary certificate used by the cloud.gov UAA SAML integration. The expiring certificate is now secondary and will expire tomorrow, April 23rd. Customers who dynamically pull SAML metadata from https://login.fr.cloud.gov/saml/metadata should not be affected.
If you need to trust our SAML provider for your IDP, there are two methods for retrieving the new certificate:
Follow your agency's instructions for consuming and trusting our root certificate for SAML.
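For integrators who want to inspect what the metadata endpoint currently publishes, the following is a minimal Python sketch, not an official cloud.gov tool. It assumes the metadata document carries certificates in standard XML Signature `X509Certificate` elements; the function names `fetch_metadata`, `extract_certificates`, and `sha256_fingerprint` are hypothetical helpers introduced here for illustration.

```python
import base64
import hashlib
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Union

# XML Signature namespace; SAML metadata normally places certificates in
# <ds:X509Certificate> elements under this namespace (assumed here).
DS_NS = "http://www.w3.org/2000/09/xmldsig#"


def fetch_metadata(url: str) -> bytes:
    """Download the raw SAML metadata document (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def extract_certificates(metadata_xml: Union[str, bytes]) -> List[str]:
    """Return every base64 certificate body in the metadata, whitespace removed."""
    root = ET.fromstring(metadata_xml)
    certs = []
    for node in root.iter(f"{{{DS_NS}}}X509Certificate"):
        if node.text:
            # Certificates are often wrapped across lines; join them back up.
            certs.append("".join(node.text.split()))
    return certs


def sha256_fingerprint(cert_b64: str) -> str:
    """SHA-256 hex digest of the DER-encoded certificate bytes."""
    return hashlib.sha256(base64.b64decode(cert_b64)).hexdigest()
```

A call such as `extract_certificates(fetch_metadata("https://login.fr.cloud.gov/saml/metadata"))` may return more than one entry while both the new and the expiring certificate are published. Comparing fingerprints against the values your agency has on record is one way to confirm which certificate you are trusting.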
Apr 22, 08:30 EDT
In progress -
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Apr 22, 08:00 EDT
Scheduled -
As part of routine maintenance of the cloud.gov platform, we are rotating the certificate associated with UAA SAML that is used by various IDPs. We have rotated the SSO root certificate used by cloud.gov IDP integrators and rolled it out. The new certificate is available now, and the old certificate will expire on Wednesday, April 23rd, 2025. On Tuesday, April 22nd, 2025, we will make the new certificate primary and keep the old, expiring certificate as secondary. Customers who dynamically pull SAML metadata from https://login.fr.cloud.gov/saml/metadata should not be affected.
If you need to trust our SAML provider for your IDP, there are two methods for retrieving the new certificate:
Resolved -
Status:
• The OpenSearch cluster has processed the entire backlog and is now ingesting logs in real time without delay.
• All indices are writable and healthy, and write throughput remains stable.
Resolution Details:
• We increased disk capacity on the affected data nodes and rebalanced shard allocation to clear the high‐watermark condition.
• The cluster’s health is green and all new log events are successfully indexed.
• Live log streaming via cf logs APP_NAME continues to work as expected.
Next Steps:
• We will keep a heightened watch on disk usage and shard distribution over the next 24 hours to ensure sustained health.
• If you notice any further issues with log visibility or performance, please open a support ticket.
Thank you for your patience, and we apologize for any inconvenience.
Apr 17, 15:42 EDT
Monitoring -
Update – 12:56 PM ET
Status:
- Logs are flowing into the OpenSearch cluster again, but indices are still catching up to real time.
- Full real-time ingestion is expected to resume within the next few hours.
- In the meantime, stream live application logs with: cf logs APP_NAME
----
Technical Details
Durable storage & caching: Application logs are first written to S3 for durability, then passed through a cache before landing in OpenSearch. This two‑step process ensures no data loss even if the cluster becomes temporarily unavailable.
Root cause: Several OpenSearch data nodes exceeded their disk‑usage high watermark. When this threshold is crossed, OpenSearch marks the affected indices as read‑only and rejects new writes.
Mitigation: We increased storage capacity on the affected nodes and rebalanced shard allocation across the cluster. The cluster is now healthy and processing the backlog of cached logs.
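As a rough illustration of the disk-usage condition described above, here is a minimal Python sketch that flags nodes at or above a watermark percentage. It is not cloud.gov's monitoring code. The `NodeDisk` type, the `nodes_over_watermark` helper, and the sample node names are hypothetical, and the 90% threshold is an assumption based on the documented OpenSearch default for the high watermark; the value configured on this cluster is not stated here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class NodeDisk:
    """Disk usage for one data node (hypothetical structure for illustration)."""
    name: str
    used_bytes: int
    total_bytes: int

    @property
    def used_pct(self) -> float:
        return 100.0 * self.used_bytes / self.total_bytes


def nodes_over_watermark(nodes: Iterable[NodeDisk],
                         watermark_pct: float = 90.0) -> List[str]:
    """Return names of nodes whose disk usage is at or above the watermark.

    The 90.0 default is an assumed threshold, not a value taken from
    the cloud.gov configuration.
    """
    return [n.name for n in nodes if n.used_pct >= watermark_pct]


# Example with made-up numbers: two of three nodes meet the assumed threshold.
sample_nodes = [
    NodeDisk("data-0", used_bytes=95, total_bytes=100),
    NodeDisk("data-1", used_bytes=50, total_bytes=100),
    NodeDisk("data-2", used_bytes=90, total_bytes=100),
]
flagged = nodes_over_watermark(sample_nodes)
```

A check like this, run on a schedule against per-node disk statistics, could raise an alert before writes start being rejected.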
----
Next Update: We will continue to monitor cluster health and ingestion rates. Our next status update will be posted by 3:30 PM ET, or sooner if anything changes.
Apr 17, 12:56 EDT
Investigating -
No customer logs have appeared in OpenSearch (https://logs.fr.cloud.gov) since approximately 10:11 AM ET. We are investigating and will provide an update as soon as we know more.
Apr 17, 11:46 EDT