Increased error rates

Incident Report for cloud.gov

Postmortem

After investigating with AWS, we've identified the root cause of the increased 500 error rates with S3. No further errors should be occurring, and we've received no reports of them over the last couple of weeks.

Based on our investigation, we believe the errors and timeouts started occurring because the brokered S3 buckets in the platform exist in a large partition that could not efficiently process all of its keys during a lookup operation. In practice, S3 could not always locate a given bucket quickly enough and would intermittently return a timeout error, which surfaced as the 500 responses some customers experienced.

AWS has created child partitions that account for our bucket naming scheme so that we no longer have this issue with our existing buckets or any future buckets.

If anyone has any questions or concerns about this, or continues to experience 500 error responses from their S3 bucket(s), please reach out to us at support@cloud.gov.
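
For applications that want to tolerate intermittent 500 responses of the kind described above, retrying with exponential backoff is a common client-side pattern. The following is a minimal sketch under stated assumptions, not a fix prescribed by cloud.gov or AWS: `TransientError` and `call_with_backoff` are hypothetical names introduced for illustration, and `operation` stands in for whatever S3 request your application makes.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a retryable failure, e.g. an HTTP 500 or a timeout."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.2,
                      max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff.

    The wait before each retry is drawn uniformly from [0, ceiling], where
    the ceiling doubles on every attempt and is capped at max_delay.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

An application would wrap its own request in a function that raises `TransientError` on a 5xx response or timeout and pass that function to `call_with_backoff`. AWS SDKs generally include built-in retry behavior, so a hand-rolled loop like this may not be needed if you already use one.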

Posted Jan 06, 2022 - 09:52 EST

Resolved

After investigating with AWS, we've identified the root cause of the increased 500 error rates with S3. At this time no more errors should be occurring.
Posted Jan 06, 2022 - 09:51 EST

Monitoring

The increased error rate has subsided. We're continuing to monitor and work with AWS to ensure the issue is resolved.
Posted Dec 09, 2021 - 19:42 EST

Identified

We've identified an increased error rate in S3 as the likely cause of the error response rate we're seeing, and have engaged AWS for additional support.
Posted Dec 09, 2021 - 17:35 EST

Update

We are continuing to investigate this issue.
Posted Dec 09, 2021 - 17:33 EST

Investigating

We're seeing increased 5xx error rates across the platform and are investigating.
Posted Dec 09, 2021 - 16:39 EST

This incident affected: cloud.gov customer applications (Applications, Service - S3) and cloud.gov website.