logs.fr.cloud.gov may return inconsistent results
Scheduled Maintenance Report for cloud.gov
Postmortem

What happened

As standard procedure for operating cloud.gov, we store customer application logs in two ways. Normally, during scheduled upgrades of the logs front end (https://logs.fr.cloud.gov), the system temporarily saves logs only to the secondary backup system. We then use that backup to restore the log information in the logs front end.

During our scheduled upgrade of the logs front end on 8/10/17, we lost approximately 20 minutes of customer application logs between 19:25 and 19:45 EDT.

This happened because the upgrade triggered a defect in how we configured our secondary backup system. As a result, we had no backup logs to restore from for the 20-minute period during the upgrade.

The upgrade also introduced a database schema issue that prevented the log viewer from displaying aggregate log data. We reindexed the data to fix that.

What we’re doing

We fixed our log backup process on 8/11/17 and have scheduled work to improve monitoring and alerting for our backup process. This means that if something else related to this logging system goes wrong and we start losing logs, the team will immediately notice the new problem and be able to take action to resolve the issue much more quickly.
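
As an illustration of the kind of check such monitoring could run, here is a minimal sketch that scans log timestamps for unusually long silences. It is not our actual monitoring code: the function name `find_log_gaps`, the 10-minute threshold, and the sample timestamps are all assumptions made up for this example.

```python
from datetime import datetime, timedelta


def find_log_gaps(timestamps, max_gap):
    """Return (start, end) pairs where consecutive log timestamps
    are more than max_gap apart."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_gap:
            gaps.append((earlier, later))
    return gaps


# Hypothetical sample data: one log entry every 5 minutes,
# with nothing recorded between 19:25 and 19:45.
base = datetime(2017, 8, 10, 19, 0)
sample = [base + timedelta(minutes=m) for m in (0, 5, 10, 15, 20, 25, 45, 50)]

# Assumed threshold: treat any silence longer than 10 minutes as suspicious.
gaps = find_log_gaps(sample, max_gap=timedelta(minutes=10))
for start, end in gaps:
    print(f"No logs between {start:%H:%M} and {end:%H:%M}")
```

A real alert would need a threshold tuned to how often each application normally logs, since a quiet application can legitimately go silent for longer than a busy one.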

For future upgrades to the logs front end, we will more thoroughly review and test upstream changes.

Posted Aug 23, 2017 - 10:46 EDT

Completed
The aggregate views on logs.fr.cloud.gov are no longer restricted to August 2017 and again allow aggregating across all available logs.
Posted Aug 16, 2017 - 15:58 EDT
Verifying
We completed our scheduled update to logs.fr.cloud.gov, and new logs are being indexed.

However, we lost about 20 minutes of customer application log data from approximately 19:25 to 19:45 EDT during the upgrade.

Also, the default logs dashboard is currently only displaying logs from August 2017. Logs prior to August are intact, and they will be displayed again on the default dashboard by Monday, August 14. You can still view these logs in logs.fr.cloud.gov by searching for them, but they will not appear in aggregated views such as the dashboard summary.

We apologize for the loss of log data and for the inconvenience of reduced visibility into older logs. We will update this status post once complete aggregations are restored.

We will also update this status post with a postmortem explaining our analysis of the causes and our steps to prevent these two side effects from happening again.
Posted Aug 10, 2017 - 23:46 EDT
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Aug 10, 2017 - 19:00 EDT
Scheduled
During this maintenance window, viewing and searching data in the logs front end (logs.fr.cloud.gov) may return inconsistent and incomplete log results.

Logs will be stored throughout the period, and all log data will be indexed and available in search results when we’ve completed this maintenance.
Posted Aug 09, 2017 - 13:43 EDT