As standard procedure for operating cloud.gov, we store customer application logs in two ways. Normally, during scheduled upgrades of the logs front end (https://logs.fr.cloud.gov), the system temporarily saves logs only to the secondary backup system. After the upgrade, we use that backup to restore the log information in the logs front end.
During our scheduled upgrade of the logs front end on 8/10/17, we lost approximately 20 minutes of customer application logs between 19:25 and 19:45 EDT.
This happened because the upgrade triggered a defect in how we had configured our secondary backup system. As a result, we had no backup logs from the 20-minute upgrade window to restore.
The upgrade also introduced a database schema issue that prevented the log viewer from displaying aggregate log data. We reindexed the data to fix that.
We fixed our log backup process on 8/11/17 and have scheduled work to improve monitoring and alerting for it. This means that if something else goes wrong with this logging system and we start losing logs, the team will notice immediately and be able to resolve the issue much more quickly.
For future upgrades to the logs front end, we will review and test upstream changes more thoroughly.