Google Apps Incident Report
Gmail Outage - April 17, 2012
Prepared for Google Apps for Business customers

The following is the incident report for the Gmail outage on April 17, 2012. We understand this service issue has affected our valued customers and users, and we apologize for the impact.

Issue Summary

From 9:09 AM PDT to 10:45 AM PDT, the affected Gmail users experienced:

● Error messages while signed in. These messages were 700 series error messages in the Gmail interface such as “the system encountered a problem” and “Retrying.”
● Sign-in and access issues. These include the inability to sign in (receiving 500 series errors) or being unable to interact with the Gmail interface. Some users also experienced delays in incoming message delivery for a brief period after their access was restored.

This incident affected 1.4% of the Gmail user base. During the course of the event, the Apps Status Dashboard specified at times that up to 2%, or up to 10%, of users were affected. This was due to uncertainty at the time regarding the number of users affected and Google's philosophy of being as transparent as possible about service performance.

Actions and Root Cause Analysis

At 9:28 AM PDT, Google Engineering began receiving alerts indicating internal tests were failing. Simultaneously, customers started reporting the issues to Google Enterprise Support.

The root cause was a misconfiguration that occurred during a routine capacity upgrade. This misconfiguration prevented changes to existing customer data for upgraded users. As designed, message processing stopped to avoid any potential for data loss or corruption.

At 9:46 AM PDT, Google Engineering identified the misconfiguration, and began reverting it and restarting the corresponding servers at 10:10 AM PDT. This started to resolve users’ access and interface problems; by 10:45 AM PDT, Gmail services returned to normal for all affected users.

Corrective and Preventative Measures

The Google Engineering team conducted an internal review and analysis and is performing the following actions to help address the underlying cause of the problem and prevent recurrence:

Prevention
● Reduce and isolate the effects of this class of configuration issue.
● Improve the system responsible for managing configurations.
● Add additional safeguards to identify configuration errors before release.
● Enhance internal documentation for configuration management.

Detection and speed of recovery
● Implement additional monitoring to catch earlier failure symptoms in production.
● Eliminate the need for server restarts to recover from this type of error.

We appreciate your patience and again apologize for the impact to your organization. We thank you for your business and continued support.