Google opens up about reason for yesterday's Gmail outage.
Google's web-based email service, Gmail, is back online after being knocked offline for almost two hours yesterday.
"We know how many people rely on Gmail for personal and professional communications, and we take it very seriously when there's a problem with the service. Thus, right up front, I'd like to apologise to all of you — today's outage was a Big Deal, and we're treating it as such. We've already thoroughly investigated what happened, and we're currently compiling a list of things we intend to fix or improve as a result of the investigation," Ben Treynor, Gmail's vice president of engineering and site reliability czar, said on the official Gmail blog.
Google has also chosen to be quite open about what exactly caused the outage, saying that the problem stemmed from a routine server upgrade that took much longer than planned and knocked the service offline.
The blog post continued: "However, as we now know, we had slightly underestimated the load which some recent changes (ironically, some designed to improve service availability) placed on the request routers — servers which direct web queries to the appropriate Gmail server for response. At about 12:30 pm Pacific a few of the request routers became overloaded and in effect told the rest of the system "stop sending us traffic, we're too slow!". This transferred the load onto the remaining request routers, causing a few more of them to also become overloaded, and within minutes nearly all of the request routers were overloaded. As a result, people couldn't access Gmail via the web interface because their requests couldn't be routed to a Gmail server. IMAP/POP access and mail processing continued to work normally because these requests don't use the same routers."
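The chain reaction Treynor describes can be illustrated with a small simulation. The sketch below is only a toy model, not Google's actual routing logic: the function name `simulate_cascade`, the router capacities and the load figures are all hypothetical, and it assumes traffic is split evenly across whichever routers are still accepting requests.

```python
def simulate_cascade(capacities, total_load):
    """Return the number of routers still accepting traffic after each round.

    capacities: hypothetical per-router capacity (requests per second).
    total_load: total incoming requests per second, assumed to be split
                evenly across the routers that are still healthy.
    """
    healthy = list(capacities)
    history = [len(healthy)]
    while healthy:
        share = total_load / len(healthy)
        # A router whose share exceeds its capacity stops accepting traffic.
        survivors = [c for c in healthy if c >= share]
        if len(survivors) == len(healthy):
            break  # nobody new was overloaded, so the system is stable
        healthy = survivors
        history.append(len(healthy))
    return history


# Twenty hypothetical routers with slightly different capacities.
capacities = [100] * 2 + [110] * 3 + [125] * 5 + [150] * 10

print(simulate_cascade(capacities, 2000))  # [20]
print(simulate_cascade(capacities, 2100))  # [20, 18, 15, 10, 0]
```

In this toy model, a 5% rise in load is enough to tip the two weakest routers over. Their traffic then lands on the rest, and each round pushes the per-router share past the next group's limit until none are left, which mirrors the "within minutes nearly all of the request routers were overloaded" pattern in the blog post.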
Earlier yesterday, Gmail's engineering director David Besbris said that the problem was more than a minor issue, and apologised for the downtime. He also advised: "If you have IMAP or POP set up already, you should be able to access your mail that way in the meantime."
Google says it is determined not to let a similar incident happen again.
"We'll be hard at work over the next few weeks implementing these and other Gmail reliability improvements - Gmail remains more than 99.9% available to all users, and we're committed to keeping events like today's notable for their rarity," Treynor added.