Web sites go down. Circuits fail. Network engineers goof router configs. And few of these outages ever make the nightly news…
But if you happen to be Google, and your content makes up as much as 5% of all Internet traffic, people notice. Network engineers around the world frantically email traceroutes to mailing lists. IRC channels fill with speculation (“definitely was a DDoS attack”, “no, a worm”, “it was ISP xxx’s fault!”). And end users tweet (a lot).
So what does it look like when 5% of the Internet disappears on an otherwise uneventful Thursday morning? The graph below shows average traffic from Google’s network (ASN 15169) across 10 tier-1 and tier-2 ISPs in North America. The outage began at roughly 10:15am EDT and lasted until about 12:15pm EDT.
Looking at the data, most large transit providers (e.g., Level3 and AT&T) appear to have been affected. Other providers, such as large consumer DSL and cable operators, showed no drop in traffic to or from Google.
Looking at BGP (the snapshot below is from Arbor’s Routeviews servers), we see significant churn in Google’s BGP routes around the outage timeframe: one prefix I chose at random flaps across half a dozen providers before being withdrawn.
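To make “churn” a bit more concrete, here is a minimal Python sketch that tallies BGP announcements and withdrawals per peer for a single prefix and reports which peers still carry a route after the last update. It assumes the updates have already been parsed (for example, out of a route collector’s MRT dump) into simple records. The `Update` type, the `summarize_churn` function, the peer names, and the sample prefixes (documentation address ranges) are all hypothetical illustrations, not the data behind the snapshot above.

```python
from collections import defaultdict
from typing import NamedTuple


class Update(NamedTuple):
    """A hypothetical, already-parsed BGP update record."""
    time: int    # seconds since an arbitrary reference point
    peer: str    # provider/neighbor the update was heard from
    prefix: str  # the route in question
    kind: str    # "A" = announcement, "W" = withdrawal


def summarize_churn(updates, prefix):
    """Count announcements and withdrawals per peer for one prefix.

    Returns (counts, still_up): counts maps peer -> {"A": n, "W": m};
    still_up lists peers whose most recent update was an announcement.
    """
    counts = defaultdict(lambda: {"A": 0, "W": 0})
    reachable = {}
    # Process updates in time order so the final state per peer is correct.
    for u in sorted(updates, key=lambda u: u.time):
        if u.prefix != prefix:
            continue
        counts[u.peer][u.kind] += 1
        reachable[u.peer] = (u.kind == "A")
    still_up = sorted(p for p, up in reachable.items() if up)
    return dict(counts), still_up


# Synthetic example: the prefix flaps at two peers, then is withdrawn everywhere.
updates = [
    Update(0, "peer-1", "192.0.2.0/24", "A"),
    Update(5, "peer-2", "192.0.2.0/24", "A"),
    Update(10, "peer-1", "192.0.2.0/24", "W"),
    Update(15, "peer-1", "192.0.2.0/24", "A"),
    Update(20, "peer-2", "192.0.2.0/24", "W"),
    Update(25, "peer-1", "192.0.2.0/24", "W"),
    Update(30, "peer-3", "198.51.100.0/24", "A"),  # unrelated prefix, ignored
]
counts, still_up = summarize_churn(updates, "192.0.2.0/24")
print(counts)
print(still_up)
```

A prefix that shows repeated A/W pairs at several peers and ends with an empty `still_up` list matches the “flaps, then gets withdrawn” pattern described above; real analysis would also need to account for path changes carried in announcements, which this sketch ignores.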
In a recent official company blog post, Google blamed some combination of airplanes and BGP for the outage.