Vanishing power feeds, UPS batteries, failover fails... Cloudflare explains that two-day outage
(www.theregister.com)
This is the best summary I could come up with:
Cloudflare's main network and security duties continued as normal throughout the outage, even if customers couldn't make changes to their services at times, Prince said.
We're told by Prince that "counter to best practices, Flexential did not inform Cloudflare that they had failed over to generator power," so Cloudflare had no heads-up that things might be about to go south and that contingencies should be in place.
Whatever the reason, a little less than three hours later at 1140 UTC (0340 local time), a PGE step-down transformer at the datacenter – thought to be connected to the second 12.47kV utility line – experienced a ground fault.
By that, he means at 1144 UTC – four minutes after the transformer ground fault – Cloudflare's network routers in PDX-04, which connected the cloud giant's servers to the rest of the world, lost power and dropped offline, like everything else in the building.
At this point, you'd hope the servers in the other two datacenters in the Oregon trio would automatically pick up the slack, and keep critical services running in the absence of PDX-04, and that was what Cloudflare said it had designed its infrastructure to do.
The control plane services were able to return online, allowing customers to intermittently make changes, and were fully restored about four hours after the failover, according to the cloud outfit.
The original article contains 1,302 words, the summary contains 228 words. Saved 82%. I'm a bot and I'm open source!