CloudFlare network suffers weekend collapse
Attempt to protect service from attack results in crash.
Thousands of websites were taken offline over the weekend, after web services provider CloudFlare suffered a router crash that took its entire network down.
The company first acknowledged the issue on Sunday morning, California time, stating "we're experencing (sic) a network-wide issue. Looking into the root cause," on Twitter.
CloudFlare promises to "protect and accelerate" websites.
"Once your website is a part of the CloudFlare community, its web traffic is routed through our intelligent global network. We automatically optimise the delivery of your web pages so your visitors get the fastest page load times and best performance," the company claims on its website.
"We also block threats and limit abusive bots and crawlers from wasting your bandwidth and server resources. The result: CloudFlare-powered websites see a significant improvement in performance and a decrease in spam and other attacks," it adds.
However, an attempt by the company to thwart a Distributed Denial of Service (DDoS) attack resulted in it accidentally crashing its routers, taking all its clients' websites offline.
The company noticed what appeared to be packets between 99,971 and 99,985 bytes long attempting to reach customers' DNS servers, which it took as a sign of a DDoS attack. It therefore wrote a rule to drop packets of that size, but the rule inadvertently caused every router that received it to crash.
In a blog post explaining the outage, CloudFlare said: "Flowspec accepted the rule and relayed it to our edge network. What should have happened is that no packet should have matched that rule because no packet was actually that large. What happened instead is that the routers encountered the rule and then proceeded to consume all their RAM until they crashed."
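The quote's point that "no packet was actually that large" comes down to a header limit: the IPv4 Total Length field is 16 bits wide, so a standard IPv4 packet can be at most 65,535 bytes. The short Python sketch below models a length-based drop rule to show why a 99,971 to 99,985 byte range should never match real traffic. It is a hypothetical illustration: `LengthDropRule` and its methods are invented names, and the sketch does not reproduce Flowspec's wire encoding or how CloudFlare's routers actually process rules.

```python
from dataclasses import dataclass

# The IPv4 Total Length header field is 16 bits, so a standard IPv4
# packet (header included) can be at most 2**16 - 1 = 65,535 bytes.
MAX_IPV4_PACKET_BYTES = 2**16 - 1


@dataclass(frozen=True)
class LengthDropRule:
    """Hypothetical drop rule keyed on an inclusive packet-length range."""
    min_len: int
    max_len: int

    def matches(self, packet_len: int) -> bool:
        # A packet is dropped if its length falls inside the range.
        return self.min_len <= packet_len <= self.max_len

    def can_ever_match_ipv4(self) -> bool:
        # The rule can only fire if its range overlaps [0, 65535]:
        # the lower bound must not exceed the IPv4 maximum, and the
        # range must be non-empty once clipped to that maximum.
        return self.min_len <= min(self.max_len, MAX_IPV4_PACKET_BYTES)


# The packet sizes reported in the article.
rule = LengthDropRule(min_len=99_971, max_len=99_985)

print(rule.can_ever_match_ipv4())  # False
# Exhaustively check every possible IPv4 packet length.
print(any(rule.matches(n) for n in range(MAX_IPV4_PACKET_BYTES + 1)))  # False
```

In this model the rule is harmless because it never matches anything; the outage described in the article came from how the routers handled the rule itself, not from the traffic it targeted.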
The manner in which the crashes happened prevented the routers from automatically rebooting, the company added.
The company has said that customers whose accounts are covered by service level agreements (SLAs) will receive credits, and that it is still investigating what caused the routers to crash.