‘Software glitch’ to blame for global Cloudflare outage

Approximately 10% of the internet suffered 502 errors during a chaos-ridden 30-minute outage


Cloudflare has resolved an issue that caused websites served by the networking and internet security firm to show '502 Bad Gateway' errors en masse for half an hour yesterday.

From 2:42pm BST the networking giant suffered a massive spike in CPU utilisation across its network, which Cloudflare is blaming on a bad software deployment. This affected websites in territories across the world.

Once this faulty deployment was rolled back, CTO John Graham-Cumming explained, service resumed normal operation and all domains using Cloudflare returned to normal traffic levels.

"This was not an attack (as some have speculated) and we are incredibly sorry that this incident occurred," Graham-Cumming said.

"Internal teams are meeting as I write performing a full post-mortem to understand how this occurred and how we prevent this from ever occurring again."

The incident affected several major industries, including cryptocurrency markets, with users unable to properly access exchanges such as CoinMarketCap and Coinbase.

Cloudflare issued an update last night suggesting the global outage was caused by a single misconfigured rule within the Cloudflare Web Application Firewall (WAF), pushed out during a routine deployment. The company had aimed to improve the blocking of inline JavaScript used in cyber attacks.

One of the rules it deployed caused CPU usage to spike to 100% on its machines worldwide, which in turn led to the 502 errors seen on sites across the world. Web traffic dropped by 82% at the worst point during the outage.

"We were seeing an unprecedented CPU exhaustion event, which was novel for us as we had not experienced global CPU exhaustion before," Graham-Cumming continued.

"We make software deployments constantly across the network and have automated systems to run test suites and a procedure for deploying progressively to prevent incidents.

"Unfortunately, these WAF rules were deployed globally in one go and caused today's outage."
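To illustrate the difference between deploying progressively and deploying "in one go", here is a minimal Python sketch of a staged rollout. It is not Cloudflare's tooling: the function name `progressive_rollout`, the stage names and the `deploy` and `is_healthy` callbacks are all hypothetical, chosen only to show the idea of halting at the first unhealthy stage.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def progressive_rollout(
    stages: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Deploy to each stage in order, halting at the first unhealthy one.

    Returns the stages that passed and the stage that failed (or None).
    All names here are illustrative assumptions, not a real deployment API.
    """
    passed: List[str] = []
    for stage in stages:
        deploy(stage)
        if not is_healthy(stage):
            # Stop here so later, larger stages never receive the change.
            return passed, stage
        passed.append(stage)
    return passed, None
```

With stages ordered from smallest to largest, a change that exhausts CPU on the first stage would be caught before it reached the rest of the network, which is the safeguard a single global push skips.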

At 3:02pm BST the company realised what was going on and issued a global kill on the WAF Managed Rulesets, which dropped CPU usage back to normal levels and restored traffic, before fixing the issue and re-enabling the Rulesets approximately an hour later.
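The "global kill" described above can be pictured as a switch that bypasses rule evaluation entirely. The Python sketch below is a simplified assumption of how such a switch might behave, not Cloudflare's implementation: the `ManagedRuleset` class, its methods and the `Rule` type are hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical rule type: returns True when a request should be blocked.
Rule = Callable[[str], bool]


class ManagedRuleset:
    """Illustrative ruleset with a kill switch (assumed design, not a real API)."""

    def __init__(self, rules: Iterable[Rule]) -> None:
        self._rules = list(rules)
        self.enabled = True

    def kill(self) -> None:
        # Disable the whole ruleset at once.
        self.enabled = False

    def restore(self) -> None:
        # Re-enable the ruleset once the faulty rule has been fixed.
        self.enabled = True

    def should_block(self, request: str) -> bool:
        if not self.enabled:
            # Killed: skip every rule so no rule can consume CPU.
            return False
        return any(rule(request) for rule in self._rules)
```

In this sketch a killed ruleset lets all traffic through, trading WAF protection for availability until the rules are re-enabled.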

Many on social media speculated during the outage that the 502 Bad Gateway errors might be the result of a distributed denial-of-service (DDoS) attack. However, the firm quickly quashed these suggestions.

"The impact of the Cloudflare outage shows the sometimes-unexpected impact of massive success - much as with early outages at AWS and other cloud providers, it's a reminder of how dependent the internet ecosystem can become on the utility and expediency of a singular platform," Carl Brooks, analyst for the cloud transformation channel at 451 Research, told IT Pro.

"Cloudflare has a lot going for it: it effectively ended DDoS as an attack platform as we knew it, for instance, and it's a vital performance booster for extremely reasonable prices, but it has also quietly become a part of the backbone of the internet, and like every other provider out there, it will have hiccups."
