Decades-old bug wiped out UK air traffic control
Report on December Nats outage reveals bug grounded aircraft for five hours
A 20-year-old bug was behind a software outage that grounded planes at Heathrow for five hours in December, according to an independent inquiry into the incident.
A failure in the National Air Traffic Services (Nats) system shut down the air-traffic control centre at Swanwick at the end of last year, causing chaos at Heathrow.
Then business secretary Vince Cable told the BBC at the time that the Nats system was "ancient" and that the organisation behind it was "skimping" on investment.
Now, an official inquiry into the incident has revealed that a bug in the System Flight Server was at the root of the fault - and that the flaw had been in the software since the 1990s.
The server was rolled out at Swanwick in 2002, and the bug was already present in it then.
Despite the age of the flaw, the inquiry's findings, titled NATS System Failure 12 December 2014 Final Report, didn't criticise Nats.
Instead it said that "it is unrealistic to expect that software faults will not be introduced in development" of such complex systems.
Nats' processes "are thorough and professional", according to the report, and there's a "strong and effective process" for software updates.
"The resultant integrity appears better than would be expected for software of this importance," it added.
The system is already set for upgrade as a new Europe-wide system called SESAR is rolled out in the next few years. The report said that deployment shouldn't be accelerated in light of the discovered bug, as "a search for earlier benefits would be likely to lead to shortcuts being taken".
It also made a series of recommendations to bring into SESAR to avoid future problems, including better hardware redundancy, software audits and more.
The report had praise for Nats' engineers, saying that "identifying a software fault in such a large system (the total application exceeds two million lines of code), within only a few hours, is a surprising and impressive achievement".
The system was taken offline at 2.55pm that day, and mostly restored less than an hour later; by 7pm, engineers believed they had uncovered the reason behind the fault, with full service back by 8.30pm.
Despite reports at the time saying UK airspace was completely closed, it wasn't - the delays were because controllers had to use manual methods to manage flight paths.
"NATS estimates that... a maximum of 1,900 flights and 230,000 passengers were affected during the afternoon and evening of 12 December," the report said. "Additionally several airlines reported some level of cancellations and flight disruption running into 13 December with approximately 60 aircraft and 6,000 passengers affected."