
London air traffic failure: NATS' lessons are not blue-sky thinking

Last week's air traffic control outage holds both lessons and reassurances for critical IT systems.


Inside the Enterprise: Late last week, airspace over southern England was strangely quiet.

For several hours, aircraft movements were restricted, following problems with the air traffic control system run by NATS, at its Swanwick centre.

The failure was the result of an issue with the air traffic IT system, and specifically, a system called the Flight Data Controller.

According to reports, the problem was caused by just a single line of code, which has now been fixed. But the knock-on effects, in terms of disruption to travellers and to airlines, could turn out to be extensive.

NATS has already said that it will face financial consequences as a result of the outage: it will refund air traffic control charges to airlines, and the airlines themselves will no doubt face compensation claims from irate passengers.

NATS, though, has also announced an independent inquiry into the outage on 12 December, including an examination of the root causes and whether the system had enough in-built resilience. In particular, the company will examine whether there need to be "further measures to avoid technology or process failures in this critical national infrastructure and reduce the impact of any unavoidable disruption".

NATS has suffered problems before, in particular an extensive outage in 2013. The inquiry will also ask whether NATS had fully learned the lessons from that incident.

However, at one level, inconvenient though delayed flights are, NATS' systems worked exactly as they should. Airspace was not, in fact, closed, but aircraft movements were restricted: a necessary safety measure if controllers do not have all the information they would usually use to control flights.

The key to a system such as air traffic control, where lives are at stake, is for failover to backup systems to happen smoothly and seamlessly and, if a system's performance does degrade, for that to happen in a controlled, rather than sudden, manner. The actual outage at NATS was just 45 minutes long, even though the disruption inevitably lasted rather longer.

NATS' experience, in fact, provides three lessons for anyone running mission-critical IT. 

The first is to build systems that can handle failure or, if they do fail, whose performance will "degrade gracefully". This, of course, is an engineering issue, and one that also requires investment.
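
For readers who want to see what "degrading gracefully" can look like in practice, here is a minimal sketch in Python. It is not based on NATS' software: the function names, the two data sources and the capacity figures are all hypothetical, chosen only to illustrate the idea of failing over to a backup and, if that also fails, dropping to a restricted but still controlled mode rather than stopping outright.

```python
from typing import Callable, Tuple

# Hypothetical figures, for illustration only.
FULL_CAPACITY = 100       # movements per hour when flight data is available
RESTRICTED_CAPACITY = 40  # reduced rate when no data source is working


def plan_capacity(primary: Callable[[], dict],
                  backup: Callable[[], dict]) -> Tuple[str, int]:
    """Return the operating mode and the capacity allowed in that mode."""
    # Try the primary source first, then fail over to the backup.
    for mode, source in (("primary", primary), ("backup", backup)):
        try:
            source()  # an exception here means this source is unavailable
            return mode, FULL_CAPACITY
        except Exception:
            continue  # fall through to the next source
    # Both sources failed: keep operating, but at a reduced, controlled rate.
    return "restricted", RESTRICTED_CAPACITY
```

The design choice worth noting is the final line: when every source fails, the function still returns a defined, lower capacity instead of raising an error, so the operation slows down in a predictable way.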

The second is the importance of communications. Maintaining a flow of information to customers, shareholders and other stakeholders is vital. Rumours spread at the speed of social media, and a clear public communication channel, including making senior managers available to the media, is essential if an organisation is to stay in control.

The third is to investigate the causes after the outage, and to act on the findings.

With its announcement of an independent inquiry into the problems at Swanwick, NATS has shown that it is, at the very least, following these three steps. And, with businesses running critical infrastructure facing growing scrutiny by regulators, all industries should draw lessons from this latest IT-related outage.

 Stephen Pritchard is a contributing editor at IT Pro.

 

 
