London air traffic failure: NATS' lessons are not blue-sky thinking

Last week's air traffic control outage holds both lessons and reassurances for anyone running critical IT systems.


Inside the Enterprise: Late last week, airspace over southern England was strangely quiet.

For several hours, aircraft movements were restricted following problems with the air traffic control system run by NATS at its Swanwick centre.

The failure stemmed from a fault in the air traffic IT system, specifically in a system called the Flight Data Controller.

According to reports, the problem was caused by just a single line of code, which has now been fixed. But the knock-on effects, in terms of disruption to travellers and to airlines, could turn out to be extensive.

NATS has already said that it will face financial consequences as a result of the outage: it will refund air traffic control charges to airlines, and the airlines themselves will no doubt face compensation claims from irate passengers.

NATS, though, has also announced an independent inquiry into the 12 December outage, including an examination of the root causes and of whether the system had enough built-in resilience. In particular, the company will examine whether there need to be "further measures to avoid technology or process failures in this critical national infrastructure and reduce the impact of any unavoidable disruption".

NATS has suffered problems before, in particular an extensive outage in 2013. The inquiry will also ask whether NATS had fully learned the lessons from that incident.

However, at one level, inconvenient though delayed flights are, NATS' systems worked exactly as they should. Airspace was not, in fact, closed, but aircraft movements were restricted: a necessary safety measure if controllers do not have all the information they would usually use to control flights.

The key to a system such as air traffic control, where lives are at stake, is for failover to backup systems to happen smoothly and seamlessly and, if a system's performance does degrade, for that to happen in a controlled, rather than sudden, manner. The actual outage at NATS was just 45 minutes long, even though the disruption inevitably lasted rather longer.

NATS' experience, in fact, provides three lessons for anyone running mission-critical IT. 

The first is to have systems that can handle faults or, if they do fail, whose performance will "degrade gracefully". This, of course, is an engineering issue, and one that also requires investment.
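For readers who want to see what "degrading gracefully" can mean in practice, here is a minimal sketch in Python. It is purely illustrative and says nothing about how NATS' systems are actually built: the function names, the three operating modes and the capacity figures are all assumptions made up for the example. The idea is simply that losing the primary data source lowers throughput rather than stopping the service outright.

```python
from typing import Callable, Optional, Tuple

# Hypothetical limits, invented for illustration only.
NORMAL_CAPACITY = 100   # movements per hour with full data available
DEGRADED_CAPACITY = 40  # reduced limit while running on the backup source


def fetch_with_fallback(
    primary: Callable[[], dict],
    backup: Callable[[], dict],
) -> Tuple[Optional[dict], str]:
    """Try the primary source, then the backup, and report the resulting mode."""
    try:
        return primary(), "normal"
    except Exception:
        pass  # primary failed; fall through to the backup
    try:
        return backup(), "degraded"
    except Exception:
        # Both sources failed: no data, most restrictive mode.
        return None, "restricted"


def allowed_capacity(mode: str) -> int:
    """Map the operating mode to a throughput limit instead of failing outright."""
    limits = {"normal": NORMAL_CAPACITY, "degraded": DEGRADED_CAPACITY}
    return limits.get(mode, 0)
```

In this sketch a failed primary source leaves the service running at the lower limit, and only the loss of both sources brings throughput down to zero.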

The second is the importance of communications. Maintaining a flow of information to customers, shareholders and other stakeholders is vital. Rumours spread at the speed of social media, and a clear public communication channel, including making senior managers available to the media, is essential if an organisation is to stay in control.

The third is to investigate the causes once the outage is over, and to act on the findings.

With its announcement of an independent inquiry into the problems at Swanwick, NATS has shown that it is, at the very least, following these three steps. And, with businesses running critical infrastructure facing growing scrutiny by regulators, all industries should draw lessons from this latest IT-related outage.

 Stephen Pritchard is a contributing editor at IT Pro.

 

 
