Fail to plan, plan to fail: Firms neglect recovery plans at their peril

Inside the enterprise: When it comes to IT, businesses are failing to learn from their mistakes.

This is especially true when it comes to business continuity planning and disaster recovery.

Over the last few months, IT Pro has covered a number of high-profile business outages caused by IT failures; the cost of these outages easily runs to millions of pounds. Examples include the failure of the UK's air traffic control system last December and a series of failures in the banking sector, notably at RBS.

These, and indeed most IT-related outages, could either have been foreseen or had their impact minimised through better testing and better business continuity planning.

In the case of NATS, it's possible to argue the system worked as it should, leading to a partial shutdown but keeping planes safe. But NATS, airlines and passengers would far rather the fault had never happened in the first place.

In the case of the RBS group, the root cause seems to be some years of under-investment in technology. But it is hard not to feel, at least based on what is in the public domain, that banks of that scale should be better prepared to limit the impact of a fairly mundane IT failure.

Apparently, they are not alone. A survey carried out by Timico, an online services provider which sells, among other things, cloud-based disaster recovery tools, suggests a quarter of IT managers had experienced an outage during the past month. This is a significant figure.

The survey found one in eight companies had not tested their business recovery plans. This suggests firms are either unaware of the risks they face, or are failing to take them seriously enough.

It also gives some interesting insights into why systems fail. In one in four cases, the root cause was a power failure; whether this was a grid-level power cut or a failure of internal power distribution systems isn't specified. As many as 40 per cent of outages were caused by either hardware or software failure, or possibly both.

Natural disasters were said to be behind 11 per cent of incidents, with human error causing seven per cent and malicious activity five per cent. This suggests that, in most cases, an IT outage could have been prevented.

Even where prevention is impossible, a good business continuity plan should allow an organisation to continue to function, even if IT capacity is reduced for a time. But this has long been a weakness of both business continuity planning and conventional disaster recovery tools. The tools need to be tested to ensure they work, and businesses need to rehearse and practise their recovery and workarounds.

In theory, cloud-based recovery services should help SMEs to keep working, and keep their data safe, if something goes wrong. But if IT departments fail to put this to the test, they will never know.

Stephen Pritchard is a contributing editor at IT Pro.