Information archaeology

With your inbox always full to overflowing, you may not look back at older messages and documents very often, but that's where most of the knowledge in your organisation is buried.

Search tools make it easier to find things without digging by hand, but what happens to the older files, and to the messages you have to delete to stay under your mail quota? It's not just about storage space: file formats change, and older applications won't always run on the latest version of Windows, even if you can find the software you were using 10 years ago.

It's a major issue for governments, says Natalie Ceeney, the chief executive of the National Archives. "We're used to being able to walk into a business and read documents in Word. We expect our salary to be paid, we expect to get a pension. That means the government needs to keep pension records. We need to keep records on where we bury nuclear waste. We need to make sure censuses taken today are readable in a hundred years. But digital information is inherently more ephemeral than paper and we are living in a world with a ticking time bomb."

The problem has already reached the oil and gas industry. Many of the 'elephant' oil fields discovered recently aren't new; it's just that analysis and extraction techniques have improved to the point that it's now economical to mine them. But if the survey is 20 years old, is it cheaper to rescue the data or do the survey all over again?

How about a company that closes a research lab and then needs to defend a patent developed by that lab, or restart old research using a new generation of technology? A pharmaceutical company that needs to play video footage from a 20-year-old clinical study has the same problem as a police force that needs to produce video evidence used for a conviction that's appealed a decade later.

Adam Farquhar, head of digital library technology at the British Library, is part of the EU's PLANETS project (Preservation and Long-term Access through Networked Services), which calculates that losing access to documents costs businesses across Europe €3 billion (£2.7 billion) every year.

"Billions and billions of documents representing billions of Euros are at substantial risk," he believes. "This affects everyone who keeps digital media for more than 15 years."

Physical failure and formats

Failure of the media that files are stored on is a familiar problem. CDs often last fewer than 10 years, magnetic tape has to be spooled regularly to keep it readable, and older drives may no longer be available. Even if you do have both the media and the drive, you need the right connector and drivers for your current operating system.

Assuming the files are available, the real issue is the file format. Microsoft Office includes converters for many older formats from Microsoft and other vendors, as does OpenOffice.org, but subtle formatting changes can cause problems. If a document is repaginated, for example, a legal agreement that refers to a clause on a specific page may no longer point to the right place.

Mary Branscombe
