We're creating more data than we can cope with
The problem isn't just storage – it's how to organise and manage an ever-growing hoard of files
Friends have been recommending the TV series Silicon Valley to me for years and this month I've finally taken the plunge. I have to say I'm enjoying it greatly, as it paints a humorously recognisable picture of the absurdities and extravagances of the tech industry.
It's also refreshing to see a mainstream TV show with such an unapologetically geeky premise. After you've sat through a thousand boy-meets-girl stories, it all starts to feel a bit formulaic. But a comedy drama about a revolutionary compression algorithm? Now that's interesting. And it rings true, because we all know the challenge of dealing with ever-increasing volumes of data. Speaking as someone whose home NAS drive recently ticked over the "90% full" mark, it's an issue I can certainly relate to.
Storage isn't the problem
Frankly, I'm amazed how quickly this day has come around. I originally anticipated that four 2TB drives would provide enough storage to last me for a decade or more. Of course, I didn't properly factor in the voodoo maths beloved of hard disk manufacturers, whereby a 2TB drive only yields 1.8TB of usable space (the manufacturers count a terabyte as a trillion bytes, while most operating systems count in powers of 1,024). Nor did I think as hard as I should have about the fact that RAID 5 would require me to sacrifice one drive's entire capacity in the name of fault tolerance.
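The shortfall comes down to two simple bits of arithmetic: drive makers quote capacity in decimal terabytes (10^12 bytes) while many operating systems report binary units (2^40 bytes), and RAID 5 gives up one drive's worth of space to parity. Here's a minimal Python sketch of that sum, assuming a four-drive array of 2TB disks and ignoring file-system and NAS overhead, which will shave off a little more. The function names are purely illustrative.

```python
def tib_from_marketed_tb(tb: float) -> float:
    """Convert a marketed (decimal) terabyte figure to binary tebibytes,
    the unit many operating systems display as "TB"."""
    return tb * 10**12 / 2**40


def raid5_usable_tib(drive_count: int, drive_tb: float) -> float:
    """Usable RAID 5 space: one drive's worth of capacity goes on parity,
    so only (drive_count - 1) drives hold data. Ignores file-system overhead."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * tib_from_marketed_tb(drive_tb)


per_drive = tib_from_marketed_tb(2)
usable = raid5_usable_tib(4, 2)
print(f"{per_drive:.2f} TiB per drive")  # prints "1.82 TiB per drive"
print(f"{usable:.2f} TiB usable")        # prints "5.46 TiB usable"
```

In other words, eight advertised terabytes turn into roughly five and a half once both effects are counted.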
But even had I taken those issues into account, I was always destined to be tripped up, because what I chiefly failed to appreciate was how quickly the space would be consumed.
In the past five years I've somehow accumulated almost as much new data as I had previously amassed in my entire lifetime, and I know my experience isn't particularly unusual. No wonder everyone in Silicon Valley wants to get their hands on Richard Hendricks' magical compression algorithm.
Yet even if such technology existed it wouldn't solve the real problem. Naturally, I'd jump at the chance to squeeze an extra dozen terabytes onto my NAS for free. But if my storage demands continue to snowball (and I see no reason why they wouldn't), that would only be a one-off boost in capacity. In a few years, the drive would be full again, and I'd be back where I started.
And that's not really a disaster. Yes, it's annoying and expensive to keep having to buy new disks, but storage only gets cheaper and denser. It's difficult to feel too hard done by when each upgrade is better value than the last, and this particular conveyor belt of progress shows no signs of slowing down.
Managing it is the hard part
No, the problem isn't storing all my data, but managing it. Not many years ago, I used to regularly make time to sit down and sort through all the miscellaneous files scattered across my various hard disks to categorise and organise the important ones, and delete those I no longer needed. Nowadays that task seems to have slipped over the event horizon of feasibility. When I try to get a handle on my archives, I'm no longer dealing with tens or hundreds of files, but tens of thousands. The job is simply too big to do by hand.
The worst offender, in this regard, is my photo library. Every time I hit the shutter on my little Sony RX100 camera, that's 20MB of data brought into the world: enough to fill the entire hard disk on my first PC. And then I press the shutter a few more times, just in case someone was blinking, or I accidentally shook the camera. In this way I've gradually accrued terabytes of images, of which a mere fraction are probably worth keeping. But trawling through the whole collection and individually rating each image is a chore I wouldn't wish on my worst enemy.
What I long for is not a way to cram yet more undifferentiated images onto my spindles, but a way of making sense of what I already have: an AI-driven approach, for example, that can work out which pictures are worth keeping and which can be ditched.
AI can do the job just fine
That might sound like a task you'd be hesitant to entrust to a machine, especially when it comes to irreplaceable items such as photos. But I've learnt not to underestimate machine learning. In fact, while researching this column, I carried out a quick exploratory web search to see how viable it is to use AI to rate photographs, and it turns out someone's already doing it. You can try it at everypixel.com/aesthetics: simply upload your own pictures and you'll receive a machine-generated "awesomeness" quotient for each one.
It may look like a parlour trick, but I tried it with a dozen of my own photos and couldn't fault its judgment. So now all I'm waiting for is a real-life Richard Hendricks to come along with a way to generalise this sort of approach, applying a similar technique to all the old documents, screenshots, PDFs and installers floating around on my NAS to help sort the wheat from the chaff.
There's the catch, though. Such a project would likely call for immense computing power, a vast corpus of training data, and the resources to buy in brains as needed. No surprise, then, that machine learning isn't being advanced by startups; it's huge firms like Google that are bringing it into our lives, in forms such as Gmail's automatic email filters and Google Photos' image-recognition features. I'm sure that before long Google Docs will be advising me which documents look important and which I can probably bin.
And all of this does make me wonder if I've been taking the wrong message from Silicon Valley. Throughout the series we naturally root for Richard and his team as they take on the tech giants. But in fiction, as in fact, it's those giants who end up changing the world, sometimes even for the better.