The history and evolution of storage
Over the years, as our data usage has risen exponentially, storage technology has improved to match
According to Domo's annual Data Never Sleeps research, by 2020 there will be 1.7MB of data created every second for every single person on Earth. With the global population estimated to be around 7.6 billion people by 2020, that's more than 12,000 terabytes of data a second. Our storage needs are increasing every year, and when you consider that the first computer was built scarcely 80 years ago, we've come a long way in a short space of time. In this feature, we trace the journey of data storage through its key stages to the present day's bleeding edge.
Data storage has always been a key part of the way computers work, right back to when John von Neumann described his eponymous architecture in 1945 in a report on the EDVAC, the planned successor to the ENIAC (Electronic Numerical Integrator and Computer). That architecture is still the basis of today's PCs. In this conception, the main job of the Central Processing Unit was to read data and instructions from storage, perform operations, and write the results back to storage, with dynamic memory and longer-term capacity not necessarily differentiated. Initially, the ENIAC's inputs and outputs were punch cards designed to be compatible with IBM's 405 accounting machine. Each card could hold 12 rows of 80 columns, although the encoding system wasn't necessarily binary, so this doesn't directly translate to 960 bits.
Nevertheless, in 2013 XKCD's Randall Munroe calculated that Google's 15EB of data would require an area the size of New England and a depth of 4.5km if it were stored on punch cards. Aside from the space required, imagine how long it would take to sort through that many cards for a specific piece of data. Clearly, this form of storage couldn't cope with today's data needs, considering that by 2020 we will be creating this much data every 21 minutes. But, fortunately, our storage technology has developed to match.
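The figures quoted so far can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal (SI) units throughout, and the function and constant names are invented for this example.

```python
# Figures quoted in the article. Decimal (SI) units are an assumption.
MB_PER_PERSON_PER_SECOND = 1.7
WORLD_POPULATION = 7.6e9
GOOGLE_DATA_EB = 15

MB_PER_TB = 1e6
TB_PER_EB = 1e6


def global_rate_tb_per_second(mb_per_person=MB_PER_PERSON_PER_SECOND,
                              population=WORLD_POPULATION):
    """Worldwide data creation rate in terabytes per second."""
    return mb_per_person * population / MB_PER_TB


def minutes_to_create(exabytes, rate_tb_per_second):
    """Minutes needed to create the given volume at the given rate."""
    return exabytes * TB_PER_EB / rate_tb_per_second / 60


rate = global_rate_tb_per_second()
print(f"Global rate: {rate:,.0f} TB per second")
print(f"Time to create {GOOGLE_DATA_EB}EB: "
      f"{minutes_to_create(GOOGLE_DATA_EB, rate):.1f} minutes")
print(f"Same, at a rounded 12,000 TB/s: "
      f"{minutes_to_create(GOOGLE_DATA_EB, 12_000):.1f} minutes")
```

On these assumptions the unrounded rate is a little over 12,900 terabytes a second, which gives roughly 19 minutes for 15EB. The 21-minute figure quoted above matches the rounded-down rate of 12,000 terabytes a second.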
In 1953, the ENIAC gained a 100-word magnetic-core memory system built by the Burroughs Corporation, which performed the same function as today's RAM. Around the same time, in 1951, the first tape storage system arrived for the UNIVAC I, the first computer system commercially available in the USA. Using a half-inch wide nickel-plated phosphor bronze recording medium, this UNISERVO drive could store 128 characters per inch and transfer 12,800 characters per second.
However, IBM's magnetic tape design became the standard, with tape lengths up to 2,400ft, moving up to 3,600ft in the 1980s. The capacity increased over time, starting at 200 six-bit characters per inch for the first seven-track designs and culminating in 6,250 eight-bit characters per inch using the group-coded recording encoding system and nine-track tape. With a 2,400ft tape this equates to 5MB capacity for the initial version, rising to 140MB for the final iteration, which was still in use in the 1980s. Modern descendants of tape remain a valid choice for huge storage, with the roadmap for the LTO format extending to 192TB cartridges and Sony demonstrating tape storage cartridges capable of 330TB capacity in 2017.
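As a rough check on those reel capacities, the sketch below multiplies tape length by recording density. It is an upper bound only, because it assumes every inch of tape holds data. The function name is invented for this example.

```python
INCHES_PER_FOOT = 12


def raw_tape_capacity_chars(length_ft, chars_per_inch):
    """Upper bound: characters on a reel if every inch held data."""
    return length_ft * INCHES_PER_FOOT * chars_per_inch


# 2,400ft reel at the two densities quoted in the article
seven_track = raw_tape_capacity_chars(2400, 200)   # six-bit characters
nine_track = raw_tape_capacity_chars(2400, 6250)   # eight-bit characters

print(f"Seven-track, 200 per inch: {seven_track / 1e6:.2f} million characters")
print(f"Nine-track, 6,250 per inch: {nine_track / 1e6:.0f} million characters")
```

The raw figures come out at 5.76 million and 180 million characters respectively. The quoted 5MB and 140MB are lower, presumably because a real reel loses space to the gaps between recorded blocks and because six-bit characters occupy less than a modern byte.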
In the 1970s and 1980s, punch cards and magnetic tape were still being used for storage and data entry. But a much faster alternative had already been in existence since the mid-1950s: the hard disk. However, the first IBM 350 RAMAC was the size of two full-height refrigerators and contained 50 platters, each with a 24in diameter, wider than a dart board. Each IBM 350 had a 5MB capacity, similar to that of a 2,400ft reel of tape from the same era (equivalent to 64,000 punch cards). But it could access a record in 600 milliseconds, whereas a tape drive might take six minutes to get from one end to the other of a 2,400ft reel.
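The punch card equivalence and the access time gap can be sketched the same way. This assumes one character per card column, which is a simplification given the card encoding caveat earlier, and the names are invented for this example.

```python
COLUMNS_PER_CARD = 80  # assumption: one character stored per column


def cards_to_characters(cards, columns=COLUMNS_PER_CARD):
    """Characters held by a deck of fully punched 80-column cards."""
    return cards * columns


def speedup(slow_ms, fast_ms):
    """How many times quicker the fast access is than the slow one."""
    return slow_ms / fast_ms


chars = cards_to_characters(64_000)
# Worst-case tape traversal (six minutes) against one RAMAC record access
ratio = speedup(6 * 60 * 1000, 600)

print(f"64,000 cards hold {chars:,} characters")
print(f"RAMAC record access is {ratio:.0f} times quicker than a full tape pass")
```

That gives 5.12 million characters, in line with the 5MB figure, and a 600-fold difference in access time. The comparison sets a worst-case tape traversal against a single disk access, so it marks the extreme rather than the typical case.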
By the 1960s, hard disk platters had shrunk to 14in diameter and a removable format providing 2MB per pack had arrived, with each pack about the size of a cake box. In 1980, Shugart Technology (later to become Seagate Technology) created a much smaller 5.25in variant with 5MB capacity, but most personal computers in this era relied on the floppy disk. This started off as an 8in design in 1973 with a 237.25KB formatted capacity, then shrank to 5.25in in 1976 with 87.5KB formatted capacity. In the early 1980s the 3.5in floppy arrived, although this coexisted with the 5.25in variety for some time. The most popular PC version offered 360KB capacity in its single-sided variant, eventually reaching 1.44MB in high-density format and 2.88MB in extra-high-density format.
There have been numerous variants on the themes of these technologies over the years, including the LS-120 floppy offering 120MB, the magneto-optical MiniDisc, SyQuest removable hard drives, Iomega's Zip and Rev drives, as well as rewritable optical CD, DVD, Blu-ray and HD-DVD formats that seemed very important for a few years but are scarcely used today. However, during the last decade the main storage choice for personal computers and servers has been between new generations of hard disk and Flash memory-based solid-state disks (SSDs).
Hard disks are still readily available in 3.5in and 2.5in formats, with standard rotational speeds ranging from 5,400 to 15,000rpm. The demise of the hard disk has been considered imminent for at least 15 years, but it hasn't happened yet. At one point, the longitudinal magnetic grain technology used in hard disks appeared to be reaching its physical limits. But reorienting the grains in a perpendicular fashion has allowed hard drives to reach 12TB in capacity in 2018, so HDDs continue to progress as a cost-effective mass storage medium.
Nevertheless, ever since SSDs became available as desktop and server drives around 2006, it has been clear that the dominance of the hard disk would slowly be eroded. Although the cost per megabyte of SSDs is still an order of magnitude greater than that of HDDs, the latest NVMe M.2 SSDs offer sustained read throughput in excess of 3,000MB/sec, 10-15 times what the fastest hard disks can muster, and access times are 100 times quicker.
Performance is not the only reason to choose SSDs, either, particularly for data centres. A typical hard disk will consume 5-10W when in use, and not that much less when idle unless spun down. An SSD, in contrast, uses less than a watt when idle and a few watts in use. This has a knock-on effect on how much heat they produce as well. As a result, all-Flash SSD arrays are gaining ground for business storage, since the running costs will be a lot lower and performance much higher, whilst over-provisioning alongside sophisticated controller technology can provide mean time between failures (MTBF) ratings on par with hard disks as well.
Perhaps the most significant recent development for SSDs has been 3D XPoint memory, sold by Intel under its Optane brand. This is a relative of Flash memory with characteristics that make it particularly attractive for enterprise applications. Whilst raw reading and writing throughput aren't a noticeable improvement over traditional SSDs, the read latency is two or three times lower, and write latency is lower too. This means that with random access of smaller files, Optane SSDs can easily outperform regular Flash equivalents. Enterprise-focused Optane SSDs also support 100 times as many write cycles over their lifetime as conventional Flash SSDs.
Optane technology is only getting started, too, as it can also be incorporated into RAM modules, which Intel launched around the middle of 2018 and will be shipping in volume in 2019. These Optane DC Persistent Memory modules can be installed just like regular DRAM DIMMs, except that they're non-volatile, so they won't lose their data when the power goes off. This is a potent feature for database applications, when allied with the much faster interface of memory DIMM slots.
Even though our data needs are expanding every day, storage technology appears to be keeping pace very well. Emerging SSD technologies like QLC and increased 3D NAND layering will bring the cost per megabyte of commodity SSDs below that of HDDs in a few years, perhaps as soon as 2021. PCI Express 4.0-based NVMe interfaces will allow even faster sustained throughput. Although we're producing more data than ever before, we will be able to store it more cheaply and access it more quickly.