Google report casts doubt over hard drive failure

The effects of high temperatures and disk usage on hard drive failures do not appear to be as great as previously believed, according to a report by engineers at Google.

The research looked at 100,000 parallel and serial ATA disks ranging from 80GB to 400GB in size, all of which were put into production in or after 2001. For the purposes of the study, a drive was considered to have failed if it was replaced as part of a repair procedure.

The research paper, written by Google engineers Eduardo Pinheiro, Wolf-Dietrich Weber and Luiz Andre Barroso, concluded that models based on SMART (Self-Monitoring, Analysis, and Reporting Technology) parameters alone are unlikely to be useful for predicting the failure of individual drives.

"Surprisingly, we found that temperature and activity levels were much less correlated with drive failures than previously reported," the researchers wrote.

The authors said that while disk drives are generally very reliable, they are also very complex, and when they fail, the causes are numerous.

"As a result, detailed studies of very large populations are the only way to collect enough failure statistics to enable meaningful conclusions," the researchers said.

They said the results painted a more complex picture than a simple correlation between disk use and failure.

"First, only very young and very old age groups appear to show the expected behaviour. After the first year, the AFR (annualised failure rate) of high utilisation drives is at most moderately higher than that of low utilisation drives," said the engineers.

"The three-year group in fact appears to have the opposite of the expected behaviour, with low utilisation drives having slightly higher failure rates than high utilisation ones."

The engineers suggested that the disks behave in a "survival of the fittest" way, with drives that survive their "infancy" proving the least susceptible to failure.

The report found that high temperatures tend to affect old drives more than young ones. "We can conclude that at moderate temperature ranges it is likely that there are other effects which affect failure rates much more strongly than temperatures do," the report said.

The authors concluded that, in a professionally managed data centre deployment, factors other than temperature and activity may have a greater influence on disk drive reliability.

Rene Millman

Rene Millman is a freelance writer and broadcaster who covers cybersecurity, AI, IoT, and the cloud. He also works as a contributing analyst at GigaOm and has previously worked as an analyst for Gartner covering the infrastructure market. He has made numerous television appearances to give his views and expertise on technology trends and companies that affect and shape our lives.