What can companies learn from object storage pioneers?

Object storage is an increasingly popular way to store data, but beware the pitfalls

The shift to the cloud is encouraging enterprises to rethink their options on storage. According to a June 2019 study from IHS Markit, 56% of organisations said they plan to increase investment in object storage, putting it ahead of unified storage at 51%, storage-area networks at 48% and network-attached storage at 36%. Most object storage is in the cloud, with popular examples including AWS S3, Azure Blob Storage and Google Cloud Platform (GCP) Cloud Storage.

But shifting to a new storage architecture at the same time as moving to the cloud is not entirely painless.

At the beginning of the decade, Moneysupermarket.com, the consumer online comparison and information site for financial services, was using a combination of SQL databases and a SAS analytics environment. By 2014, it had moved to AWS for website hosting and data analytics, including use of S3 object storage and the Vertica data warehouse. By May 2019, it had moved its data and analytics to GCP, using the BigQuery data warehouse and Cloud Storage object storage. The website itself remains on AWS.

Harvinder Atwal, Chief Data Officer at MoneySuperMarket, tells IT Pro: "One of the good things about the cloud is the initial learning curve is very shallow: it's easy to start. But then you get to the point where it's very much steeper and you need to understand some of the complexities involved."

One example of those complexities is the introduction of object lifecycle policies. The idea is to define policies that manage objects for as long as the organisation needs them. A policy might move objects to cheap long-term storage such as AWS Glacier, or expire them altogether. Getting these rules right from the outset can save costs.

"That's one of the things that maybe we should put a little more effort into from the very beginning," Atwal says.
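As a rough illustration of the kind of lifecycle rule described above, the sketch below builds a rule in the dictionary shape that the S3 API accepts for a bucket lifecycle configuration. The prefix, the day counts and the helper name `build_lifecycle_rule` are assumptions made for the example; they are not Moneysupermarket.com's actual settings.

```python
def build_lifecycle_rule(prefix, archive_after_days, expire_after_days):
    """Build one S3 lifecycle rule: archive to Glacier, then expire.

    All arguments are illustrative; pick values that match how long
    the organisation really needs the data.
    """
    if expire_after_days <= archive_after_days:
        raise ValueError("objects must be archived before they expire")
    return {
        "ID": f"archive-then-expire-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},  # rule applies only to keys under this prefix
        "Status": "Enabled",
        "Transitions": [
            {"Days": archive_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical policy: archive logs after 90 days, delete them after a year.
lifecycle_configuration = {"Rules": [build_lifecycle_rule("logs/", 90, 365)]}

# With boto3 installed and credentials configured, this dictionary could be
# passed to s3_client.put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle_configuration)
# where "example-bucket" is a placeholder name.
```

Writing the rule down as data before any objects are uploaded makes the retention decision explicit, which is the "effort from the very beginning" Atwal refers to.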

Another piece of advice for those moving to object storage in the cloud is to avoid biting off more than the team can chew.

"I would not do the migration all in one go," Atwal says. "I think the bigger project and the more money and resources it uses, the more likely it is to fail. I would encourage people to think of their use case and application and build a minimal viable product around that."

It's worth getting advice about the transition from independent third parties, which the cloud platform vendors can recommend. For example, Moneysupermarket.com used a consultancy called DataTonic for its transition to Google Cloud Platform.

Lastly, there can be a cultural change in store for the IT department, Atwal says. "The IT function can be very traditional in its thinking around how you use data. They think you must cleanse it, put it into a relational schema and only then can users access it. But with data today, the value in analytics comes from actually being able to use data for many sources and join them together, and IT has to learn to ditch its historic mindsets."

Nasdaq, the tech stock market, began working with AWS in 2012. It stores market, trade and risk data on the platform using S3 and Glacier. It uploads raw data to Amazon S3 throughout the trading day and then, using a separate system running in the cloud, converts the raw data into Parquet files and places them in their final S3 location. This way, the system is able to scale elastically to meet the demands of market fluctuations. It also uses Amazon Redshift Spectrum to query data to support billing and reporting, and Presto and Spark on Elastic MapReduce (EMR) or Athena for analytics and research.
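The two-stage layout described above, a raw landing area plus a final location for converted Parquet files, can be sketched as a simple key mapping. The prefixes `raw` and `curated`, the date-based layout and the function name `final_key_for` are all hypothetical; the article does not describe how Nasdaq actually names its objects, and the conversion step itself is left out.

```python
from pathlib import PurePosixPath

RAW_PREFIX = "raw"        # hypothetical landing prefix for intraday uploads
FINAL_PREFIX = "curated"  # hypothetical prefix for converted Parquet files


def final_key_for(raw_key: str) -> str:
    """Map raw/<date>/<name>.<ext> to curated/date=<date>/<name>.parquet."""
    parts = PurePosixPath(raw_key).parts
    if len(parts) != 3 or parts[0] != RAW_PREFIX:
        raise ValueError(f"unexpected raw key layout: {raw_key!r}")
    _, trading_date, filename = parts
    stem = PurePosixPath(filename).stem  # file name without its extension
    return f"{FINAL_PREFIX}/date={trading_date}/{stem}.parquet"


# The separate conversion system would read the raw object, write Parquet,
# and store the result under the key computed here.
example = final_key_for("raw/2019-05-01/trades-0001.csv")
```

Keeping raw and converted objects under separate prefixes means the ingest side and the conversion side can scale independently, and query engines only ever need to be pointed at the final location.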

"Migrating to Amazon S3 as the 'source of truth' means we're able to scale data ingest as needed as well as scale the read side using separate query clusters for transparent billing to internal business units," says Nate Sammons, assistant vice president and lead cloud architect at Nasdaq.

But getting the scale of analytics solutions right for the problem has been a challenge, he says. "We currently operate one of the largest Redshift clusters anywhere, but it's soon to be retired in favour of smaller purpose-specific clusters. Some of the custom technologies we developed [in the early days] have since been retired as cloud services have matured. Had technologies like Amazon Redshift Spectrum existed when we started, we would have gone straight to Amazon S3 to start with, but that was not an option."

The advantage of using S3, though, was that it made the organisation less concerned about individual machine outages or data centre failures, Sammons says. "If one of the Amazon Redshift Spectrum query clusters fail, we can just start another one in its place without losing data. We don't have to do any cluster re-sizing and we don't require any CPU activity on the query clusters to do data ingest."

Rahul Gupta, IT transformation expert at PA Consulting, says those exploiting object storage in the cloud should know that its apparent scalability and elasticity do not remove the need for some basic housekeeping on data.

"A lot of people feel storage is cheap, so they build systems with vast amounts of data and think the impact on cost is not that great. They push the data into S3, or an equivalent, and then once it's in there, they feel that they can impose structure on the data, which is not the right thing to do," he says.

He says that by understanding data structure upfront and putting governance in place, such as role-based access, organisations will not have to revisit the architecture once the data grows.

Just because so many organisations are moving storage to the cloud does not mean they all get the same value from the transition. The considerable investment in cloud infrastructure, storage and analytics applications will offer the greatest returns to those who understand the storage lifecycle upfront, create some governance rules around access and understand data structure from the outset.
