YouGov cuts storage needs by 70% with MongoDB’s data compression

YouGov has reduced its storage needs by 70 per cent after migrating to the latest release of MongoDB's database.

The polling company collects between 2GB and 4GB of new data every hour, although at peak times the amount can be three times as high. It first turned to the NoSQL database in 2010 to store current and archived data, which comprises respondents' answers to questions and their activity during an interview.

But with its recent upgrade to MongoDB 3.0, it used the compression in the database's WiredTiger engine to drastically reduce its storage requirements, which were growing as it captured ever more information.
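To put the figures in perspective, the short Python sketch below works through the arithmetic. It assumes, purely for illustration, that the 70 per cent reduction applies evenly to newly collected data and that the hourly rate is steady across a day; neither assumption comes from YouGov, and the function name is made up for this example.

```python
HOURS_PER_DAY = 24


def daily_volume_gb(rate_gb_per_hour, reduction=0.0):
    """Return GB stored per day at a steady hourly rate.

    `reduction` is the fraction of storage saved (0.70 for 70 per cent).
    Assumption: the saving applies uniformly to all new data.
    """
    return rate_gb_per_hour * HOURS_PER_DAY * (1.0 - reduction)


# The article quotes 2GB to 4GB of new data per hour.
for rate in (2, 4):
    raw = daily_volume_gb(rate)
    reduced = daily_volume_gb(rate, reduction=0.70)
    print(f"{rate} GB/hour: {raw:.1f} GB/day raw, "
          f"about {reduced:.1f} GB/day after a 70% reduction")
```

On those assumptions, a day of collection at the quoted rates would shrink from roughly 48GB to 96GB down to roughly 14GB to 29GB on disk.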

The 3.0 release lets customers choose one of three storage engines to power their database: MongoDB's standard read-intensive engine, a write-intensive engine and an in-memory engine.

WiredTiger powers the write-intensive engine, allowing developers to build applications that deliver between seven and 10 times better write-throughput performance while dramatically compressing data.
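As a rough illustration of how that compression can be selected, the Python sketch below builds the document for MongoDB's `create` command with a per-collection WiredTiger compressor. The collection name `survey_responses`, the database name `surveys` and the helper function are hypothetical, and nothing here reflects YouGov's actual configuration.

```python
# Block compressors available to WiredTiger in MongoDB 3.0.
SUPPORTED_COMPRESSORS = {"none", "snappy", "zlib"}


def build_create_command(collection_name, compressor="zlib"):
    """Build a `create` command document that sets WiredTiger block compression.

    Hypothetical helper for illustration; it only assembles the document
    and does not contact a server.
    """
    if compressor not in SUPPORTED_COMPRESSORS:
        raise ValueError(f"unsupported compressor: {compressor!r}")
    return {
        # The command name must be the first key in the document.
        "create": collection_name,
        "storageEngine": {
            "wiredTiger": {"configString": f"block_compressor={compressor}"}
        },
    }


command = build_create_command("survey_responses")
print(command)

# With the pymongo driver and a running WiredTiger-backed server,
# the document could be sent like this (not executed here):
#   from pymongo import MongoClient
#   MongoClient()["surveys"].command(command)
```

Generally speaking, zlib trades more CPU for smaller files, while snappy, WiredTiger's default, is lighter on the processor; which suits a given workload is something to measure rather than assume.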

Jason Coombs, executive technical director at YouGov, said: "Given that we've moved to SSDs for storage, this saving is fantastic. That feature alone has pushed us to upgrade faster than we would have otherwise."

YouGov now uses MongoDB to power dozens of its applications, but the key system it supports is Gryphon, a survey system the company built itself.

All survey data enters MongoDB via Gryphon, before being sent to one of YouGov's other applications, like brand perception tool BrandIndex or consumer behaviour tracker Pulse.

Gryphon used to sit on YouGov's own database, called FastStore, but it couldn't scale globally. Five years ago the firm therefore selected MongoDB over a rival product, based on Microsoft SQL Server, that was also being used within the company.

"That stack could not provide the speed of innovation or performance required by the Gryphon product," said Coombs.

Now, YouGov uses MongoDB Enterprise Advanced, a version of the database that includes advanced software, support, certifications, and licenses.

It also uses MongoDB Cloud Manager to run, automate and back up its database, which is spread across a cluster of five shards, two of which are in the US and two of which are in Europe.

"It's less about break and fix support, and more about proactive, consultative services and tools such as planning upgrades or advice on schema design for new apps," said Coombs.