A more cost-effective alternative to Petabyte-sized RAID arrays, the AmpliStor is easy to manage and highly scalable. Its BitSpread technology has significantly lower overheads than RAID, performs better and provides higher levels of data protection.
Unstructured Big Data repositories are posing huge problems for RAID as it was never designed to scale into Petabytes. The cost benefits of large arrays consisting of cheap SATA hard disks are being eroded by RAID overheads and the potential for degraded data protection.
Lost capacity in massive RAID arrays is high and drive failures can cause serious problems. Rebuilding a Petabyte array will take days, quite possibly weeks, and if more failures occur while it’s degraded the consequences could be catastrophic.
It’s no good archiving data to reduce the size of front-line storage, as the whole point of Big Data is that it needs to be online and immediately accessible. This poses a new problem, as non-recoverable read errors, or bit rot, become more likely as drive capacities grow and access rates increase.
The AS36 storage nodes are equipped with twelve internal 3TB SATA drives and the latest Ivy Bridge Xeon E3 V2 CPUs
Amplidata’s AmpliStor is designed to handle the demands of unstructured Big Data. It scales easily into Petabyte capacities and doesn’t incur the same overheads as RAID.
Amplidata has developed an advanced version of erasure coding that it claims offers up to ten nines of storage reliability. It provides much higher protection levels than RAID, can cope with multiple drive failures and offers much faster repair times.
At its foundation is Amplidata’s BitSpread technology, which runs on the AmpliStor controller appliances. It uses the concept of multiple, redundant representational equations which are stored instead of the data. The equations are used to reconstruct the data and present it to an application.
BitSpread takes objects being stored and breaks them up into Superblocks. These are then broken down further into 4,096 Message Blocks which are used to calculate the equations.
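The exact superblock size and encoding are BitSpread internals, but the two-level split described above can be sketched as follows. The 4MB superblock size here is an illustrative assumption, not Amplidata's documented default; the 4,096 message blocks per superblock is per the description above.

```python
# Sketch of BitSpread's two-level split: an object is cut into superblocks,
# and each superblock into 4,096 message blocks that feed the encoder.
# The superblock size is an illustrative assumption, not Amplidata's default.

SUPERBLOCK_SIZE = 4 * 1024 * 1024      # assumed 4MB superblocks
MESSAGE_BLOCKS_PER_SUPERBLOCK = 4096   # per the BitSpread description

def split_object(data: bytes):
    """Yield one list of message blocks per superblock."""
    for off in range(0, len(data), SUPERBLOCK_SIZE):
        superblock = data[off:off + SUPERBLOCK_SIZE]
        # pad so the superblock divides evenly into 4,096 message blocks
        msg_size = -(-len(superblock) // MESSAGE_BLOCKS_PER_SUPERBLOCK)
        superblock = superblock.ljust(msg_size * MESSAGE_BLOCKS_PER_SUPERBLOCK, b"\0")
        yield [superblock[i:i + msg_size]
               for i in range(0, len(superblock), msg_size)]

blocks = list(split_object(b"x" * (10 * 1024 * 1024)))  # a 10MB object
# → 3 superblocks, each holding 4,096 message blocks
```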
The AmpliStor web interface is well designed and makes light work of provisioning storage
The equations are separated into equal sized groups which are distributed across multiple hard disks and storage nodes where the latter can be in different rack cabinets and even data centres. The number of redundant equations calculated will depend on the level of data durability required as defined by policies.
A policy comprises two numbers where the first defines the number of physical hard disks you want to spread the equations across. The second number defines how many drive failures within this group can be tolerated.
For example, a 16/4 policy requires equations to be spread across sixteen drives, and up to four drive failures will be tolerated. For a single data centre with four racks of forty nodes each, this would allow an entire rack to fail without loss of data.
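The arithmetic behind a spread/safety policy can be sketched in a few lines. The overhead formula is the standard erasure-coding ratio and the even-spread assumption is mine; the function names are illustrative, not part of any Amplidata API.

```python
def policy_overhead(spread: int, safety: int) -> float:
    """Raw-to-usable expansion factor for a spread/safety policy:
    `safety` of the `spread` shares are redundant, so usable data
    corresponds to spread - safety shares."""
    return spread / (spread - safety)

def tolerates_rack_failure(spread: int, safety: int, racks: int) -> bool:
    """True if spreading the equations evenly across `racks` racks keeps
    a whole-rack loss within the policy's failure tolerance."""
    per_rack = -(-spread // racks)   # ceiling division
    return per_rack <= safety

# The review's 16/4 example: with four racks there are four equations per
# rack, so a full rack failure stays within the four-failure tolerance.
print(policy_overhead(16, 4))            # ≈ 1.33x raw overhead
print(tolerates_rack_failure(16, 4, 4))  # → True
```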
In the event of a failure, BitDynamics agents running on each storage node use the remaining equations to reconstruct the missing ones. A big advantage is processing overheads are confined only to the storage nodes and have no adverse impact on the controllers.
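BitSpread's actual equations are proprietary, but the repair-from-survivors idea can be illustrated with the simplest possible erasure code: a single XOR parity block. This toy tolerates only one loss, where BitSpread's policies tolerate several; it is an illustration of the principle, not Amplidata's method.

```python
# Toy illustration of repair-from-survivors, NOT Amplidata's actual coding:
# one XOR parity block lets any single lost block be rebuilt. BitSpread's
# equations generalise this to tolerate multiple simultaneous failures.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

data_blocks = [b"unst", b"ruct", b"ured", b"data"]
parity = xor_blocks(data_blocks)   # stored alongside the data blocks

# Simulate losing block 2, then rebuild it from the survivors plus parity,
# much as a repair agent would after a drive failure.
survivors = [data_blocks[0], data_blocks[1], data_blocks[3], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data_blocks[2]   # → b"ured" recovered
```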
The primary access method to the Amplidata storage is via HTTP/ReST APIs where data is stored and retrieved using PUT and GET commands. For other applications, Amplidata supports gateways that allow storage to be presented as NAS shares.
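The PUT/GET access pattern can be demonstrated end-to-end with a minimal in-memory object server and Python's standard HTTP client. The `/namespace/object` URL layout is an illustrative assumption, not AmpliStor's documented API.

```python
# Minimal sketch of the HTTP PUT/GET object access pattern described above.
# The /<namespace>/<object> URL layout is an assumption for illustration.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

STORE = {}  # path -> bytes, standing in for the object store

class ObjectHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        STORE[self.path] = self.rfile.read(length)
        self.send_response(201)
        self.end_headers()

    def do_GET(self):
        body = STORE.get(self.path)
        if body is None:
            self.send_response(404); self.end_headers(); return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), ObjectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("PUT", "/bigdata/object1", body=b"hello object store")
put_resp = conn.getresponse()
put_resp.read()
assert put_resp.status == 201

conn.request("GET", "/bigdata/object1")
get_resp = conn.getresponse()
assert get_resp.read() == b"hello object store"
server.shutdown()
```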
During policy creation you enter two numbers that define the desired levels of data durability
Deployment is straightforward as the storage nodes and controllers have dual redundant Gigabit links to a switched network fabric. The controllers then present the object storage over a separate 10GbE network.
The minimum requirements for BitSpread are three AC1 controllers and three storage nodes. The price we’ve shown is for three Xeon E5-2600-powered AC1 controllers and eight AS36 storage nodes, which deliver 288TB of raw capacity.
Amplidata’s web interface provides a status screen showing storage utilisation, system health and warnings for failed hardware. Policy creation is simple and, as you enter your numbers, it advises you of the storage overheads they will incur.
You can also define the maximum Superblock size and how you want the equations distributed. For namespace creation you assign a policy to a virtual container which is presented as object storage to applications.
Namespaces are used to apply policies to storage objects which are then presented to your Big Data applications
For performance testing we used an AmpliStor system comprising three AC1 controllers and 24 AS36 storage nodes providing 864TB of raw capacity. We created a namespace and assigned it a 20/4 data durability policy.
To gauge throughput we ran a Python script created by Amplidata which uses parallel HTTP PUT commands to write data objects to a namespace. The script also reports on elapsed time for each operation and throughput in MB/sec.
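Amplidata's script itself isn't reproduced here, but the pattern it uses, parallel PUT streams with a throughput report, can be sketched with a pluggable put function so the harness runs standalone. The function names and parameters are my own.

```python
# Sketch of a parallel-PUT throughput harness in the spirit of the test
# script described in the review. The real script targets the AmpliStor
# REST API; here `put` is pluggable so the harness runs anywhere.
import time
from concurrent.futures import ThreadPoolExecutor

def measure_put_throughput(put, object_size, streams, objects_per_stream):
    """Drive `put(name, data)` calls from parallel streams; return MB/sec."""
    payload = b"\0" * object_size

    def stream(sid):
        for i in range(objects_per_stream):
            put(f"obj-{sid}-{i}", payload)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=streams) as pool:
        list(pool.map(stream, range(streams)))
    elapsed = time.perf_counter() - start
    return (object_size * streams * objects_per_stream / 1e6) / elapsed

# Stand-in sink; a real run would issue an HTTP PUT per object instead.
rate = measure_put_throughput(lambda name, data: None,
                              object_size=1 << 20, streams=8,
                              objects_per_stream=4)
print(f"{rate:.0f} MB/sec across 8 parallel streams")
```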
We started by configuring the script to write 1GB data objects across 128 parallel streams and ran them with one controller handling all 24 storage nodes. The Python script reported a high average throughput of 1,258MB/sec.
We reran the same script with all three controllers managing the storage nodes and saw a big boost in total average throughput to 3,191MB/sec. For our third test we took three storage nodes offline, which created a triple drive failure within the namespace. The Python script now reported 2,931MB/sec, showing how little impact this failure had on general performance.
If a single hard disk fails within a node you can decommission it from the web interface, which causes the data to be repaired onto other available nodes and drives. For a standard 3TB drive this process takes around three hours, as opposed to days for a typical high-capacity RAID array rebuild.
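The three-hour figure implies a healthy aggregate repair rate; a quick back-of-envelope check, assuming the full 3TB has to be re-protected:

```python
# Back-of-envelope rate implied by the ~3 hour figure for repairing a
# 3TB drive's worth of data (assumes the full capacity is re-protected).
drive_mb = 3 * 1e6          # 3TB expressed in MB
repair_seconds = 3 * 3600   # ~3 hours
rate = drive_mb / repair_seconds
print(f"~{rate:.0f} MB/sec aggregate repair rate")  # roughly 280 MB/sec
```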
AmpliStor looks capable of handling the demands of the next generation of Big Data applications. It scales easily: capacity can be expanded on demand by installing extra storage nodes, and if you want more performance you simply add more controllers.
AC1 Controller
Chassis: 1U rack
CPU: 2 x 2GHz Xeon E5-2650
Memory: 32GB DDR3
Storage: 2 x 3TB SATA, 1 x 240GB SSD
Network: 4 x 10GbE
Power: 2 x 750W hot-plug
Management: Web browser
AS36 Storage Node
Chassis: 1U rack
CPU: 2.3GHz Xeon E3-1220LV2
Memory: 8GB DDR3
Storage: 12 x 3TB SATA hard disks
Network: 2 x Gigabit
Power: 2 x 550W hot-plug