Big or small cloud providers: does size really matter?

When one considers a cloud service, two providers spring immediately to mind: Microsoft (with Azure) and Amazon (with EC2).

Along with these two behemoths, however, come tens of thousands of smaller companies offering services with the “cloud” label.

“I am not a number,” proclaimed Number Six, “I am a free man.” With these smaller services, there's every chance that you'll be treated as a customer rather than as just another amorphous entity … but is it all it's cracked up to be?

Is it a white label?

The first thing you must do is find out who actually runs the service. In many sectors of the IT industry a service provided by company A is actually a re-badged service of company B's, perhaps with a layer (often a very thin layer) of added value dropped on as some kind of motivation for you to sign up with it.

Naturally, added value generally goes hand-in-hand with added cost, so ask yourself whether it's worth it. For instance:

If the supplier claims that it has a better relationship with its upstream provider, dig about and find out whether the difference is really that significant. Even if the provider is, say, ten times larger than you, it may still be a minnow in the eyes of the underlying service provider.
Is the added-value service it's offering something you could get elsewhere (to some extent even from the upstream provider), or perhaps something you could do yourself?
Is the added value actually just familiarity or locality? That is, are you dealing with the white-label supplier just because you know it, and forgetting to do a proper evaluation?

At our company we've had experience of the first and third of these examples, so they're most definitely not just hypothetical.

Where's the support?

We were once offered a service (albeit a more general network service, not a cloud-specific instance) with 24x7 support in the event of a system-down problem. The supplier's technical staff totalled two, and there was no third-party fallback in place. Unsurprisingly, I declined. Large suppliers have big support teams, and smaller suppliers have smaller ones – QED.
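The arithmetic behind declining is easy to sketch. The snippet below is a minimal illustration, not a staffing model: it assumes each engineer works a 40-hour week with no overlapping shifts, and it ignores leave, sickness and handovers, all of which only make the picture worse. The function names are hypothetical.

```python
import math

HOURS_PER_WEEK = 24 * 7  # 168 hours to cover for genuine 24x7 support

def staffed_fraction(engineers: int, hours_each: float = 40.0) -> float:
    """Fraction of the week with someone on shift, assuming shifts never overlap."""
    return min(1.0, engineers * hours_each / HOURS_PER_WEEK)

def engineers_needed(hours_each: float = 40.0) -> int:
    """Minimum head-count to have one engineer on shift at all times."""
    return math.ceil(HOURS_PER_WEEK / hours_each)

if __name__ == "__main__":
    print(f"Two engineers cover {staffed_fraction(2):.0%} of the week")
    print(f"Round-the-clock cover needs at least {engineers_needed()} engineers")
```

Under those assumptions two engineers are on shift for less than half the week, and even one-person-per-shift cover needs at least five people before anyone takes a holiday.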

Do not, however, take this to mean that small suppliers can't support you properly. A well-focused, well-trained team looking after a modest portfolio of well-understood products can be far more efficient than a behemoth with umpty-squillion clients and a vast product portfolio.

In a small organisation, you need to be confident that they have reasonable staff retention (hands-on experience trumps a wad of documentation every time, and you don't get it in a support team with high staff churn) and technical competence. If you can be happy that this is the case, you've ticked the box.

And in fact I tend to be far more critical and inquisitive when I deal with large companies, because a big team is often a bad thing – particularly when your ticket gets picked up by one of the less capable people on the team. If you go with a big provider you should always ensure there's a comprehensive, properly implemented escalation path and that you know all about it. Better still, ensure you have a named account manager whom you can chase when the effluent hits the fan. I'd be retired and sipping Mojitos in the Bahamas if I had a fiver for each time I'd been landed with the middle-of-the-night engineer and had been able to rouse the account manager to kick some butt and actually get my problem fixed.

How performant is it?

A big cloud implementation means resilience and power. It's as simple as that. With such fierce competition for business between the big names, performance is king: if one of the providers were to suffer a significant performance or resilience issue, its market share would drop like a stone in no time at all (let's face it, moving to another provider isn't a big deal if you're not relying on any major platform-specific features). You can generally work on the premise, then, that the big players will perform.

For example, I recently had reason to inquire about the resilience of a particular cloud service provider; the answer was that each data centre has three independent 10Gbit/s links, two of this, two of that, blah, blah, blah. I knew that would be the case, but my RFP form had a box that needed ticking and it was well and truly ticked.

With a smaller provider, you have to be a little more careful. Be absolutely specific about the requirements you have, and ask all the right questions regarding dedicated vs. shared CPU/RAM/disk capacity and the like – otherwise you could find yourself in a nasty mess later when the provider's client base has grown more quickly than it's been able to grow its infrastructure.

By all means ask about its infrastructure, drilling right down to make and model level: for instance, it's one thing to know that all its blade servers are connected through 10Gbit/s Ethernet, but quite another to be aware that (say) the 16-port 10Gbit/s blades in its chassis switches are four-to-one contended, and thus when flat out on all ports can deliver only a quarter of the bandwidth claimed.
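To make the contention example concrete, here is a small sketch of the worst-case sum. It assumes the contention ratio describes a single shared uplink and that, with every port flat out, that uplink is divided evenly between ports; real switches may schedule traffic less fairly. The function name is illustrative.

```python
def per_port_throughput_gbps(ports: int, port_speed_gbps: float, contention: float) -> float:
    """Worst-case bandwidth per port when every port is flat out.

    `contention` is the oversubscription ratio (4.0 for 4:1): the shared
    uplink carries ports * port_speed / contention, split evenly.
    """
    uplink_gbps = ports * port_speed_gbps / contention
    return uplink_gbps / ports

if __name__ == "__main__":
    advertised = 16 * 10
    actual = per_port_throughput_gbps(16, 10.0, 4.0)
    print(f"Advertised: {advertised} Gbit/s in total; worst case: {actual} Gbit/s per port")
```

Sixteen 10Gbit/s ports look like 160Gbit/s on the datasheet, but at 4:1 the uplink carries 40Gbit/s, which leaves 2.5Gbit/s per port under full load.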

Is it scalable?

This is a crucial point. Think of your requirement now, then consider an optimistic growth pattern. Then treble it. Then add a bit. Now see how the provider's infrastructure will cope – bearing in mind that you should be considering the same prediction for their other clients.
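The rule of thumb above can be written down as a quick planning calculation. This is a sketch under stated assumptions: "add a bit" is taken as a hypothetical 20% margin, the same pessimistic planning is applied to every one of the provider's clients, and all the figures in the example (vCPU counts, 40 clients, provider capacity) are invented purely for illustration.

```python
def planning_target(current: float, optimistic_growth: float, margin: float = 0.2) -> float:
    """Optimistic forecast, trebled, plus a margin for "add a bit"."""
    return current * optimistic_growth * 3 * (1 + margin)

def provider_copes(client_loads: list[float], growth: float, capacity: float) -> bool:
    """Apply the same pessimistic planning to every client, then compare with capacity."""
    demand = sum(planning_target(load, growth) for load in client_loads)
    return demand <= capacity

if __name__ == "__main__":
    # Hypothetical figures in vCPUs: you plus 39 similar-sized clients.
    you_now = 100.0
    print(f"Your planning figure: {planning_target(you_now, 1.5):.0f} vCPUs")
    print(f"Provider copes: {provider_copes([you_now] * 40, 1.5, 10_000)}")
```

The point of the second function is the last clause of the advice: a provider that comfortably covers your own planning figure can still fall short once every other client grows on the same curve.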

The huge players are adding kit at a mind-numbing rate in order to keep up with demand; will the smaller providers be able to do so? Physical equipment and hosting space cost money, but more importantly they carry an overhead when it comes to installation and management. That overhead should be modest (with automated installation technologies such as Microsoft's SCCM you can run up and test a cabinet full of servers in a couple of hours), but this assumes the provider actually uses a sensible technological mechanism rather than just having a couple of blokes with CDs clicking “Next”.

What's the geography?

Cloud customers care about geography. The big players have clear geographies, with multiple data centres in each region and wads of bandwidth between them. Smaller players may well be limited to particular locations, and the links between them may be puny or even almost non-existent.

Be careful to understand the geography and how it fits your business purposes, particularly if moving from in-house to cloud will cause your data to cross a provincial or national border, since you could be piling legal questions on top of the obvious access-speed ones.

What's the SLA?

The final, but by no means least important, consideration is the service level agreement your provider is willing to conform to.

There's a simple methodology for evaluating an SLA:

Get a yellow highlighter and mark the target uptime figure. Work out the number of minutes per week/month/year the service is likely to be down, and multiply by two.
Get a red pen and cross out the bit that explains service credits.
Still holding your red pen, cross out any bits that have a guaranteed time to repair.
Now ask yourself if your adjusted downtime figure from the first step is acceptable. If not, don't sign up.
If the downtime figure is OK, ask the provider to agree that if it gets it spectacularly wrong three times in six months, you can terminate with 30 days' notice. If it says “no”, don't use it.
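The first step is simple arithmetic, sketched below. It assumes a 30-day month and a 365-day year, and uses the doubling from step one as a default "pessimism factor"; the names are illustrative rather than taken from any SLA tool.

```python
# Minutes in each reporting period; a 30-day month is an assumption.
PERIOD_MINUTES = {"week": 7 * 24 * 60, "month": 30 * 24 * 60, "year": 365 * 24 * 60}

def adjusted_downtime_minutes(uptime_percent: float, period: str, factor: float = 2.0) -> float:
    """Downtime the SLA permits in `period`, multiplied by a pessimism factor."""
    permitted = PERIOD_MINUTES[period] * (100.0 - uptime_percent) / 100.0
    return permitted * factor

if __name__ == "__main__":
    for period in ("week", "month", "year"):
        minutes = adjusted_downtime_minutes(99.9, period)
        print(f"99.9% per {period}: plan for ~{minutes:.0f} minutes down")
```

For a 99.9% target the doubled yearly figure comes to about 1,051 minutes, roughly 17.5 hours, which is the number to hold up against step four.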

What I've just written probably sounds overly cynical, but actually it isn't. To take the steps in order:

What you care about is the downtime. Nothing else. So know what it's likely to be, and be conservative.
Service credits are a waste of time. When your CEO is bawling you out because his finance system is down for a week over year-end, he won't give a stuff that you're getting two months' free service.
Guaranteed time-to-repair figures are bollocks. Nobody can guarantee such a thing.
Think of the CEO and his year-end finance system outage; that is, when considering acceptable downtime, think of the worst possible case.
If you're getting dire service, you need to be able to get out.
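The termination clause from the last step can be checked mechanically once you have a list of incident dates. This sketch makes two assumptions that should be pinned down in the contract itself: "six months" is approximated as 183 days, and deciding which outages count as "spectacularly wrong" is left to you before the dates go in. The function name is hypothetical.

```python
from datetime import date, timedelta

# Assumption: "six months" is approximated as 183 days.
SIX_MONTHS = timedelta(days=183)

def termination_triggered(incidents: list[date], strikes: int = 3,
                          window: timedelta = SIX_MONTHS) -> bool:
    """True if any `strikes` incidents fall within one rolling window.

    `incidents` are the dates of outages already classed as
    "spectacularly wrong"; that threshold is yours to define.
    """
    days = sorted(incidents)
    # Slide over each run of `strikes` consecutive incidents in date order.
    for i in range(len(days) - strikes + 1):
        if days[i + strikes - 1] - days[i] <= window:
            return True
    return False

if __name__ == "__main__":
    bad_half_year = [date(2024, 1, 10), date(2024, 3, 2), date(2024, 6, 20)]
    spread_out = [date(2024, 1, 10), date(2024, 5, 1), date(2024, 9, 1)]
    print(termination_triggered(bad_half_year))  # first to last is 162 days
    print(termination_triggered(spread_out))     # first to last is 235 days
```

Checking consecutive incidents in sorted order is enough: if any three fall inside the window, the three consecutive ones starting from the earliest of them do too.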

Interestingly, the components of my SLA methodology suit different-sized providers differently. The uptime figures are far easier for a large provider to conform to, because in many cases it triples up on servers (one live, one hot standby, one staging for migrating to new software versions) and has highly available resources in all other components of the system. But on the contractual side you have a much better chance of getting a smaller provider to agree to your dire-performance termination clauses.

Oh, and stick to your guns. I've been known to drop a $2bn supplier from an RFP process simply because the SLA was a key evaluation criterion and it insisted that its standard SLA couldn't be flexed; interestingly, even though the other two $2bn+ providers initially said the same, they did in fact agree to be flexible.

Conclusion

It's no surprise that there's no right answer. But in general, you're going to get a more robust but less personal service from the larger providers, and a far more flexible service from the smaller guys. With the latter, though, you absolutely must be completely clear on your requirements and on the levels of service you consider acceptable and, more importantly, on the levels you're actually likely to receive.