Cloud KPIs: getting to grips with the appliance of science


As every hard-pressed CIO knows, the first law of data management states that as the automation of global business systems increases, so does our propensity and need to exert management layers upon those systems to monitor total performance.

In practical terms this often necessitates the use of Key Performance Indicators (KPIs) designed to measure current activities against “defined operational goals” on an ongoing basis.

In terms of physical form and function, KPIs can be ‘quantitative’ to represent a sales target figure for example, ‘practical’ when aimed at refinement of a business process perhaps, or ‘directional’ if a firm wants to channel its market focus into a new business vertical. They can also be just plain and simple ‘financial’ … and this, we hope, needs no real explanation.

So, as we now direct the analysis offered by KPIs to monitor the performance of applications and data storage services offered by hosted computing environments, what practical advice can we offer for using these tools in the cloud space?

In every case, a KPI’s defining parameters should make it comprehensible, measurable and actionable, otherwise it becomes meaningless and worthless and therefore not controllable in the first place. So a cloud customer’s first task in this regard is to look for total clarity in the service contract offered before it is agreed to.

While it might sound obvious to say “read the small print first”, this detail will form the DNA of your cloud KPI. Bad information (or data) leads to bad reports, leads to bad management, leads to bad decisions etc.

In order to live and breathe, a healthy KPI needs data to feed on. So tracking cloud-based applications requires that the app services themselves offer open channels for data capture and subsequent analysis. In the realm of the thoroughbred KPI, real-time graphical analysis of performance data is everything.

The KPI’s access to cloud application performance data should ideally be facilitated through a single data store or repository - and data transport channels should be clearly defined and uncluttered. But this is a big technical ask. So who owns the responsibility for the architectural engineering of the software required to carry out this task? Is it the cloud customer who wants to bring the KPI to bear, is it the hosting provider or is it a third party specialist vendor?

Where can I buy a cloud KPI?

"Performance of a typical cloud based solution needs to be multi-layered; for example we need to monitor CPU, RAM, disk resources and network capacity within the cloud environment utilising tools such as Cacti, Nagios or IBM’s Tivoli,” suggests Peter Chadha, chief executive and founder of technology advisory firm DrPete.

Also available is Compuware’s CloudSleuth, a partner-driven free cloud community site where anyone can go to monitor the application performance and availability of cloud service providers using real-time data. This is a tool for anyone who is considering deploying or managing cloud applications - and also for checking who the best providers are and ensuring KPIs are in line with expectations.

Compuware claims CloudSleuth is the only cloud community specifically built to spotlight the performance of the cloud’s federated infrastructure. This means that when analysing cloud applications, it is carefully mindful of the fact that service based software is typically composed of a web-based supply chain from multiple hosts, many of which may be outside the direct control of the core application’s owner.

Looking at this transactional maze of third party cloud services forming what some have called “borderless applications” and their challenges, Compuware’s director of IT service management Michael Allen argues that having the ability to take back control and maintain visibility is really important when measuring Key Performance Indicators. “How can you measure what you can’t see? This requires a different approach to manage the availability and performance of modern applications that only a new generation of application performance management solutions can tackle.”

Which KPIs should we use in the cloud?

Cloud providers will issue Service Level Agreements (SLAs) intended to cover the Quality of Service (QoS) that their hosting contracts pledge to deliver. But these SLAs are too often criticised for being paper-thin promises that merely exist to encourage customers to sign up to the cloud in the first place.

Given the concern that so many potential cloud customers voice over security and encryption in the cloud, one would reasonably expect a data security KPI to be of considerable use.

Richard Moulds, VP of product strategy for Thales e-Security points to his firm’s survey of 4000 business managers entitled ‘Encryption in the Cloud’. Moulds reports that “nearly two thirds” of respondents say they do not know what cloud providers are actually doing in order to protect the sensitive or confidential data entrusted to them.

“We can infer that data protection in the cloud is not measured accurately by the majority of organisations as a KPI. Consequently, there is an enormous opportunity for cloud providers to differentiate themselves by demonstrating clearly to customers what they are doing to protect their data,” said Moulds.

“On one hand the use of encryption is easy to measure and a useful KPI - data is either encrypted or it is not - but as with many security technologies there are subtleties in the implementation. Organisations should look beyond the claims of cloud providers and create a KPI that measures against a list of standards relating to due care for cryptographic security. If your organisation ticks all the boxes against this list, encryption is being deployed well,” he added.
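
The checklist-style KPI that Moulds describes could be sketched roughly as follows. This is only an illustration: the function name and the four checklist items are hypothetical placeholders, not drawn from the Thales survey or from any published standard, and a real list would come from whichever cryptographic standards an organisation chooses to adopt.

```python
def encryption_kpi(checklist):
    """Score a cryptographic due-care checklist.

    checklist maps an item name to True (met) or False (not met).
    Returns (score, passed, missing): the fraction of items met,
    whether every box is ticked, and the sorted list of unmet items.
    """
    if not checklist:
        raise ValueError("checklist must contain at least one item")
    missing = sorted(item for item, met in checklist.items() if not met)
    score = (len(checklist) - len(missing)) / len(checklist)
    return score, not missing, missing


# Hypothetical checklist items, for illustration only.
provider_review = {
    "data_encrypted_at_rest": True,
    "keys_stored_separately": False,
    "key_rotation_policy": True,
    "algorithms_reviewed": True,
}

score, passed, missing = encryption_kpi(provider_review)
print(f"score={score:.0%} passed={passed} missing={missing}")
```

Reporting the unmet items alongside the score keeps the KPI actionable rather than merely measurable: a provider that scores three out of four still fails, and the report says exactly which box is unticked.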

Precision engineering KPIs

So we can surmise that if used with an intelligent and holistic view of the de-coupled universe that they must monitor, cloud KPIs can be used to discover and map relationships between virtual machines and their physical hosts. If we then correlate an application’s performance with the behaviour of a given cloud infrastructure, we can detect “contention problems” on the physical host, pinpointing the exact virtual machine and application that overused the physical host’s resources (eg CPU, disks).
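
As a rough sketch of that mapping-and-correlation idea, the snippet below groups virtual machines by physical host, flags any host whose combined CPU use crosses a limit, and names the heaviest virtual machine on it. The function name, the 90 per cent limit and the sample figures are all assumptions made for illustration; a production tool would also look at disk and network, and would draw its data from the monitoring layer rather than from hard-coded dictionaries.

```python
from collections import defaultdict


def find_contention(vm_host, vm_cpu_pct, host_limit_pct=90.0):
    """Flag hosts whose summed VM CPU use exceeds host_limit_pct.

    vm_host maps a VM name to its physical host.
    vm_cpu_pct maps a VM name to the share of host CPU it uses (percent).
    Returns {host: (heaviest_vm, heaviest_vm_cpu_pct)} for contended hosts.
    """
    loads_by_host = defaultdict(list)
    for vm, host in vm_host.items():
        loads_by_host[host].append((vm_cpu_pct.get(vm, 0.0), vm))

    contended = {}
    for host, loads in loads_by_host.items():
        total = sum(cpu for cpu, _ in loads)
        if total > host_limit_pct:
            cpu, vm = max(loads)  # heaviest consumer on this host
            contended[host] = (vm, cpu)
    return contended


# Illustrative sample data, not real measurements.
placement = {"vm1": "hostA", "vm2": "hostA", "vm3": "hostB"}
cpu_use = {"vm1": 70.0, "vm2": 25.0, "vm3": 40.0}
print(find_contention(placement, cpu_use))
```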

This is the view of Ted Lester, a senior director of professional services at Precise Software, a company that has come from the traditional SLA and APM (application performance monitoring) market to now focus on the cloud space. Lester also points to clusters as being prevalent in cloud environments and therefore a direct target for KPI-level analysis.

"It is crucial to know whether the [cluster] performance behaviour is uniform among all clustered instances or unique only to some of them. Knowing the load balancing behaviour determines whether the problem is in the application’s design (the degradation is common to all instances), or whether it is specific to the resources given to a single instance (degradation is experienced only by one instance). We should then collect and analyse the load balancing patterns, to compare load and response times for all clustered instances,” said Lester.

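Lester’s distinction between uniform and instance-specific degradation can be illustrated with a short sketch. The function name, the baseline figure and the 1.5x tolerance are assumptions chosen for the example, not values Lester or Precise Software recommend, and a fuller analysis would compare load as well as response time for each instance.

```python
def classify_degradation(response_ms, baseline_ms, tolerance=1.5):
    """Classify cluster degradation from per-instance response times.

    response_ms maps an instance name to its mean response time (ms).
    An instance counts as degraded when its response time exceeds
    baseline_ms * tolerance.
    Returns (label, degraded), where label is:
      "none"     - no instance is degraded
      "uniform"  - every instance is degraded (suggests application design)
      "isolated" - only some are degraded (suggests per-instance resources)
    """
    limit = baseline_ms * tolerance
    degraded = sorted(name for name, ms in response_ms.items() if ms > limit)
    if not degraded:
        return "none", degraded
    if len(degraded) == len(response_ms):
        return "uniform", degraded
    return "isolated", degraded


# Illustrative sample data, not real measurements.
cluster = {"node1": 100.0, "node2": 105.0, "node3": 400.0}
print(classify_degradation(cluster, baseline_ms=100.0))
```

In this sample only node3 breaches the limit, so the sketch would point the investigation at that instance’s resources rather than at the application’s design.
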
So where do we stand? We need cloud-focused KPIs for analysis of application performance, for analysis of inter-application performance and indeed intra-application performance where composite federated data elements feed service-based software instances.

Given this multi-dimensional framework that we know we need to lock down and gain insight into, shouldn’t we have been talking about this in depth a little earlier? Our industry ranking for analysis and discussion into cloud KPI issues is, until now at least, way below the SLA that we should have been aspiring to.