Datadog review

Does this network monitoring tool live up to the hype?


CloudPro Verdict

Pros

  • Great GUI
  • Extremely easy to get up and running
  • Highly flexible choice of what parameters you want to monitor and to what extent

Cons

  • Could do with more than AWS in the commercial cloud integration realm

System monitoring brings you a range of advantages if you do it correctly, from capacity planning right up to minimising hassle by reacting to an alert before the users start calling.

Datadog is a cloud-based system and network monitoring service. This is interesting because on the face of it you'd think that a bunch of servers constantly talking to a monitoring and management package would be too chatty to run over an Internet link (particularly given that the traffic flows outbound, which, if you're on ADSL, is the slow half of your broadband link). My experience of doing global monitoring over a corporate WAN to a central host, however, showed me that the load monitoring traffic puts on the links really isn't a big deal.
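To put rough numbers on that bandwidth point, here is a back-of-envelope calculation. Every input is an assumption chosen for illustration (500 metrics per host, about 100 bytes per metric before any compression, a 15-second reporting interval, two hosts), not a figure measured from Datadog:

```python
# Back-of-envelope estimate of outbound monitoring traffic.
# All inputs are illustrative assumptions, not measured Datadog figures.


def monitoring_kbps(metrics_per_host: int, bytes_per_metric: int,
                    interval_s: float, hosts: int = 1) -> float:
    """Return the average upload rate, in kilobits per second, for all hosts."""
    bytes_per_second = metrics_per_host * bytes_per_metric * hosts / interval_s
    return bytes_per_second * 8 / 1000  # bytes -> bits -> kilobits


if __name__ == "__main__":
    # Assumed: 500 metrics per host, ~100 bytes each uncompressed,
    # sent every 15 seconds, from 2 hosts.
    rate = monitoring_kbps(metrics_per_host=500, bytes_per_metric=100,
                           interval_s=15, hosts=2)
    print(f"{rate:.1f} kbit/s")  # 53.3 kbit/s
```

Under those assumptions the two hosts together average roughly 53 kbit/s upstream, a modest share of even an ADSL uplink of a few hundred kbit/s.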

Because Datadog is based out there on the web, the primary way of connecting devices for monitoring is to install an agent on each one and have it call outbound to the Datadog servers. That way you don't have to scare the pants off your security officer by telling them you need inbound management connectivity to all your hosts.
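The outbound-only model means a host only ever needs to make an HTTPS POST to Datadog. As a sketch of that direction of traffic, the snippet below builds (but does not send) a request carrying one metric point, assuming the JSON shape and `DD-API-KEY` header of Datadog's public v1 `/api/v1/series` endpoint. This is not how the agent itself is implemented; the host name, metric name and tag are hypothetical:

```python
import json
import time
import urllib.request

# Assumption: Datadog's public v1 metric intake endpoint. The real agent has
# its own transport; this only illustrates the outbound push.
SERIES_URL = "https://api.datadoghq.com/api/v1/series"


def build_metric_request(api_key: str, host: str, metric: str,
                         value: float, tags: list) -> urllib.request.Request:
    """Build (but do not send) an outbound HTTPS POST carrying one gauge point."""
    payload = {
        "series": [{
            "metric": metric,
            "points": [[int(time.time()), value]],  # [timestamp, value] pairs
            "type": "gauge",
            "host": host,
            "tags": tags,
        }]
    }
    return urllib.request.Request(
        SERIES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host, metric and tag names.
    req = build_metric_request("<YOUR_API_KEY>", "web-01",
                               "custom.queue.depth", 42.0, ["env:test"])
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it; only outbound HTTPS is needed.
```

Nothing in this exchange requires the monitored host to accept an inbound connection, which is the property that keeps the security officer happy.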

Once you're set up with the service (and there's a free trial to get you going) you'll need to deploy the Datadog agents to your various devices. I had a Mac running OS X Yosemite sitting on my desk and an Amazon Linux instance in my AWS setup. The Datadog getting-started wizard gives you a simple copy-and-paste command line to run in a shell window on each machine. So two quick rounds of copy, paste and wait-a-minute later, there I was with a pair of machines registered with Datadog. The setup wizard has an optional final step for those who, like me, use AWS and want to be able to integrate with its management systems.

Once you have systems being monitored, you can configure what you want to see and how you want to see it. The main page of the portal has a header bar with clickable sections, and the remainder of the screen shows the detail for the selected section. So you start with the Events section, which you can filter based on source, event priority and event status.
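The Events filters combine in the obvious way: an event is shown only if it matches every filter you have set. A minimal sketch of that logic, using a hypothetical event record whose field names and values are illustrative rather than Datadog's own schema:

```python
from dataclasses import dataclass


# Hypothetical event record, loosely modelled on the three filters the
# Events page offers; not Datadog's actual data model.
@dataclass
class Event:
    title: str
    source: str    # free-form source label, e.g. "aws" or "my apps"
    priority: str  # e.g. "normal" or "low"
    status: str    # e.g. "error", "warning", "info" or "success"


def filter_events(events, sources=None, priorities=None, statuses=None):
    """Keep events matching every filter that is set; None means 'any'."""
    return [
        e for e in events
        if (sources is None or e.source in sources)
        and (priorities is None or e.priority in priorities)
        and (statuses is None or e.status in statuses)
    ]


if __name__ == "__main__":
    stream = [
        Event("Disk almost full on web-01", "my apps", "normal", "warning"),
        Event("EC2 instance rebooted", "aws", "low", "info"),
        Event("Backup failed", "my apps", "normal", "error"),
    ]
    for e in filter_events(stream, priorities={"normal"},
                           statuses={"error", "warning"}):
        print(e.title)
```

Leaving a filter unset (None) keeps every value for that field, so the same function covers "everything from AWS" and "only normal-priority errors".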

Then there's the Dashboards section, which you'll probably spend some time on because it's here that you can design your custom overviews of the systems you're managing. For normal dashboards you have the option of creating a “TimeBoard” (which leans toward graphs showing stats over time) or a “ScreenBoard”, which is more about KPIs, incident counts and the like. There's also the concept of an “Integration Dashboard”, which is where the AWS management option mentioned earlier comes in, as it can hook into AWS at a level higher than simple server-based connectivity and tell you about your overall AWS (in my case EC2) status.
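Dashboards can also be defined as JSON through the API. The sketch below assumes the shape used by Datadog's later `/api/v1/dashboard` endpoint, which postdates this review and in which an "ordered" layout roughly corresponds to a TimeBoard and a "free" layout to a ScreenBoard. The titles and widget sizes are hypothetical; `system.cpu.user` and `system.load.1` are standard agent metrics:

```python
import json


# Assumed JSON shape of Datadog's /api/v1/dashboard endpoint:
# "ordered" layout ~ TimeBoard, "free" layout ~ ScreenBoard.

def timeboard(title: str) -> dict:
    """Time-series graphs stacked in order, in the spirit of a TimeBoard."""
    return {
        "title": title,
        "layout_type": "ordered",
        "widgets": [{
            "definition": {
                "type": "timeseries",
                "title": "CPU user % by host",
                "requests": [{"q": "avg:system.cpu.user{*} by {host}"}],
            }
        }],
    }


def screenboard(title: str) -> dict:
    """A single KPI tile placed freely on the canvas, like a ScreenBoard."""
    return {
        "title": title,
        "layout_type": "free",
        "widgets": [{
            "definition": {
                "type": "query_value",
                "title": "Avg load (1 min)",
                "requests": [{"q": "avg:system.load.1{*}", "aggregator": "avg"}],
            },
            # Free layouts position each widget explicitly (hypothetical sizes).
            "layout": {"x": 0, "y": 0, "width": 20, "height": 10},
        }],
    }


if __name__ == "__main__":
    print(json.dumps(timeboard("Servers over time"), indent=2))
```

The design difference shows up directly in the JSON: the TimeBoard-style definition lists graphs in order, while the ScreenBoard-style one pins each tile to explicit coordinates.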

The Infrastructure tab isn't really what it suggests: it's more like a server-by-server list of what's going on in your world, with drill-down buttons for each entity. Moving quickly on, though, we have the Monitors section, which is where you define what the system actually watches. You can do this by host (so you choose one or more hosts and define when to alert, what to say and to whom), by metric (which watches a particular measured value across your estate and reacts when it goes outside the tolerance you define) or by “integration” (we're back on the AWS specifics again – it's like the metric monitoring but at a higher level than just the servers).
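A metric monitor boils down to a query, a tolerance and a message saying whom to tell. The sketch below builds one in the JSON shape assumed for Datadog's `/api/v1/monitor` endpoint (a `metric alert` with warning and critical thresholds), and adds a hypothetical local `evaluate` helper that only illustrates the warn/alert logic, not Datadog's internals. The host, thresholds and notification handle are made up for the example:

```python
# Sketch of a metric monitor in the JSON shape assumed for Datadog's
# /api/v1/monitor endpoint, plus a hypothetical helper that mimics the
# threshold logic locally.


def metric_monitor(metric: str, scope: str, warning: float, critical: float,
                   notify: str) -> dict:
    """Alert when the 5-minute average of `metric` over `scope` exceeds `critical`."""
    query = f"avg(last_5m):avg:{metric}{{{scope}}} > {critical}"
    return {
        "name": f"{metric} high on {scope}",
        "type": "metric alert",
        "query": query,
        # "What to say and to whom": the message carries the @-notification.
        "message": f"{metric} is above {critical} on {scope}. {notify}",
        "options": {"thresholds": {"warning": warning, "critical": critical}},
    }


def evaluate(value: float, warning: float, critical: float) -> str:
    """Hypothetical: classify a 5-minute average against the thresholds."""
    if value > critical:
        return "ALERT"
    if value > warning:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    m = metric_monitor("system.cpu.user", "host:web-01", 80, 90,
                       "@ops@example.com")
    print(m["query"])
    print(evaluate(85, 80, 90))
```

Scoping the query to `host:web-01` gives the per-host style of monitor; widening it to `*` gives the estate-wide metric style.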

You can also monitor process status, though you may need to add some extra configuration to the agent on the hosts whose processes you wish to monitor. Network service checks verify either HTTP or custom-defined network services (again, some agent config tweaking will usually be needed).
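At its simplest, a network service check asks "can I connect to this port?" and reports a status. The sketch below is a hypothetical stand-in for that idea using a plain TCP connection with a timeout; Datadog's own checks are configured in the agent and an HTTP check would additionally inspect the response, so treat this as illustration only:

```python
import socket


# Hypothetical stand-in for a network service check: try a TCP connection
# with a timeout and report OK or CRITICAL. Not Datadog's implementation.

def tcp_check(host: str, port: int, timeout: float = 3.0) -> str:
    """Return "OK" if a TCP connection succeeds, otherwise "CRITICAL"."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "OK"
    except OSError:  # covers refused connections, timeouts and DNS failures
        return "CRITICAL"


if __name__ == "__main__":
    print("ssh on localhost:", tcp_check("127.0.0.1", 22))
```

Running this from the monitored host itself, as the agent does, keeps the check outbound-only from Datadog's point of view.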

The Metrics section lets you define the graphs you want to see for your managed systems. As with the other sections the setup is easy to comprehend: you pick what you want to measure (disk I/O, network throughput, etc.), then the systems you want to measure it on, then how you want to display it (e.g. one graph per host, per region, etc.) and how you want to aggregate the data. Next along the top bar is the Team section, in which you define the people with whom you want to share access to the Datadog information, and finally the Integrations section lets you work with things like the keys used for secure agent connections and for authenticating to the Datadog API, should you want to use it.
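The "how to aggregate" choice is essentially whether to merge all hosts into one line (using avg, sum, min or max at each timestamp) or keep one line per host; in Datadog's query syntax that is roughly the difference between `avg:<metric>{*}` and `avg:<metric>{*} by {host}`. A minimal sketch of the merging step, with hypothetical host names and values:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical illustration of aggregating across hosts: combine several
# hosts' series into one value per timestamp using the chosen function.
AGGREGATORS = {"avg": mean, "sum": sum, "min": min, "max": max}


def aggregate(series_by_host: dict, how: str = "avg") -> dict:
    """Merge per-host {timestamp: value} series into a single series."""
    buckets = defaultdict(list)
    for points in series_by_host.values():
        for ts, value in points.items():
            buckets[ts].append(value)
    fn = AGGREGATORS[how]
    return {ts: fn(values) for ts, values in sorted(buckets.items())}


if __name__ == "__main__":
    # Hypothetical disk I/O readings at t=0s and t=15s for two hosts.
    disk_io = {
        "web-01": {0: 10.0, 15: 30.0},
        "web-02": {0: 20.0, 15: 50.0},
    }
    print(aggregate(disk_io, "avg"))  # {0: 15.0, 15: 40.0}
    print(aggregate(disk_io, "max"))  # {0: 20.0, 15: 50.0}
```

"One graph per host" corresponds to plotting `disk_io` as-is; any of the aggregators collapses it into a single estate-wide line.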

Before I fired up Datadog I was all ready to write it off as a bit of a mad idea (system monitoring in the cloud?!), but having used it I'm hooked. And y'know, that's not because it's a particularly fully featured system management package; after all, it's not. No, it's because it's just so well designed and so hugely usable. Getting started is easy. Installing the agents is easy and just works. Configuring dashboards is super-intuitive (looks like someone's been at the Advanced HTML5 manual), and the same applies to configuring monitors and metrics. The AWS integration is a nice touch, and to be fair to them they've done it thoroughly, but I think they need to include more than just Amazon if they want this feature to be taken seriously. Let's hope they do the same for (say) Azure, Google or Rackspace.

Datadog isn't about to displace full-blooded in-house monitoring applications like SolarWinds, which have a bazillion features and are able to monitor everything to an obscene and often painful degree. But for a company that doesn't want to go to the nth degree of monitoring and simply wants to do a good and reasonably priced job of monitoring its servers to a fine-grained level of detail, it's a really good choice.

Verdict

A great system monitoring tool, which syncs up with AWS.