Mastering Syslog on small networks

There's a point when running your own small business network where all the monitors start to get you down. At the outset, with just one PC acting as a server, leaving a monitor on it seems natural. But once you have three or four machines doing some part of the continuous duty roster typical in a modern small network, four redundant monitors look awful - they're always the spares, and the screens are burned deeply with the images of your most frequently repeated, least critical error.

When the time comes to tidy up that nasty pile of monitors, several home truths tend to hit all at once. The first is that many functions can now be undertaken very nicely by small headless special-purpose boxes.

Furthermore, the days of errors being all about fatal crashes and blue screens are largely behind us, and much of what happens in your LAN can only really be dealt with once you have more than just one dialog box's worth of data about it. Lastly, the Event Viewer is a bit of a desert island: not every application uses it, and not every error-reporting collector analyses what happens inside it (plus, those little boxes like routers, print spoolers and firewalls don't speak Event Viewer anyway).

This brings us neatly to the subject at hand here. You're probably already running more than four or five services (for example, a firewall, an email server, and some dedicated single-purpose black boxes like NAS storage or proxy servers), and all of them have moved on from failing every week to sitting in the midst of a stream of traffic that you need to analyse over a period of time. In the case of a set of servers monitored by an IT professional in an outsourced support relationship, this can be over a very long period of time.

This is where, in smaller networks, Syslog comes into its own. It's another one of those 30-year-old monsters that has its own internal logic, and a rich variety of impenetrable customs and standards that have to be unearthed in order to be useful. Despite an early start at the turn of the 1980s for the utility, the IETF (Internet Engineering Task Force) has only recently, in 2005, attempted to hold the chaos of different Syslog formats to some form of standard.

Never mind that lengthy gestation period: the blossoming of boxes able to spout Syslog messages onto your network raises the problem of what to do with them. A complete standard is a way off yet, but there are some utilities that will at least make it easier for you to contemplate your role as the Sorcerer's Apprentice with perfect reliability.

Options

Before we discuss those utilities and their use, there are a few other contenders for this role that should be identified, if only to reject them. One is SNMP (Simple Network Management Protocol). This too falls under the ambit of the IETF and, in fact, the group has had a good few bites at refining what's in it and why it should interest you. The problem is, it was developed to run on devices even smaller than the ones we're likely to see in a modern network today. Cisco, the guru of the router marketplace, is very big on SNMP - which you should take as a sign that it's generally used and found in much larger, centrally planned networks.

Alongside SNMP (and, in some cases, actually making use of it for inter-machine interrogation) are the giant network management suites from IBM, Computer Associates, HP and their competitors. In this field, names like Tivoli, Insight Manager and CA IMS signify whole universes of software, spread across workstations and servers, reporting every last collectable statistic and foolishly erased application - and, most importantly for our purposes, also responding to queries sent out by a central management workstation.

This is the opposite philosophy to Syslog. In a Syslog world, the central logging machine does the listening and the satellites do the talking: there's no live query process to go back to something that's fallen silent or might have been interfered with. A Syslog source just fires its one-liner reports at an IP address, which it's been assured hosts a Syslog server.
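To make that one-way conversation concrete, here is a minimal sketch in Python of what a Syslog source does. It assumes the traditional UDP transport on port 514 and a stripped-down message layout (real devices normally add a timestamp and hostname after the priority field); the function names and the example address are purely illustrative.

```python
import socket

def format_syslog(facility: int, severity: int, tag: str, text: str) -> bytes:
    """Build a stripped-down syslog datagram of the form <PRI>tag: text."""
    # The priority value packs facility and severity into a single number.
    pri = facility * 8 + severity
    return f"<{pri}>{tag}: {text}".encode("ascii", errors="replace")

def send_syslog(message: bytes, host: str, port: int = 514) -> None:
    """Fire the datagram at the collector over UDP; no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))

# Hypothetical usage (192.0.2.10 is a documentation-only placeholder address):
# send_syslog(format_syslog(16, 4, "firewall", "WAN link down"), "192.0.2.10")
```

Note that nothing in the sender checks whether anyone is listening. That's the whole philosophy in a dozen lines: if the collector is down, the message is simply lost.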

There's also one missing link in the picture here: I mentioned that the Windows Event Log is a bit of an island in a network with a lot of other platforms. In fact, it's a bit of an island even in a homogenous network, because you get to it through the Event Viewer and there's no immediate consolidation - you can read events on another server well enough, but you can't sort together all the events of the same type or from the same source. Even though I've been talking about design concepts and constructions, there's one plug-in without which the idea of getting intimate with Syslog seems somewhat pointless, and that's NTSyslog (ntsyslog.sourceforge.net): this service reads Event Viewer entries and fires them at a designated Syslog server. Snare (www.pcpro.co.uk/links/snare) is a little bit more of the same.

Now we have all our error messages from all our devices arriving in the same place and subject to the same manipulation.

The Sorcerer's Apprentice

As any Disney fan will know, there's an animated version of this fable with the music provided by Dukas, as Mickey Mouse discovers that a good idea to fix a short-term problem can turn into a very bad idea once it encounters the march of time. Syslog reporting can be a good deal like that.

For example, I've a firewall that will report in Syslog format, and everything it considers an event - whether it's an intrusion attempt or a service without a redirector target - gets thrown into the Syslog file, hosted on a nearby Windows server. Believe it or not, I have that Syslog server set to close the old file and open a new one every 100MB (yes, megabytes). It typically opens three or four such files a week. Leave it for a month and you have roughly 1.5GB of Syslog messages - and that's just one firewall.

This is where the apprentice makes his mistake. When persistent problems strike, diagnosis of what causes the niggling failure can make you think that you should monitor every last packet and twitch that passes through the network or hits the device in question. For example, let's say your router or your print server crashes every morning at 5.30am. You don't want to get up to watch it, you're fairly sure there's nobody in the building from the CCTV tapes, and you want to start watching the device from about 3am. I've heard of a few hard-core techies watching this kind of situation with a copy of Ethereal (www.ethereal.com), but this isn't the solution for everyone. Ethereal produces volumes of data that make Syslog look anaemically under-endowed and depends on long years of experience in assembling packet filters with which to cut down the blizzard of data. You don't want to see the parity bits in the head and tail of the IP encapsulation of the packet that hits your device 5,000 times in two seconds: what you want is your device to say "buffer overflow due to time zone error" (to quote one example that still scars me) just at that exact right moment.

It's far simpler to only monitor messages arising from a device with some brains included than it is to wash through all the traffic that incorporates that device. The process is made even more complex, in these times of smart central network switches, by the need to arrange your Ethereal monitor machine on a simple LAN hub with the thing it's monitoring. Even this relatively humble, almost plain-electrical requirement can have pitfalls - I've seen one network reduced to a crawl for months because someone elected to monitor with Ethereal and then forgot to take the server off the 10Mb hub they'd used to permit Ethereal to eavesdrop.

The lesson from my firewall, however, is that, like the Sorcerer's Apprentice, you can find that turning on all the reporting options simply drowns you in data. For even a small business network, Syslog generators can trivially compose a database that is in reality your largest data store, both in terms of the number of transactions and the horsepower required to extract a meaningful report.

Follow the rules

But we're here to get a handle on making these things actually work. It's the eternal trickle of repeating messages that has driven almost all the commonly available Syslog servers to apply rules to the messages they collect. You can (as is common in analysing Syslog output from web servers, for example) elect to make an enormous data-history store of all the messages, or you can throw away all the messages except the ones you have your doubts about. These rules are set up by way of rule dialogs, as the examples from the excellent Kiwi Syslog Daemon (www.kiwisyslog.com - which is for Windows, even though "Daemon" is a name usually reserved for the Linux world) show.

These utilities will offer you a variety of message conversion and storage options. While these formats are important, don't be distracted by the rich variety of genealogies to each format and structure. By far the most useful part of the message is inside the text narrative and, just to confuse the issue, most devices reporting via Syslog tend to put comma separations inside the text string. This double-counts the use of commas in some of the outer formats, so simpler attempts at hand-driven interpretation or cutting into columns in Excel tend to fall at the first hurdle. This has fostered a huge amount of work on the tedious process of sorting, searching and reporting on Syslog events.
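The comma problem is easier to see in a few lines of code. The sketch below, in Python, assumes a hypothetical outer format of four comma-separated fields (date, time, priority, host) followed by the free-text message; the sample line is invented for illustration. The trick is simply to stop splitting once the outer fields have been peeled off, so the commas inside the narrative are left alone.

```python
def split_record(line: str, outer_fields: int = 4) -> list[str]:
    """Split a comma-delimited log line without cutting into the message text."""
    # Limit the number of splits so everything after the outer fields
    # stays in one piece, embedded commas included.
    return line.rstrip("\n").split(",", outer_fields)

# Invented sample line: four outer fields, then a message full of commas.
record = split_record(
    "2006-01-15,05:30:02,local0.warning,192.0.2.1,"
    "drop,src=10.0.0.5,dst=10.0.0.9,buffer overflow due to time zone error"
)
log_date, log_time, priority, host, message = record
```

A naive split on every comma would turn this one event into eight columns, which is exactly what happens when such a file is opened directly in a spreadsheet.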

Here, you'll find a rich mix of software made up of innumerable bites at the corporate record-keeping business. Rather like the Data Protection Act in the UK, the Sarbanes-Oxley Act in the US has a prosaic definition (see the summary at www.pcpro.co.uk/links/sarbanes) with some entirely unexpected consequences. It seems to cause a lot of US businesses to think that if they run a firewall or a spam filter, they actually have to keep every last byte of data, and Syslog is one of the chosen formats to feature heavily in this ambiguous and expensively onerous procedural requirement.

This makes it quite hard to find a Syslog message-analysis product that hasn't been swallowed up by the Sarbanes-Oxley explosion. Fortunately, some more widespread applications that feature Syslog crunching and delivery of drill-down capability are still free of the hype zone. Of particular note are Sawmill (www.sawmill.net) and ReportGen (www.reportgen.com/index.php) - in both cases they collect up Syslog formats and slice 'n' dice in response to questions you may pose.

If you're operating within the walls of an organisation whose view on Sarbanes-Oxley is from the "keep everything" school, tools like these will be vital. If, on the other hand, a degree of sanity has prevailed, you should be adopting a relatively simple strategy:

Keep a fairly shallow event pool. That means, don't run more than about seven days of records, from all your devices, until you're sure there's something you need to watch.

Master the art of filtering as close to source as possible. This means, look at what your device will report and chop off as many irrelevant classes of reporting as you possibly can, as early in the chain of reporters as you can. Picking up gigabytes of informational-level trash and then spending hours analysing for the occasional fatal error is a slow way to find you've asked the wrong question.

Don't worry about thinking up triggers for situations you haven't encountered yet. That's what the shallow "all event" log is for, so you can go and look for things you know are actually happening, as distinct from that web-forum obsessive style of security wonk, who loads his systems up with paranoid port blocks and intrusion-detection systems based on the current flame wars between the noisiest pundits. Trawling for obscure stuff, which in all likelihood is being blocked by your ISP anyway, isn't the point of Syslog.

Think up tests. This isn't quite as easy for logging conditions from firewalls as it is when testing anti-virus software with the EICAR signature file. But, to borrow one friend's loud and immediate example of why live logging is so important, you should expect to see a log entry come in when you unplug your firewall from your DSL router or leased line termination - and another when you plug it back in.

That last test sounds so simplistic as to be almost laughable. In small business networks, it should be blindingly obvious that a cable's been unplugged and, with the way small companies rely heavily on comparatively small IT investments, it should come to light immediately. However, a lot of those smaller companies now have co-located servers in data centres, and the forest of racks in a data centre can be a real wild west experience. It's amazing how easily your paid-by-the-gigabyte personal Ethernet link in the hosting space can be hijacked by someone else's hardware engineer, for example.
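The first two rules of that strategy, the shallow pool and the early filter, can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular Syslog server does it: it assumes each raw message begins with the standard priority field in angle brackets (severity is the priority value modulo 8, running from 0 for emergency to 7 for debug), and that the pool is a simple list of (received time, raw message) pairs. Filtering on the device itself is still preferable where the device allows it; this is the fallback for boxes that can't be told to keep quiet.

```python
import re
from datetime import datetime, timedelta

PRI_PATTERN = re.compile(r"^<(\d{1,3})>")
WARNING = 4  # severities run from 0 (emergency) to 7 (debug)

def severity_of(raw: str):
    """Return the severity encoded in the leading <PRI>, or None if absent."""
    match = PRI_PATTERN.match(raw)
    return int(match.group(1)) % 8 if match else None

def worth_keeping(raw: str, threshold: int = WARNING) -> bool:
    """Keep warnings and worse; keep unparseable lines rather than lose them."""
    severity = severity_of(raw)
    return severity is None or severity <= threshold

def prune(pool, now, max_age=timedelta(days=7)):
    """Drop (received_at, raw) entries older than max_age from the pool."""
    return [(ts, raw) for ts, raw in pool if now - ts <= max_age]
```

The seven-day default mirrors the shallow pool suggested above; the warning threshold is an arbitrary starting point that you would loosen for any device you're actively watching.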

Syslog is quite possibly the latest latecomer to the party of 1970s Unix standards rehabilitated into the PC computing mainstream. The gradual ratification process still under way, and the solid selection of usable tools for smaller businesses outlined here, should help you find a way to live without that wobbly pile of monitors as your first line of defence.