IT Pro Panel: The worst IT horror stories
Our Panellists share some of their best tales from life on the front-lines of technology
Go to virtually any tech conference in the world, and once the convention hall doors shut for the evening, chances are good you’ll find groups of battle-hardened IT professionals swapping war stories in the hotel bar. Whether it’s botched database migrations, users with 13 different browser toolbars or road crews accidentally severing network cables, most techies have a sizeable collection of horror stories.
When you’re as experienced as the members of the IT Pro Panel, that collection becomes larger than most, and since we can’t swap these stories in person at the moment, for this month’s panel discussion, we got some of our Panellists together to virtually share some of their tales of woe.
Editor’s note: For this month’s feature, the names of our Panellists have been changed in the interests of privacy.
Security is a perennial problem for CIOs and CISOs alike, but most of the time it’s not crafty hackers that keep them up at night, or even malicious insiders - it’s well-meaning but sadly clueless staff members. Andy recounts an outbreak of an email virus within his organisation, which took two days to clean up and secure after they traced it back to the original source.
“We turned the email back on and were monitoring everything. All was good, until the same user who had caused the initial infection opened a direct copy of the original email and clicked on the exact same link - my language was rather choice!”
“It always amuses me when you do a phishing exercise,” adds Greg, “and an employee interacts with the email, but the link doesn't seem to work, so they forward it to their phone; when the link still doesn't seem to work, they forward it to Gmail... You can't fix stupid, I guess!”
For Sandra, a career in the trenches of the helpdesk gave her a lifetime’s worth of examples of user error, including one user who tipped coffee all over a floppy disk, then put it in the drive to make sure it still worked. “I have a real nostalgia for those days,” she says; “It was good fun, if occasionally frustrating.”
“I did see a guy lift the mouse from the table and literally point it at the screen when we told him to point the mouse at the button and click it,” says Alan. “He couldn't understand why the pointer was still not moving when he was clearly aiming it at the button!”
“A new project manager was sent into a steering group to give an update on various projects. One wasn’t going to plan due to the ‘flux capacitor’ which needed replaced and we were struggling to resource it,” Alex says. “She only realised when everyone who was of a certain generation started creasing up. Poor girl. We all became great friends though and it was good team building!”
Even technically-skilled staff aren’t immune to mistakes, though, and many of our panellists recall tales of engineers and operators bungling database updates and the like. Hank, for example, remembers an engineer who managed to change all of the user passwords in a database to the same password, thanks to a forgotten ‘WHERE’ clause.
Jordan shares a similar story, where a colleague called him in a panic to tell him that he’d accidentally executed a mass update on all users. “His excuse was that he sneezed, accidentally highlighted the ‘UPDATE’ statement (minus the ‘WHERE’ clause) and hit execute,” he says. “Must have been one hell of a sneeze!”
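Hank’s and Jordan’s stories hinge on the same footgun: a SQL ‘UPDATE’ with no ‘WHERE’ clause silently applies to every row in the table. A minimal sketch of the effect, using Python’s built-in sqlite3 module (the users table and values here are purely illustrative, not from either story):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, password TEXT)")
conn.executemany("INSERT INTO users (password) VALUES (?)",
                 [("alpha",), ("bravo",), ("charlie",)])

# Intended statement: reset one user's password.
conn.execute("UPDATE users SET password = 'hunter2' WHERE id = 2")

# The mistake from the anecdotes: the same statement with the
# WHERE clause dropped - it updates every row in the table.
conn.execute("UPDATE users SET password = 'hunter2'")

distinct = conn.execute(
    "SELECT COUNT(DISTINCT password) FROM users").fetchone()[0]
print(distinct)  # → 1: every user now shares the same password
```

Running statements like this inside a transaction, so a missing clause can be rolled back before committing, is one of the cheap safeguards change management is supposed to enforce.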
“I've worked in a few organisations that have been burned by the 'let’s name everything in test the same as production' scenario,” Greg says. “That's always a fun one to watch unravel as long as you aren't in the firing line! It always amazes me how some DBAs think they are above change management when it comes to 'little tweaks', given a simple messed-up SQL statement can cause a world of hurt!”
Hank also witnessed a supplier’s engineer accidentally drop an entire production database - an action which was then automatically replicated across all of the linked replica databases, resulting in many days spent trying to recover the data. “One of my many questions for their CTO was how that was even possible!”
For reasons like this, working with SQL always terrified Sandra, she says, and it was nerve-wracking every time she had to use it to make manual changes to a finance system. She never made a mistake with it herself, but one business analyst she worked with did, creating an extract of the database but naming it exactly the same as the original file and saving it higher up the search path - a fatal error when working with systems like IBM’s AS/400.
“Fortunately, once I realised what he had done, it was a quick fix to just nuke it,” she says. “I very nearly nuked him too! I also explained the utter idiocy of calling a file by the same name as a critical application master file!”
Speaking of nuking things, it can be frighteningly easy to cause all sorts of problems with the wrong button-press, as Greg found out when a third-party contractor was performing an infrastructure upgrade back in the early 2000s.
“Mister contractor popped out for a toilet break,” he says, “and as he left the data hall, he punched the door release button, as we all do.... But what sort of idiot puts a SCRAM switch” - a manual kill-switch for shutting equipment down in case of emergency - “next to a door release button, without a protective cover?! Nighty night data hall! Scram switches are not known for their elegance in shutting stuff down, and this one left a bit of a mess to clean up!”
There is one adversary more feared than any bungling user, though; a menace that stalks the halls of data centres, leaving chaos and confusion in their wake, whom all IT staff tell tales of: the dreaded cleaner. Rare is the IT admin who can’t remember a time when a piece of crucial equipment unexpectedly went down, only to discover after much frantic troubleshooting that a cheerily oblivious cleaner had unplugged it in order to vacuum.
Greg was called in to deal with one such incident, when an entire radio production studio was taken offline, while Alan grappled with a similar issue affecting the editorial staff of a weekly magazine he was working with.
“For a couple of weeks, between 6 and 8PM with the deadline looming, the pictures department floor would go down for about 20 min, bang in the middle of the deadline,” he explains. “The cleaner was unplugging the switches for that floor, as it was the easiest socket she could reach. Of course, the pictures department did not put two and two together despite the lines going down at the exact moment the cleaner was hoovering.”
In fact, power management was something of a theme among our panellists’ stories. As it turns out, a shocking number of them had at various points discovered data centre equipment being run off consumer-grade extension cables. Worse still, Hank discovered on one visit that the extension cable in question had snapped and been repaired with tape, noting “my case for moving that company's infrastructure to the cloud suddenly got a lot stronger!”
Alan, meanwhile, discovered in one client’s machine room an entire rack powered by a single 13-amp socket with not one but three daisy-chained extensions running out of it. There were four 16-amp sockets in the room installed for exactly this purpose, but Alan reveals that they hadn’t even been connected up. When they eventually were activated, they were all put on the same line, negating any redundancy measures.
“Eventually I got it sorted, but as I depended on their in-house electrician team, it took several iterations to get it right.”
“I remember working in an organisation once that suffered a power cut, knocking out teams and teams of data entry clerks,” Greg says. “The whole building was supposedly on UPS and we had two mothers of red-diesel generators sitting behind the batteries.”
“However, the dreaded world of the four-gang adaptors had placed most of those workstations on the dirty power circuits over time, so when the power kicked out they all died. When the power came back on, they all surged and threw the breakers, kicking the power out again ... repeat until false. Moral of the story - IT and facilities management really do need to work closely together!”
Another area where collaboration between facilities and IT is crucial is climate control. As any junior techie knows, servers are rather fussy about temperature, and inadequate cooling can lead to severe problems, to say the least. That’s why Sandra wasn’t pleased when the facilities manager at a previous job decided to move one of the major cooler units from the end of the data centre where her smaller data room was, to the other end of the facility.
“It was one of the hottest days of the year! We got a panicked call from finance to my team as the system crashed, having fried (pretty much literally) two disks,” she says. “It was a hot summer, so the only replacement unit we could get was a tiny condenser, so my kids got taken into work every three to four hours over the weekend, so we could empty it.”
Andy also had to make do with a jury-rigged cooling system built from a 50m extension lead and duct-taped vent tubes to create what he described as a “Dalek air conditioner”, while one particularly hot summer in 2003 led Greg to quickly learn how to use the open-source Nagios monitoring platform so he could keep an eye on the finance servers’ temperature without camping out in the data centre.
“In a previous role we had a small server room that hosted our internal QA environments,” Jordan says. “One morning, during the release of a test version of our software and with a couple of devs gathered around my desk to observe, I was booted from the RDP sessions I had open for no apparent reason. With colleagues pleading their innocence with regards to forcibly booting me from my connections, it was naturally time to check in the server room to investigate further.”
“What we found was an aircon unit spraying water directly onto the power outlets! The servers weren’t too happy about this but thankfully, with an aircon engineer on call within the hour and a quick trip to a local shop to buy a small electric fan to dry the carpet, the servers came back to life and we were able to continue. We learnt that day that, regardless of how confident you are about hitting deadlines and achieving sprint goals, sometimes you have to resign yourself to it being out of your control!”
As Jordan notes, sometimes there really is nothing you can do - as Andy found out more than twenty years ago, when the twin mirrored servers his business ran on failed simultaneously. “UPS and surge protectors were unknown and when they went bang, they went in unison!” he says. Thankfully, they had a fallback server, and a backup tape they could use to set it up - a tape which was taken home by the finance director every night.
“That morning, the finance director was late coming in, having had to move his kids’ bike from in front of his car. He moved the bike and all was well.... until he drove over his briefcase - and the back-up tape!”
Sometimes, it really is your fault
Of course, as much as IT professionals curse the thoughtless actions of others, sometimes when things go wrong, there isn’t anyone to blame but yourself. Whether it’s an unpredictable freak accident or a genuine mistake, every tech worker has at some point made an error that they’ve had to deal with.
“One of the first tasks I had as a new security analyst was to image the laptops of some executives (including the CEO) to investigate a data leak,” says Frank. “I was happily imaging away, write blocker and power supplies all connected fine. Next minute, I managed to completely brick the CEO’s laptop hard drive.”
“I then find out that, being the CEO, he had ignored all the backup and document management guidance. So I had to sheepishly explain to him that we may have lost all his data.... not my best day!”
Few problems in IT are unique, and migration projects are among the most frequently troublesome. When Alex headed back to her hotel after wrapping up a user migration, she thought they had managed to pull it off smoothly - until she got the call saying that the call centre was in meltdown as customers called to complain that they couldn’t log into the service.
“Turned out the security credentials hadn’t migrated quite as well as we’d thought,” she said. “It also turned out that telling customers ‘the website will be unavailable this weekend but come back 9AM Monday to see the wonderful new site’ wasn’t the best idea, given the volume of traffic that clearly drove at said 9AM on the Monday morning.”
Problems like this are extremely common, and many IT departments will have experienced similar issues at one point or another. Occasionally, however, there are some stories that are utterly unparalleled - as Greg demonstrates.
“I was working for an international IT services org at the time, and we were installing a mid-range mini into a financial services client over the weekend. They had a data centre on the first floor of an office block; unusual, I know, but the client is - of course - always right, so no questions were asked.
“These things are heavy. Real heavy. The creaking as the pallets were wheeled across the floor should have been the first clue. To be fair, the floor held up long enough for the installation to be completed, but in true Only Fools and Horses style, as the chaps were patting themselves on the back for a job well-done…
“... to be honest,” Greg says, “it looked better in the GROUND FLOOR reception area anyway!”
If you're a senior IT decision-maker and you'd like to apply to be part of the IT Pro Panel, please email email@example.com.