Simon Bisson & Mary Branscombe's Blog

It’s a good thing spammers aren’t smarter

By Simon Bisson & Mary Branscombe in Editorial

Posted in Identity, Security, Google, Internet on May 10, 2008 at 9:10 pm


I find it easy to spot most of the phishing messages that hit my inbox, because there’s nearly always an egregious grammatical mistake in there somewhere. Real messages from banks may be full of logical errors (like a regular savings account advertised at a headline rate of 7% that never mentions it actually averages out nearer 4%, because not all of the money earns the high rate for the whole year), but the spelling is spot on.

Spammers are also in such a hurry to put up Web pages, whether to earn ad money or to host drive-by downloads that recruit more zombie machines into the botnets that send most of the spam, that they often make stupid mistakes. If you’re checking 100 messages a day in your junk mail folder for anything real that got there by mistake, it may be little comfort to remember that spammers are only human. But Google finds it useful.

According to Matt Cutts of Google, speaking at Web 2.0, Web spammers often use templates and tools to build their pages. Fairly often they follow the commented-out instruction to ‘type your hidden text in here’ but never delete the instruction itself. The tools they use to fill in forms are simplistic too. The captcha you have to complete to leave a comment here is enough to defeat most of them, but so is a box labelled ‘email address’ with an instruction not to fill it in: when a bot puts in whatever email address it’s abusing, you know you can just delete the comment. A simple maths question, or an instruction to type in a specific word, is beyond these bots, at least until Jeff Hawkins perfects Hierarchical Temporal Memory.
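Here is a minimal sketch of the two comment-form defences just described: a honeypot box that humans are told to leave empty, and a simple sum. It assumes a hypothetical form with a visible `comment` field, a `math_answer` field and a honeypot field named `email`. The field names and the `make_challenge`/`screen_comment` helpers are illustrative and aren’t taken from any particular blog engine.

```python
import random
from dataclasses import dataclass


@dataclass
class Challenge:
    """A simple arithmetic question and its expected answer."""
    question: str
    answer: int


def make_challenge(rng: random.Random) -> Challenge:
    # Small single-digit sums are trivial for people but defeat naive bots.
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return Challenge(question=f"What is {a} plus {b}?", answer=a + b)


def screen_comment(form: dict, challenge: Challenge) -> str:
    """Return 'accept' or 'reject' for a submitted comment form.

    Hypothetical fields: 'email' is a honeypot the page tells humans
    not to fill in; 'math_answer' holds the reply to the challenge.
    """
    # Honeypot: a form-filling bot stuffs an address into every box.
    if form.get("email", "").strip():
        return "reject"
    # Arithmetic challenge: non-numeric or wrong answers are rejected.
    try:
        given = int(form.get("math_answer", ""))
    except ValueError:
        return "reject"
    return "accept" if given == challenge.answer else "reject"


challenge = make_challenge(random.Random())
print(challenge.question)
human = {"comment": "Nice post", "email": "", "math_answer": str(challenge.answer)}
bot = {"comment": "Cheap watches", "email": "bot@example.com", "math_answer": "7"}
print(screen_comment(human, challenge))  # accept
print(screen_comment(bot, challenge))    # reject
```

Both checks are cheap for a person and cost the site nothing to run, which is the point: they only have to be a little different from what the default bot expects.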

If you have a site, you need to think of things that raise the blood pressure of the spammers without doing the same to your users. It’s like being chased by a dumb but dangerous pack animal, says Cutts: you only have to run faster than the slowest person you’re willing to sacrifice. If your system is a little different from the default installation of whatever software you use, the default attacks are less likely to work and the spammers may move on to slower prey.

Apart from the obvious advice to patch, patch and patch again, Cutts didn’t say much more, because every time you tell spammers how you’re spotting them, they get a chance to stop doing it. A lot of what Google knows about spam comes from its analysis of real Web pages, which lets it work out which things go together. If you know that timepiece and chronometer are synonyms for watch, those strangely-worded Rolex spams are easier to stop. You can see this classification in Google Sets, and it’s used in Google Spreadsheets, where the equivalent of Excel AutoFill goes beyond days of the week and months of the year without you having to add the lists by hand. Start with red, yellow and blue and Google Sets will add other colours; start with lion, tiger, bear and you get other animals.
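To show why knowing synonyms helps, here is a small sketch that maps words onto canonical terms before matching them against a blocklist. The `SYNONYMS` table and `BLOCKED_TERMS` list are hypothetical and hand-written. Google derives such groupings from analysing real pages, and this sketch doesn’t attempt that part.

```python
import re

# Hypothetical, hand-written synonym table mapping variants to one term.
SYNONYMS = {
    "watches": "watch",
    "timepiece": "watch",
    "timepieces": "watch",
    "chronometer": "watch",
    "chronometers": "watch",
}

# Hypothetical blocklist: each entry is a set of canonical terms that
# must all appear for a message to be flagged.
BLOCKED_TERMS = [{"rolex", "watch"}, {"replica", "watch"}]


def canonical_tokens(text: str) -> list[str]:
    """Lower-case the text, split it into words, and map synonyms."""
    words = re.findall(r"[a-z]+", text.lower())
    return [SYNONYMS.get(word, word) for word in words]


def looks_like_watch_spam(text: str) -> bool:
    # Flag the message if every term of any blocklist entry is present.
    tokens = set(canonical_tokens(text))
    return any(terms <= tokens for terms in BLOCKED_TERMS)


print(looks_like_watch_spam("Luxury Rolex chronometers at 90% off"))  # True
print(looks_like_watch_spam("My grandfather's chronometer still keeps time"))  # False
```

Without the synonym step, a filter looking for ‘rolex’ plus ‘watch’ would miss the first message entirely, because it never uses the word watch.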

But you might also get wood, tin and cotton. That’s because Google Sets can’t always tell the difference between lists of animal names and lists of animal toys on the Web sites it looks at. It will learn, and like the spammers, it will learn more quickly if someone tells it what it’s got wrong. But at that point we’re in a race to see whether the anti-spam tools can learn faster than the spammers…
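The failure is easy to reproduce with a toy co-occurrence model. This is not how Google Sets works internally (the post doesn’t say). It’s a simplified sketch that ranks candidates by how many lists they share with the seed words, run over hypothetical `PAGE_LISTS` in which one toy-shop page mixes materials with animal names. The `rejected` set stands in for someone telling the system what it got wrong.

```python
from collections import Counter

# Hypothetical lists scraped from Web pages; the last one comes from a
# toy-shop page that lists materials alongside animal names.
PAGE_LISTS = [
    ["lion", "tiger", "bear", "wolf", "elephant"],
    ["lion", "tiger", "leopard", "wolf"],
    ["bear", "wolf", "fox", "elephant"],
    ["lion", "tiger", "bear", "wood", "tin", "cotton"],
]


def expand_set(seeds, lists, rejected=frozenset(), min_seed_overlap=2):
    """Rank candidate items by how many qualifying lists they share with the seeds.

    A list counts as evidence only if it contains at least
    `min_seed_overlap` seeds and none of the `rejected` items.
    """
    seed_set = set(seeds)
    scores = Counter()
    for items in lists:
        # Feedback: ignore lists containing an item a user flagged as wrong.
        if rejected.intersection(items):
            continue
        if len(seed_set.intersection(items)) >= min_seed_overlap:
            for item in items:
                if item not in seed_set:
                    scores[item] += 1
    # Highest score first; ties broken alphabetically so output is stable.
    return sorted(scores, key=lambda item: (-scores[item], item))


seeds = ["lion", "tiger", "bear"]
print(expand_set(seeds, PAGE_LISTS))
# → ['wolf', 'cotton', 'elephant', 'leopard', 'tin', 'wood']
print(expand_set(seeds, PAGE_LISTS, rejected={"wood"}))
# → ['wolf', 'elephant', 'leopard']
```

Flagging `wood` discards the toy-shop list as evidence, so `tin` and `cotton` disappear with it. A production system would more likely down-weight an unreliable source than drop it outright.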
