Simon Bisson & Mary Branscombe's Blog

It

By Simon Bisson & Mary Branscombe in Editorial

Posted in Identity, Security, Google, Internet on May 10, 2008 at 9:10 pm


I find it easy to spot most of the phishing messages that hit my inbox, because there’s nearly always an egregious grammatical mistake in there somewhere. Real messages from banks may be full of logical errors (like a regular savings account with a headline rate of 7% that never tells you that actually it averages out nearer 4% because not all of the money gets to earn the high rate for the whole year), but the spelling is spot on.

And spammers are in such a hurry to put up the Web pages they want to earn ad money on, or use for drive-by downloads to increase the size of the botnet they use to send most of the spam from zombie machines, that they often make stupid mistakes. If you’re checking 100 messages a day in your junk mail filter for anything real that got in there by mistake, I’m not sure if it’s any comfort to remember that spammers are only human. But Google finds it useful.

According to Matt Cutts of Google at Web 2.0, Web spammers often use templates and tools to build their pages. And fairly often they follow the commented-out instruction to ‘type your hidden text in here’ - but never delete that instruction. The tools they use to fill in forms are simplistic too; the captcha you have to complete to leave a comment here is enough to defeat most of them - but so is a box labelled email address with the instruction not to fill it in. When the bot adds whatever email address it’s abusing, you know you can just delete the comment. Simple maths or the instruction to type in a specific word are beyond bots - at least until Jeff Hawkins perfects Hierarchical Temporal Memory.
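The 'don't fill this in' box and the simple sum are easy to picture in code. What follows is only a rough sketch in Python, not how this site or any particular comment system does it; the field names `email_address` and `challenge_answer`, and both helper functions, are made up for illustration.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical field name: the form tells humans to leave this box empty.
HONEYPOT_FIELD = "email_address"


@dataclass
class Challenge:
    question: str
    answer: str


def make_maths_challenge(a: int, b: int) -> Challenge:
    """Build a simple sum for the commenter to solve."""
    return Challenge(question=f"What is {a} plus {b}?", answer=str(a + b))


def looks_like_bot(form: Mapping[str, str], challenge: Challenge) -> bool:
    """Return True if the submitted comment form should be discarded."""
    # A bot that fills in every field it finds trips the honeypot.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True
    # A missing or wrong answer to the sum also counts against the sender.
    return form.get("challenge_answer", "").strip() != challenge.answer
```

A comment that arrives with the honeypot box filled in is rejected whatever else it contains, so real users never see an extra hurdle beyond the sum.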

If you have a site, you need to think of things that raise the blood pressure of the spammers without doing the same to your users. It’s like being chased by any dumb but dangerous pack animal, says Cutts; you only have to run faster than the slowest person you’re willing to sacrifice. If your system is a little different from the default installation of whatever you use, the default attacks are less likely to work and the spammers may move on to slower prey.

Apart from the obvious advice to patch, patch and patch again, Cutts didn’t say much more - because every time you tell spammers how you’re spotting them, they get a chance to stop doing that. A lot of what Google knows about spam comes from the analysis it does of real Web pages, which lets it work out what things go together. If you know that timepiece and chronometer are synonyms for watch, those strangely-worded Rolex spams are easier to stop. You can see this classification in Google Sets and it’s used in Google Spreadsheets, where the equivalent of Excel AutoFill does more than days of the week and months of the year, without you having to add the lists by hand; start with red, yellow and blue and Google Sets will add other colours. Start with lion, tiger, bear and you get other animals.
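To see why knowing the synonyms helps, here is a minimal Python sketch of a filter that maps words onto a common term before checking them. It is an illustration only, not Google's method: the synonym table and the blocked word pair are hand-written assumptions, where a real system would learn them from analysing pages.

```python
import re
from typing import List

# Hypothetical synonym table, written by hand for this example.
SYNONYMS = {"timepiece": "watch", "chronometer": "watch"}

# Hypothetical rule: these two words together mark a message as suspicious.
BLOCKED_PAIRS = {("rolex", "watch")}


def normalise(text: str) -> List[str]:
    """Lower-case the text, split it into words and replace known synonyms."""
    words = re.findall(r"[a-z]+", text.lower())
    return [SYNONYMS.get(word, word) for word in words]


def is_suspicious(text: str) -> bool:
    """Return True if any blocked pair appears after synonym replacement."""
    words = set(normalise(text))
    return any(a in words and b in words for a, b in BLOCKED_PAIRS)
```

With the table in place, a message offering a 'Rolex chronometer' is caught by the same rule as one offering a 'Rolex watch'.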

But you might also get wood, tin and cotton. That’s because Google Sets can’t always tell the difference between the list of animal names and the list of animal toys on the Web sites it looks at. It will learn; like spammers, it will learn more quickly if someone tells it what it’s got wrong. But at this point, we get into a race over whether the anti-spam tools can learn faster than the spammers.
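One way to picture how a list of animals can pick up wood, tin and cotton is a toy set-expansion routine that scores candidates by how often they share a list with the starting words. This Python sketch is an assumption about the general idea, not a description of how Google Sets works, and the example lists are invented.

```python
from collections import Counter
from typing import Iterable, List, Sequence

# Invented data standing in for lists found on Web pages.
EXAMPLE_LISTS = [
    ["lion", "tiger", "bear", "wolf"],
    ["lion", "tiger", "zebra", "wolf"],
    # A made-up toy catalogue that mixes animal names with materials.
    ["lion", "tiger", "bear", "wood", "tin", "cotton"],
]


def expand_set(seeds: Iterable[str], lists: Sequence[Sequence[str]], top_n: int = 3) -> List[str]:
    """Suggest items that appear in the same lists as the seed words."""
    seed_set = set(seeds)
    scores: Counter = Counter()
    for items in lists:
        members = set(items)
        overlap = len(seed_set & members)
        if overlap == 0:
            continue  # this list shares nothing with the seeds
        # Every non-seed item earns one point per seed it shares a list with.
        for item in members - seed_set:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically so results are stable.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


suggestions = expand_set(["lion", "tiger", "bear"], EXAMPLE_LISTS, top_n=4)
```

In this invented data the toy catalogue contains all three seeds, so its materials outscore zebra, which appears alongside only two of them. Telling the routine which lists are wrong is what would let it drop them.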

