Simon Bisson & Mary Branscombe's Blog

It

By Simon Bisson & Mary Branscombe in Editorial

Posted in Identity, Security, Google, Internet on May 10, 2008 at 9:10 pm


I find it easy to spot most of the phishing messages that hit my inbox, because there’s nearly always an egregious grammatical mistake in there somewhere. Real messages from banks may be full of logical errors (like a regular savings account with a headline rate of 7% that never tells you that actually it averages out nearer 4% because not all of the money gets to earn the high rate for the whole year), but the spelling is spot on.

And spammers are in such a hurry to put up the Web pages they want to earn ad money on, or use for drive-by downloads to increase the size of the botnet they use to send most of the spam from zombie machines, that they often make stupid mistakes. If you’re checking 100 messages a day in your junk mail filter for anything real that got in there by mistake, I’m not sure if it’s any comfort to remember that spammers are only human. But Google finds it useful.

According to Matt Cutts of Google at Web 2.0, Web spammers often use templates and tools to build their pages. And fairly often they follow the commented-out instruction to ‘type your hidden text in here’ - but never delete that instruction. The tools they use to fill in forms are simplistic too; the captcha you have to complete to leave a comment here is enough to defeat most of them - but so is a box labelled email address with the instruction not to fill it in. When the bot adds whatever email address it’s abusing, you know you can just delete the comment. Simple maths or the instruction to type in a specific word are beyond bots - at least until Jeff Hawkins perfects Hierarchical Temporal Memory.
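The do-not-fill-in box and the simple sum are easy to check on the server. Here is a minimal sketch in Python, assuming a comment form with a hidden-by-instruction email box and a maths question; the field names and the function are made up for illustration and aren't taken from any particular blog platform.

```python
from dataclasses import dataclass


@dataclass
class CommentForm:
    """Fields submitted with a comment. All names here are hypothetical."""
    name: str
    body: str
    email_do_not_fill: str = ""  # honeypot: humans are told to leave it blank
    maths_answer: str = ""       # reply to a simple sum shown on the page


def looks_like_bot(form: CommentForm, expected_sum: int) -> bool:
    """Return True if the submission trips either trap."""
    # A bot that fills in every input will put an address in the honeypot.
    if form.email_do_not_fill.strip():
        return True
    # A bot that ignores the question will not answer the sum correctly.
    try:
        return int(form.maths_answer.strip()) != expected_sum
    except ValueError:
        return True
```

A comment that fails either check can be dropped or held for moderation; a real person who leaves the box empty and adds up correctly gets through without ever seeing a captcha.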

If you have a site, you need to think of things that raise the blood pressure of the spammers without doing the same to your users. It’s like being chased by any dumb but dangerous pack animal, says Cutts; you only have to run faster than the slowest person you’re willing to sacrifice. If your system is a little different from the default installation of whatever you use, the default attacks are less likely to work and the spammers may move on to slower prey.

Apart from the obvious advice to patch, patch and patch again, Cutts didn’t say much more - because every time you tell spammers how you’re spotting them, they get a chance to stop doing that. A lot of what Google knows about spam comes from the analysis it does of real Web pages, which lets it work out what things go together. If you know that timepiece and chronometer are synonyms for watch, those strangely-worded Rolex spams are easier to stop. You can see this classification in Google Sets, and it’s used in Google Spreadsheets, where the equivalent of Excel AutoFill does more than days of the week and months of the year, without you having to add the lists by hand; start with red, yellow and blue and Google Sets will add other colours. Start with lion, tiger, bear and you get other animals.
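To see why knowing the synonyms helps, here is a rough sketch in Python of a filter that maps words to one canonical form before checking for a blocked phrase. It isn't how Google does it (Cutts didn't say); the synonym table and the blocklist are hypothetical and hard-coded, where a real system would learn them from the pages it analyses.

```python
import re

# Hypothetical synonym table; a real system would learn it from page analysis.
SYNONYMS = {"timepiece": "watch", "chronometer": "watch"}
# Hypothetical blocklist of phrases, written in the canonical vocabulary.
BLOCKED_PHRASES = {("rolex", "watch")}


def canonical_tokens(text: str) -> list[str]:
    """Lower-case the text, split it into words, and map synonyms to one form."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [SYNONYMS.get(word, word) for word in words]


def matches_blocked_phrase(text: str) -> bool:
    """True if any blocked phrase appears as consecutive canonical tokens."""
    tokens = canonical_tokens(text)
    for phrase in BLOCKED_PHRASES:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                return True
    return False
```

With the table in place, a message offering a "Rolex chronometer" trips the same rule as one offering a "Rolex watch", so the spammer gains nothing by reaching for the thesaurus.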

But you might also get wood, tin and cotton. That’s because Google Sets can’t always tell the difference between the list of animal names and the list of animal toys on the Web sites it looks at. It will learn; like spammers it will learn more quickly if someone tells it what it’s got wrong. But at this point, we get into a race over whether the anti-spam tools can learn faster than the spammers.
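A toy version of the problem shows how the toys creep in. This Python sketch ranks candidate words by how often they share a list with the words you started from; it is an assumption about the general idea of expanding a set from co-occurring lists, not a description of how Google Sets works, and the lists are invented for illustration.

```python
from collections import Counter

# Hypothetical lists, standing in for lists found on Web pages.
PAGE_LISTS = [
    ["lion", "tiger", "bear", "wolf", "elephant"],  # animal names
    ["lion", "tiger", "zebra", "giraffe"],          # animal names
    ["bear", "lion", "wood", "tin", "cotton"],      # toy animals and materials
]


def expand_set(seeds, lists):
    """Rank non-seed items by how many seeds they share a list with."""
    seed_set = set(seeds)
    scores = Counter()
    for items in lists:
        overlap = len(seed_set & set(items))
        if overlap == 0:
            continue  # this list says nothing about the seeds
        for item in set(items) - seed_set:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
```

Calling `expand_set(["lion", "tiger", "bear"], PAGE_LISTS)` puts wolf and elephant at the top, but wood, tin and cotton score as well as zebra and giraffe, because nothing in the counts says that the third list is about toys rather than animals.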
