Simon Bisson & Mary Branscombe's Blog

Analytics get distributed, parallel and mathematical

By Simon Bisson & Mary Branscombe in Editorial

Posted in data warehouses, analytics, Applications, Storage on October 16, 2008 at 7:47 pm


We had a very interesting conversation today with folk from Greenplum about the next generation of business analytics. The most interesting piece of their story was how their application works with data.

With no legacy to build on, the Greenplum engineers could take a very different architectural approach. Traditional databases use a single store and a single query engine. Greenplum’s tools break data up into parcels and spread them across every machine in the data processing network. A central server keeps track of where the data is held and manages queries, which can be broken up and delivered to the appropriate servers, with the results assembled by the controller. Supercomputer aficionados will immediately spot that Greenplum is using a shared-nothing approach, where queries run in parallel on separate sections of the data, speeding things up considerably. Having a master controller handle scheduling means you can even use unmatched hardware for your data servers.
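To make the shared-nothing idea concrete, here is a minimal Python sketch of the pattern described above: rows are hash-partitioned across several "data servers", each server computes a partial result on its own slice, and a controller merges the partials. This is only an illustration, not Greenplum's implementation; the names (`NUM_SEGMENTS`, `distribute`, `segment_sum`, `master_sum`) and the sample data are all made up for the example, and threads stand in for separate machines.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical number of data servers


def distribute(rows, key, num_segments=NUM_SEGMENTS):
    """Hash-partition rows across segments using a distribution key."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        # crc32 gives a hash that is stable between runs
        slot = zlib.crc32(str(row[key]).encode()) % num_segments
        segments[slot].append(row)
    return segments


def segment_sum(segment_rows, group_key, value_key):
    """Work done on one data server: aggregate only the local rows."""
    partial = defaultdict(int)
    for row in segment_rows:
        partial[row[group_key]] += row[value_key]
    return dict(partial)


def master_sum(segments, group_key, value_key):
    """Controller: run the query on every segment in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(
            pool.map(lambda seg: segment_sum(seg, group_key, value_key), segments)
        )
    totals = defaultdict(int)
    for partial in partials:
        for group, value in partial.items():
            totals[group] += value
    return dict(totals)


# Made-up sample data for illustration
sales = [
    {"order_id": 1, "region": "north", "amount": 10},
    {"order_id": 2, "region": "south", "amount": 20},
    {"order_id": 3, "region": "north", "amount": 5},
    {"order_id": 4, "region": "east", "amount": 7},
    {"order_id": 5, "region": "south", "amount": 1},
    {"order_id": 6, "region": "north", "amount": 2},
]

segments = distribute(sales, "order_id")
totals = master_sum(segments, "region", "amount")
print(totals)
```

The point of the design is that no segment needs to see another segment's rows to do its share of the work; only the small partial results travel back to the controller.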

Complex joins can be handled in a similar manner, with queries moving data between servers and assembling results on many different processors. With quad-core processors a commodity, and six- and eight-core parts following close behind, it’s not going to be difficult to build a powerful data processing farm (and use the same hardware for other tasks when you don’t need high-level analytics).
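One common way to run a join across servers that share nothing is to redistribute both tables on the join key, so that matching rows land on the same server, and then join locally on each one. The Python sketch below shows that general technique under stated assumptions; it is not a description of how Greenplum plans joins, and every name in it (`segment_for`, `redistribute`, `local_join`, `distributed_join`) and all the sample rows are hypothetical.

```python
import zlib


def segment_for(value, num_segments):
    """Pick a segment for a key value using a hash that is stable between runs."""
    return zlib.crc32(str(value).encode()) % num_segments


def redistribute(segments, key, num_segments):
    """Move every row to the segment chosen by hashing the join key."""
    out = [[] for _ in range(num_segments)]
    for segment_rows in segments:
        for row in segment_rows:
            out[segment_for(row[key], num_segments)].append(row)
    return out


def local_join(left_rows, right_rows, key):
    """Hash join performed entirely on one segment."""
    index = {}
    for right in right_rows:
        index.setdefault(right[key], []).append(right)
    return [
        {**left, **right}
        for left in left_rows
        for right in index.get(left[key], [])
    ]


def distributed_join(left_segments, right_segments, key):
    """Co-locate matching rows, join per segment, then gather the results."""
    n = len(left_segments)
    left = redistribute(left_segments, key, n)
    right = redistribute(right_segments, key, n)
    joined = []
    for left_rows, right_rows in zip(left, right):
        joined.extend(local_join(left_rows, right_rows, key))
    return joined


# Made-up data: orders start out spread by order, customers by customer
order_segments = [
    [{"order_id": 10, "customer_id": 1, "amount": 5}],
    [{"order_id": 11, "customer_id": 2, "amount": 7}],
    [
        {"order_id": 12, "customer_id": 1, "amount": 3},
        {"order_id": 13, "customer_id": 3, "amount": 9},  # no matching customer
    ],
]
customer_segments = [
    [{"customer_id": 1, "name": "Ada"}],
    [{"customer_id": 2, "name": "Alan"}],
    [],
]

result = distributed_join(order_segments, customer_segments, "customer_id")
summary = sorted((row["order_id"], row["name"]) for row in result)
print(summary)
```

Because both tables are hashed with the same function, every order ends up next to its customer, and the per-segment joins could run in parallel on separate machines.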

There’s another spin-off from the architecture: you can mix different query types in one analytic operation. With Greenplum’s tools you can mix SQL with Google’s MapReduce, and even throw in the R statistical language for complex mathematical operations. Modelling is an important piece of business analytics, and support for it means that Greenplum’s tools are able to compete with high-end analytical tools like SAS. There are plenty of interesting use cases here. Perhaps you’re currently working with massive data sets that take a week to process and a day or so to feed into predictive models. With Greenplum you’ll be able to load the data in parallel and run your statistical models on it in place, giving you a considerable speed advantage.

Moore’s Law has hit the wall. Intel’s spectacular U-turn showed as much, as clock speeds dropped and the number of cores went up. That’s left software developers with something of a challenge -
