   
Simon Bisson & Mary Branscombe's Blog

Analytics get distributed, parallel and mathematical

By Simon Bisson & Mary Branscombe in Editorial

Posted in data warehouses, analytics, Applications, Storage on October 16, 2008 at 7:47 pm


We had a very interesting conversation today, talking about the next generation of business analytics with folk from Greenplum. The most interesting piece of their story was just how their application works with data.

With no legacy to build on, the Greenplum engineers could take a very different architectural approach. Traditional databases use a single store, and a single query engine. Greenplum’s tools break data up into parcels, sharing it across every machine in their data processing network. A central server keeps track of where the data is held, and manages queries - which can be broken up and delivered to the appropriate servers, the results being assembled by the controller. Supercomputer aficionados will immediately spot that Greenplum are using a shared-nothing approach, where queries can run in parallel on sections of the data - speeding things up considerably. Having a master controller handling scheduling means you can even use unmatched hardware for your data servers.
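As a rough illustration of that scatter-and-gather pattern (not Greenplum's actual code or API), here is a minimal Python sketch. The names `distribute`, `segment_query` and `master_query`, the sample rows, and the use of threads to stand in for separate machines are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: one row per order.
ROWS = [
    {"order_id": 1, "region": "north", "amount": 10},
    {"order_id": 2, "region": "south", "amount": 20},
    {"order_id": 3, "region": "north", "amount": 5},
    {"order_id": 4, "region": "east", "amount": 7},
    {"order_id": 5, "region": "south", "amount": 1},
]


def distribute(rows, num_segments, key):
    """Parcel rows out across segments by hashing a distribution key."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[hash(row[key]) % num_segments].append(row)
    return segments


def segment_query(segment_rows):
    """Work done on one data server: total amount per region, local rows only."""
    partial = Counter()
    for row in segment_rows:
        partial[row["region"]] += row["amount"]
    return partial


def master_query(segments):
    """Controller: send the same sub-query to every segment, then merge results."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(segment_query, segments))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds the partial sums together
    return dict(total)


if __name__ == "__main__":
    print(master_query(distribute(ROWS, num_segments=3, key="order_id")))
```

Each segment only ever touches its own parcel of rows, which is the shared-nothing part; the controller's job is limited to handing out work and merging the partial answers.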

Complex joins can be handled in a similar manner, with queries moving data between servers and assembling results on many different processors. With quad core a commodity, and six and eight following close behind, it’s not going to be difficult to build a powerful data processing farm (and use the same hardware for other tasks when you don’t need high level analytics).
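One common way to do a join like that is to redistribute both tables by the join key, so matching rows land on the same server, and then join locally on each one. The sketch below assumes that technique; it is not taken from Greenplum, the function names and sample tables are hypothetical, and the per-segment joins run one after another here where a real system would run them in parallel.

```python
# Hypothetical tables, each already spread over two segments in arbitrary order.
ORDERS = [
    [{"order_id": 1, "customer_id": 10}, {"order_id": 2, "customer_id": 11}],
    [{"order_id": 3, "customer_id": 10}],
]
CUSTOMERS = [
    [{"customer_id": 11, "name": "Bea"}],
    [{"customer_id": 10, "name": "Al"}],
]


def redistribute(segments, key):
    """Move every row to the segment chosen by hashing its join key."""
    count = len(segments)
    moved = [[] for _ in range(count)]
    for segment in segments:
        for row in segment:
            moved[hash(row[key]) % count].append(row)
    return moved


def local_hash_join(left_rows, right_rows, key):
    """Ordinary hash join over the rows held by a single segment."""
    index = {}
    for right in right_rows:
        index.setdefault(right[key], []).append(right)
    return [
        {**left, **right}
        for left in left_rows
        for right in index.get(left[key], [])
    ]


def parallel_join(left_segments, right_segments, key):
    """Co-locate matching rows, join per segment, then assemble the results."""
    left = redistribute(left_segments, key)
    right = redistribute(right_segments, key)
    joined = []
    for left_rows, right_rows in zip(left, right):
        joined.extend(local_hash_join(left_rows, right_rows, key))
    return joined


if __name__ == "__main__":
    for row in parallel_join(ORDERS, CUSTOMERS, "customer_id"):
        print(row)
```

Because both tables are hashed with the same function and the same number of segments, every order ends up next to its customer, and no segment needs to see the whole of either table.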

There’s another spin out of the architecture - you can mix different query types in one analytic operation. With Greenplum’s tools you can mix SQL with Google’s MapReduce, and even throw in the R statistical language for complex mathematical operations. Modelling is an important piece of business analytics, and support for it means that Greenplum’s tools are able to compete with high-end analytical tools like SAS. There are plenty of interesting use cases here - perhaps you’re currently working with massive data sets that take a week to process and a day or so to feed into predictive models. With Greenplum you’ll be able to load the data in parallel and run your statistical models on the data - giving you a considerable speed advantage.
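To give a feel for what mixing query styles in one operation might look like, here is a small Python sketch that chains a SQL step, a map/reduce-style grouping step and a statistical step. It is only an analogy: it uses Python's built-in `sqlite3` and `statistics` modules in place of Greenplum and R, and `run_pipeline`, the `sales` table and its columns are invented for the example.

```python
import sqlite3
from collections import defaultdict
from statistics import mean


def run_pipeline(sales):
    """SQL filter -> map/reduce-style grouping -> per-group statistic."""
    # Step 1 (SQL): load the rows and filter them declaratively.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
    rows = conn.execute(
        "SELECT region, amount FROM sales WHERE amount > 0"
    ).fetchall()
    conn.close()

    # Step 2 (map + shuffle): emit (key, value) pairs and group them by key.
    groups = defaultdict(list)
    for region, amount in rows:
        groups[region].append(amount)

    # Step 3 (reduce with a statistic): this is where a statistical
    # language such as R would take over; a plain mean stands in for it.
    return {region: mean(amounts) for region, amounts in groups.items()}


if __name__ == "__main__":
    sample = [("north", 10.0), ("north", 20.0), ("south", 5.0), ("south", -1.0)]
    print(run_pipeline(sample))
```

The point of the sketch is the shape of the pipeline rather than the tools: each step hands its output straight to the next, so the data never has to be exported to a separate modelling package.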

Moore’s Law has hit the wall. Intel’s spectacular U-turn showed as much, as clock speeds dropped and the number of cores went up. That’s left software developers with something of a challenge -


 

   