Simon Bisson & Mary Branscombe's Blog

Analytics get distributed, parallel and mathematical

By Simon Bisson & Mary Branscombe in Editorial

Posted in data warehouses, analytics, Applications, Storage on October 16, 2008 at 7:47 pm

We had a very interesting conversation today, talking about the next generation of business analytics with folk from Greenplum. The most interesting piece of their story was just how their application works with data.

With no legacy to build on, the Greenplum engineers could take a very different architectural approach. Traditional databases use a single store, and a single query engine. Greenplum’s tools break data up into parcels, sharing it across every machine in their data processing network. A central server keeps track of where the data is held, and manages queries - which can be broken up and delivered to the appropriate servers, the results being assembled by the controller. Supercomputer aficionados will immediately spot that Greenplum are using a shared-nothing approach, where queries can run in parallel on sections of the data - speeding things up considerably. Having a master controller handling scheduling means you can even use unmatched hardware for your data servers.
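To make that scatter-and-gather pattern concrete, here is a minimal Python sketch. It is not Greenplum's code: the function names, the CRC-based placement rule and the sample sales rows are all assumptions made for illustration, and threads stand in for separate data servers.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def segment_for(key, n_segments):
    """Hypothetical placement rule: hash the key to pick a segment."""
    return zlib.crc32(str(key).encode()) % n_segments


def distribute(rows, n_segments, key):
    """Break the data into parcels, one list of rows per segment."""
    segments = [[] for _ in range(n_segments)]
    for row in rows:
        segments[segment_for(row[key], n_segments)].append(row)
    return segments


def local_total(segment):
    """Runs on one segment: a partial SUM(amount) GROUP BY region."""
    totals = Counter()
    for row in segment:
        totals[row["region"]] += row["amount"]
    return totals


def run_query(segments):
    """Controller: send the query to every segment, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(local_total, segments))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the values together
    return dict(merged)


# Made-up sample data, purely for illustration.
sales = [
    {"order_id": 1, "region": "north", "amount": 120},
    {"order_id": 2, "region": "south", "amount": 80},
    {"order_id": 3, "region": "north", "amount": 50},
    {"order_id": 4, "region": "east", "amount": 200},
    {"order_id": 5, "region": "south", "amount": 20},
]

segments = distribute(sales, n_segments=3, key="order_id")
result = run_query(segments)
print(result)
```

Each segment only ever touches its own rows, which is the shared-nothing part; the controller sees nothing but the small partial totals it has to merge.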

Complex joins can be handled in a similar manner, with queries moving data between servers and assembling results on many different processors. With quad core a commodity, and six and eight cores following close behind, it’s not going to be difficult to build a powerful data processing farm (and use the same hardware for other tasks when you don’t need high-level analytics).
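One common way to run a join like that on a shared-nothing system is to redistribute one table on the join key, so that matching rows end up on the same server, and then join locally on each server. The Python sketch below shows that general technique under stated assumptions; it is not a description of Greenplum's planner, and the table layouts, helper names and sample rows are hypothetical.

```python
import zlib


def segment_for(key, n_segments):
    """Hypothetical placement rule: hash the key to pick a segment."""
    return zlib.crc32(str(key).encode()) % n_segments


def place(rows, key, n_segments):
    """Distribute rows across segments by hashing the given column."""
    segments = [[] for _ in range(n_segments)]
    for row in rows:
        segments[segment_for(row[key], n_segments)].append(row)
    return segments


def redistribute(segments, key):
    """Move rows between segments so they are placed by a different column."""
    all_rows = [row for segment in segments for row in segment]
    return place(all_rows, key, len(segments))


def local_join(left, right, key):
    """Plain hash join, run independently on each segment."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


N = 3  # number of segments, chosen arbitrarily

# Made-up sample tables, purely for illustration.
orders = [
    {"order_id": 1, "customer_id": 1, "amount": 120},
    {"order_id": 2, "customer_id": 2, "amount": 80},
    {"order_id": 3, "customer_id": 1, "amount": 50},
    {"order_id": 4, "customer_id": 3, "amount": 200},  # no matching customer
]
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
]

orders_by_order = place(orders, "order_id", N)          # original placement
customers_by_id = place(customers, "customer_id", N)    # already on the join key
orders_moved = redistribute(orders_by_order, "customer_id")

# Every segment joins its own slice; the controller concatenates the results.
joined = [
    row
    for i in range(N)
    for row in local_join(orders_moved[i], customers_by_id[i], "customer_id")
]
print(sorted((r["order_id"], r["name"]) for r in joined))
```

Only the orders table has to move here, because the customers are already placed by the join key. After the move, no segment needs to look at another segment's rows.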

There’s another spin-off from the architecture - you can mix different query types in one analytic operation. With Greenplum’s tools you can mix SQL with Google’s MapReduce, and even throw in the R statistical language for complex mathematical operations. Modelling is an important piece of business analytics, and support for it means that Greenplum’s tools are able to compete with high-end analytical tools like SAS. There are plenty of interesting use cases here - perhaps you’re currently working with massive data sets that take a week to process and a day or so to feed into predictive models. With Greenplum you’ll be able to load the data in parallel and run your statistical models on the data - giving you a considerable speed advantage.

Moore’s Law has hit the wall. Intel’s spectacular U-turn showed as much, as clock speeds dropped and the number of cores went up. That’s left software developers with something of a challenge -
