Simon Bisson & Mary Branscombe's Blog

Analytics get distributed, parallel and mathematical

By Simon Bisson & Mary Branscombe in Editorial

Posted in data warehouses, analytics, Applications, Storage on October 16, 2008 at 7:47 pm


We had a very interesting conversation today with folk from Greenplum about the next generation of business analytics. The most striking part of their story was how their application works with data.

With no legacy to build on, the Greenplum engineers could take a very different architectural approach. Traditional databases use a single store and a single query engine. Greenplum’s tools break data up into parcels and spread them across every machine in the data processing network. A central server keeps track of where the data is held and manages queries, which it breaks up and delivers to the appropriate servers before assembling the results itself. Supercomputer aficionados will immediately spot that Greenplum are using a shared-nothing approach, where queries run in parallel on separate sections of the data, speeding things up considerably. Having a master controller handle scheduling means you can even use unmatched hardware for your data servers.
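As a rough illustration of the scatter-and-gather pattern described above, here is a minimal Python sketch. It is not Greenplum's code or API: the `Master` and `Segment` classes, the `segment_for` helper and the column names are all hypothetical, and threads in one process stand in for separate machines on a network.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def segment_for(value, n_segments):
    """Pick a segment with a stable hash (hypothetical placement rule)."""
    return zlib.crc32(str(value).encode("utf-8")) % n_segments


class Segment:
    """One data server: it sees only its own parcel of rows (shared nothing)."""

    def __init__(self):
        self.rows = []

    def partial_totals(self, group_col, value_col):
        # Aggregate locally, so only small partial results travel to the master.
        totals = defaultdict(float)
        for row in self.rows:
            totals[row[group_col]] += row[value_col]
        return dict(totals)


class Master:
    """Central controller: tracks where data lives, scatters work, gathers results."""

    def __init__(self, n_segments):
        self.segments = [Segment() for _ in range(n_segments)]

    def load(self, rows, dist_col):
        # Break the data into parcels by hashing the distribution column.
        for row in rows:
            idx = segment_for(row[dist_col], len(self.segments))
            self.segments[idx].rows.append(row)

    def total_by(self, group_col, value_col):
        # Scatter: every segment runs the same sub-query on its own slice.
        with ThreadPoolExecutor(max_workers=len(self.segments)) as pool:
            partials = list(
                pool.map(lambda s: s.partial_totals(group_col, value_col), self.segments)
            )
        # Gather: the master merges the partial results into the final answer.
        merged = defaultdict(float)
        for partial in partials:
            for key, value in partial.items():
                merged[key] += value
        return dict(merged)
```

Calling `Master(3).load(rows, "order_id")` and then `total_by("region", "amount")` would give the same totals as a single-machine scan, but each segment only ever touches its own rows.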

Complex joins can be handled in a similar manner, with queries moving data between servers and assembling results on many different processors. With quad-core processors now a commodity, and six- and eight-core parts following close behind, it’s not going to be difficult to build a powerful data processing farm (and use the same hardware for other tasks when you don’t need high-level analytics).
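One common way to run a join across shared-nothing servers is to move rows so that matching keys end up on the same machine, then join locally on each one. The sketch below assumes that approach; the post does not say how Greenplum actually plans its joins, and every function name here is hypothetical. Each "segment" is just a Python list of row dictionaries.

```python
import zlib


def segment_for(value, n_segments):
    """Stable hash so equal join keys always map to the same segment."""
    return zlib.crc32(str(value).encode("utf-8")) % n_segments


def redistribute(segments, key):
    """Move every row to the segment that owns its join key."""
    n = len(segments)
    moved = [[] for _ in range(n)]
    for segment in segments:
        for row in segment:
            moved[segment_for(row[key], n)].append(row)
    return moved


def local_hash_join(left_rows, right_rows, key):
    """Ordinary in-memory hash join, run independently on one segment."""
    index = {}
    for row in left_rows:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for r in right_rows for l in index.get(r[key], [])]


def distributed_join(left_segments, right_segments, key):
    """Inner join of two tables that were spread across the same number of segments."""
    if len(left_segments) != len(right_segments):
        raise ValueError("both tables must use the same number of segments")
    left = redistribute(left_segments, key)
    right = redistribute(right_segments, key)
    results = []
    # After redistribution each pair of slices can be joined in isolation,
    # which is what lets the real work run on many processors at once.
    for left_rows, right_rows in zip(left, right):
        results.extend(local_hash_join(left_rows, right_rows, key))
    return results
```

The loop over segment pairs is sequential here for brevity; in a real cluster each pair would be joined on its own server.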

There’s another spin-off from the architecture: you can mix different query types in one analytic operation. With Greenplum’s tools you can mix SQL with Google’s MapReduce, and even throw in the R statistical language for complex mathematical operations. Modelling is an important piece of business analytics, and support for it means that Greenplum’s tools are able to compete with high-end analytical tools like SAS. There are plenty of interesting use cases here. Perhaps you’re currently working with massive data sets that take a week to process and a day or so to feed into predictive models. With Greenplum you’ll be able to load the data in parallel and run your statistical models on the data, giving you a considerable speed advantage.
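To make the idea of mixing query styles concrete, here is a small single-machine sketch in Python. It does not use Greenplum, MapReduce or R themselves: an in-memory SQLite database stands in for the SQL stage, plain Python functions stand in for the map and reduce stages, and a hand-written least-squares fit stands in for the statistical model. The `sales` table, its columns and all function names are assumptions made for illustration.

```python
import sqlite3
from collections import defaultdict
from statistics import mean


def make_demo_db():
    """Build a tiny in-memory table of made-up sales rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (week INTEGER, store TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(1, "a", 10.0), (2, "a", 10.0), (2, "b", 10.0),
         (3, "a", 30.0), (3, "b", -5.0)],  # the negative row is filtered out below
    )
    return conn


def sql_stage(conn):
    """SQL stage: declarative filtering and projection."""
    return conn.execute(
        "SELECT week, amount FROM sales WHERE amount > 0"
    ).fetchall()


def map_stage(rows):
    """Map stage: emit one (key, value) pair per row."""
    for week, amount in rows:
        yield week, amount


def reduce_stage(pairs):
    """Reduce stage: sum the values for each key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def model_stage(totals):
    """Statistical stage: least-squares line through (week, total) points."""
    xs = sorted(totals)
    ys = [totals[x] for x in xs]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def run_pipeline(conn):
    """Chain all three styles into one analytic operation."""
    return model_stage(reduce_stage(map_stage(sql_stage(conn))))
```

`run_pipeline(make_demo_db())` returns the slope and intercept of the weekly sales trend. In a parallel system each stage could run next to the data rather than after an export step.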

Moore’s Law has hit the wall. Intel’s spectacular U-turn showed as much, as clock speeds dropped and the number of cores went up. That’s left software developers with something of a challenge -
