Simon Bisson & Mary Branscombe's Blog

Analytics get distributed, parallel and mathematical

By Simon Bisson & Mary Branscombe in Editorial

Posted in data warehouses, analytics, Applications, Storage on October 16, 2008 at 7:47 pm


We had a very interesting conversation today, talking about the next generation of business analytics with folk from Greenplum. The most interesting piece of their story was just how their application works with data.

With no legacy to build on, the Greenplum engineers could take a very different architectural approach. Traditional databases use a single store and a single query engine. Greenplum’s tools break data up into parcels, sharing it across every machine in their data processing network. A central server keeps track of where the data is held and manages queries, which can be broken up and delivered to the appropriate servers, with the results assembled by the controller. Supercomputer aficionados will immediately spot that Greenplum are using a shared-nothing approach, where queries can run in parallel on sections of the data, which speeds things up considerably. Having a master controller handling scheduling means you can even use mismatched hardware for your data servers.
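To make the idea concrete, here's a minimal sketch of that scatter-gather pattern in Python. It isn't Greenplum's code or API: the function names (`distribute`, `segment_sum`, `master_sum`), the four-segment layout and the sample sales rows are all our own assumptions, and threads stand in for separate machines.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical number of data servers


def distribute(rows, key, num_segments=NUM_SEGMENTS):
    """Hash-partition rows across segments by the value of `key`."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        # crc32 gives a placement that is stable from run to run
        target = zlib.crc32(str(row[key]).encode()) % num_segments
        segments[target].append(row)
    return segments


def segment_sum(segment, group_key, value_key):
    """Partial aggregate computed locally, using only one segment's rows."""
    partial = defaultdict(float)
    for row in segment:
        partial[row[group_key]] += row[value_key]
    return dict(partial)


def master_sum(segments, group_key, value_key):
    """Send the same partial query to every segment, then merge the answers."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(
            pool.map(lambda seg: segment_sum(seg, group_key, value_key), segments)
        )
    totals = defaultdict(float)
    for partial in partials:
        for group, value in partial.items():
            totals[group] += value
    return dict(totals)


# Made-up sample data, purely for illustration
sales = [
    {"order_id": 1, "region": "north", "amount": 120.0},
    {"order_id": 2, "region": "south", "amount": 80.0},
    {"order_id": 3, "region": "north", "amount": 50.0},
    {"order_id": 4, "region": "south", "amount": 30.0},
]

segments = distribute(sales, "order_id")
totals = master_sum(segments, "region", "amount")
print(totals)
```

The point of the sketch is that each segment only ever touches its own rows, and the master only sees small partial results. The final totals are the same however the rows happen to be spread across the segments.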

Complex joins can be handled in a similar manner, with queries moving data between servers and assembling results on many different processors. With quad-core processors a commodity, and six- and eight-core parts following close behind, it’s not going to be difficult to build a powerful data processing farm (and use the same hardware for other tasks when you don’t need high-level analytics).
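One common way to do a join like this in a shared-nothing system is to redistribute both tables on the join key, so that matching rows end up on the same server, and then join locally on each one. We don't know the details of Greenplum's planner, so treat the following as a sketch under that assumption; `redistribute`, `local_hash_join` and `distributed_join` are hypothetical names, and the customer and order rows are invented.

```python
import zlib


def segment_for(value, num_segments):
    """Pick a segment for a key value with a run-to-run stable hash."""
    return zlib.crc32(str(value).encode()) % num_segments


def redistribute(segments, key):
    """Move every row to the segment that owns its join-key value."""
    num_segments = len(segments)
    moved = [[] for _ in range(num_segments)]
    for segment in segments:
        for row in segment:
            moved[segment_for(row[key], num_segments)].append(row)
    return moved


def local_hash_join(left_rows, right_rows, key):
    """Ordinary in-memory hash join, run independently on one segment."""
    index = {}
    for row in left_rows:
        index.setdefault(row[key], []).append(row)
    return [
        {**left, **right}
        for right in right_rows
        for left in index.get(right[key], [])
    ]


def distributed_join(left_segments, right_segments, key):
    """Redistribute both sides on the join key, join per segment, gather."""
    left = redistribute(left_segments, key)
    right = redistribute(right_segments, key)
    joined = []
    for left_rows, right_rows in zip(left, right):
        joined.extend(local_hash_join(left_rows, right_rows, key))
    return joined


# Made-up sample data, initially spread round-robin over three segments
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
    {"customer_id": 3, "name": "Alan"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 3, "amount": 40.0},
    {"order_id": 12, "customer_id": 1, "amount": 15.0},
]
customer_segments = [customers[i::3] for i in range(3)]
order_segments = [orders[i::3] for i in range(3)]

rows = distributed_join(customer_segments, order_segments, "customer_id")
for row in sorted(rows, key=lambda r: r["order_id"]):
    print(row["order_id"], row["name"], row["amount"])
```

After the redistribution step no segment needs to see another segment's rows, so the local joins can all run at the same time.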

There’s another spin-off from the architecture: you can mix different query types in one analytic operation. With Greenplum’s tools you can mix SQL with Google’s MapReduce, and even throw in the R statistical language for complex mathematical operations. Modelling is an important piece of business analytics, and supporting it means that Greenplum’s tools are able to compete with high-end analytical tools like SAS. There are plenty of interesting use cases here. Perhaps you’re currently working with massive data sets that take a week to process and a day or so to feed into predictive models. With Greenplum you’ll be able to load the data in parallel and run your statistical models on the data, giving you a considerable speed advantage.
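As a rough illustration of what mixing query styles in one operation looks like, here's a toy pipeline in which plain Python stands in for all three pieces: a SQL-style filter, a map and reduce pair, and a statistical summary of the kind you might otherwise write in R. None of this is Greenplum's API; the function names and the data are our own assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def sql_like_filter(rows, min_amount):
    """Stand-in for a SQL WHERE clause: keep rows at or above a threshold."""
    return [row for row in rows if row["amount"] >= min_amount]


def map_phase(rows):
    """Map step: emit one (key, value) pair per row."""
    for row in rows:
        yield row["customer"], row["amount"]


def reduce_phase(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def summarise(totals):
    """Stand-in for the statistical step: mean and spread of the totals."""
    values = list(totals.values())
    return {"mean": mean(values), "stdev": pstdev(values)}


# Made-up sample data, purely for illustration
purchases = [
    {"customer": "a", "amount": 10.0},
    {"customer": "a", "amount": 30.0},
    {"customer": "b", "amount": 20.0},
    {"customer": "b", "amount": 40.0},
    {"customer": "c", "amount": 5.0},
]

filtered = sql_like_filter(purchases, min_amount=10.0)
per_customer = reduce_phase(map_phase(filtered))
summary = summarise(per_customer)
print(per_customer, summary)
```

Each stage hands its output straight to the next, which is the appeal of keeping all three styles inside one engine rather than exporting data between separate tools.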

Moore’s Law has hit the wall. Intel’s spectacular U-turn showed as much, as clock speeds dropped and the number of cores went up. That’s left software developers with something of a challenge -
