Simon Bisson & Mary Branscombe's Blog

Analytics get distributed, parallel and mathematical

By Simon Bisson & Mary Branscombe in Editorial

Posted in data warehouses, analytics, Applications, Storage on October 16, 2008 at 7:47 pm

We had a very interesting conversation today with folk from Greenplum about the next generation of business analytics. The most interesting part of their story was how their application works with data.

With no legacy to build on, the Greenplum engineers could take a very different architectural approach. Traditional databases use a single store and a single query engine. Greenplum’s tools break data up into parcels and spread them across every machine in the data processing network. A central server keeps track of where the data is held and manages queries. Those queries can be broken up and delivered to the appropriate servers, and the controller assembles the results. Supercomputer aficionados will immediately spot that Greenplum are using a shared-nothing approach. Each server works only on its own section of the data, so queries run in parallel across those sections, which speeds things up considerably. Because a master controller handles the scheduling, you can even use mismatched hardware for your data servers.
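The scatter-gather pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Greenplum's implementation.

- Rows are hash-distributed across a hypothetical set of in-memory "segments".
- Threads stand in for separate data servers.
- The names `partition`, `local_sum_by_region` and `master_query` are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition(rows, key, n_segments):
    """Hash-distribute rows so each segment owns a disjoint slice (shared-nothing)."""
    segments = [[] for _ in range(n_segments)]
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's salted str hash
        h = crc32(str(row[key]).encode())
        segments[h % n_segments].append(row)
    return segments


def local_sum_by_region(segment):
    """Runs on one segment: a partial aggregate over that segment's data only."""
    partial = {}
    for row in segment:
        partial[row["region"]] = partial.get(row["region"], 0) + row["amount"]
    return partial


def master_query(segments):
    """Master: fan the query out to every segment in parallel, then merge partials."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(local_sum_by_region, segments))
    result = {}
    for partial in partials:
        for region, subtotal in partial.items():
            result[region] = result.get(region, 0) + subtotal
    return result


# Hypothetical sales table spread over four segments
sales = [{"id": i, "region": "north" if i % 2 else "south", "amount": i} for i in range(10)]
segments = partition(sales, "id", 4)
print(sorted(master_query(segments).items()))  # [('north', 25), ('south', 20)]
```

The segments never talk to each other. Only small partial results travel back to the master, which is why this style of query scales out as more servers are added.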

Complex joins can be handled in a similar manner: queries move data between servers and assemble results on many different processors. With quad-core processors now a commodity, and six- and eight-core parts following close behind, it’s not going to be difficult to build a powerful data processing farm. You can also use the same hardware for other tasks when you don’t need high-level analytics.
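One common way a shared-nothing system can join tables is to redistribute rows by the join key. Matching rows then land on the same server, and each server performs an ordinary local hash join in parallel. The sketch below illustrates that general technique. It is not a description of Greenplum's actual planner. The helper names and the four-segment layout are hypothetical, and threads again stand in for servers.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def segment_for(value, n_segments):
    """Deterministically map a join-key value to the segment that owns it."""
    return crc32(str(value).encode()) % n_segments


def redistribute(rows, key, n_segments):
    """Move each row to the segment that owns its join-key value."""
    buckets = [[] for _ in range(n_segments)]
    for row in rows:
        buckets[segment_for(row[key], n_segments)].append(row)
    return buckets


def local_hash_join(left, right, key):
    """Hash join on one segment: build a table on the left input, probe with the right."""
    table = {}
    for row in left:
        table.setdefault(row[key], []).append(row)
    joined = []
    for row in right:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


def parallel_join(left_rows, right_rows, key, n_segments=4):
    """Redistribute both inputs on the key, then join every segment in parallel."""
    left_parts = redistribute(left_rows, key, n_segments)
    right_parts = redistribute(right_rows, key, n_segments)
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        parts = list(pool.map(local_hash_join, left_parts, right_parts, [key] * n_segments))
    return [row for part in parts for row in part]


customers = [{"cust": 1, "name": "Acme"}, {"cust": 2, "name": "Globex"}]
orders = [{"cust": 1, "total": 50}, {"cust": 2, "total": 75},
          {"cust": 1, "total": 20}, {"cust": 3, "total": 10}]
print(sorted((r["name"], r["total"]) for r in parallel_join(customers, orders, "cust")))
# [('Acme', 20), ('Acme', 50), ('Globex', 75)]
```

In a real cluster the data would already be spread across servers, and only rows sitting on the "wrong" server would need to move. When one table is small, copying it to every server is often cheaper than redistributing both.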

There’s another spin-off from the architecture: you can mix different query types in one analytic operation. With Greenplum’s tools you can mix SQL with Google’s MapReduce, and even throw in the R statistical language for complex mathematical operations. Modelling is an important part of business analytics, and support for it means Greenplum’s tools are able to compete with high-end analytical tools like SAS. There are plenty of interesting use cases here. Perhaps you’re currently working with massive data sets that take a week to process and a day or so to feed into predictive models. With Greenplum you’ll be able to load the data in parallel and run your statistical models directly on it, giving you a considerable speed advantage.
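To see why statistical modelling parallelises well on this kind of architecture, consider fitting a straight line by ordinary least squares. The fit needs only a handful of sums, and those sums can be computed independently on each segment (a map step) and then added together (a reduce step). The sketch below makes several assumptions:

- The segment layout and function names are hypothetical.
- Threads stand in for servers.
- It does not use Greenplum's MapReduce interface or R. It only illustrates the map/reduce idea with a model simple enough to check by hand.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_stats(segment):
    """Map step: each segment summarises its own (x, y) pairs as sufficient statistics."""
    n = len(segment)
    sx = sum(x for x, _ in segment)
    sy = sum(y for _, y in segment)
    sxx = sum(x * x for x, _ in segment)
    sxy = sum(x * y for x, y in segment)
    return (n, sx, sy, sxx, sxy)


def reduce_stats(a, b):
    """Reduce step: partial sums from different segments simply add together."""
    return tuple(p + q for p, q in zip(a, b))


def fit_line(segments):
    """Least-squares fit of y = intercept + slope * x from the combined statistics."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(map_stats, segments))
    n, sx, sy, sxx, sxy = reduce(reduce_stats, partials)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Points on y = 1 + 2x, split unevenly across three hypothetical segments
segments = [[(0, 1.0), (1, 3.0)], [(2, 5.0), (3, 7.0)], [(4, 9.0)]]
print(fit_line(segments))  # (1.0, 2.0)
```

Only five numbers per segment cross the network, however many rows each segment holds. That property makes in-database modelling attractive compared with exporting the whole data set to a separate analytics tool.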

The free clock-speed gains that came with Moore’s Law have hit the wall. Intel’s spectacular U-turn showed as much, as clock speeds dropped and core counts went up. That’s left software developers with something of a challenge -
