Simon Bisson & Mary Branscombe's Blog

Analytics get distributed, parallel and mathematical

By Simon Bisson & Mary Branscombe in Editorial

Posted in data warehouses, analytics, Applications, Storage on October 16, 2008 at 7:47 pm

We had a fascinating conversation today with folk from Greenplum about the next generation of business analytics. The most interesting part of their story was how their software handles data.

With no legacy to build on, the Greenplum engineers could take a very different architectural approach. Traditional databases use a single store and a single query engine. Greenplum’s tools break data up into parcels and spread them across every machine in the data processing network. A central server keeps track of where the data is held and manages queries, which can be broken up and sent to the appropriate servers, with the controller assembling the results. Supercomputer aficionados will immediately recognise this as a shared-nothing architecture: each server works only on its own section of the data, so queries run in parallel and finish considerably faster. Because a master controller handles scheduling, the data servers don’t even need matching hardware.
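To make the scatter-gather pattern concrete, here is a minimal Python sketch, assuming a toy table of (region, amount) rows and three hypothetical segments. It is not Greenplum’s code or API: threads stand in for separate machines, and `crc32` is used as the distribution hash simply because it is stable across runs. Rows are parcelled out by hashing a distribution key, each segment computes a partial `SUM` over only its own rows, and the master merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

# Hypothetical sales rows: (region, amount). Illustrative data only.
ROWS = [("north", 120.0), ("south", 75.5), ("east", 210.0),
        ("north", 40.0), ("west", 99.5), ("south", 14.5)]

NUM_SEGMENTS = 3  # assumed number of shared-nothing data servers


def segment_for(key: str, num_segments: int) -> int:
    """Pick a segment by hashing the distribution key (stable across runs)."""
    return crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments):
    """Loading step: parcel rows out so each segment owns a disjoint slice."""
    segments = [[] for _ in range(num_segments)]
    for region, amount in rows:
        segments[segment_for(region, num_segments)].append((region, amount))
    return segments


def local_sum(segment_rows):
    """Runs on one segment against only its own data: a partial SUM per region."""
    partial = {}
    for region, amount in segment_rows:
        partial[region] = partial.get(region, 0.0) + amount
    return partial


def scatter_gather_sum(segments):
    """Master: run the query fragment on every segment in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(local_sum, segments))
    result = {}
    for partial in partials:
        for region, subtotal in partial.items():
            result[region] = result.get(region, 0.0) + subtotal
    return result


if __name__ == "__main__":
    print(scatter_gather_sum(distribute(ROWS, NUM_SEGMENTS)))
```

The merge step is the only work the master does on the data itself, which is why adding segments (of whatever hardware) spreads the heavy lifting rather than concentrating it.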

Complex joins can be handled in a similar way, with queries moving data between servers and assembling results across many different processors. With quad-core processors now a commodity, and six- and eight-core parts following close behind, it won’t be difficult to build a powerful data processing farm (and to use the same hardware for other tasks when you don’t need high-level analytics).
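One common way a shared-nothing database can run a join is to redistribute both tables on the join key, so that matching rows land on the same server, and then join locally everywhere at once. The sketch below assumes that technique (the post doesn’t say exactly how Greenplum moves data), with hypothetical `ORDERS` and `CUSTOMERS` tables, four hypothetical segments, and threads standing in for machines.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SEGMENTS = 4  # assumed number of data servers

# Hypothetical tables; the schemas are illustrative only.
ORDERS = [(1, "c1", 50.0), (2, "c2", 20.0), (3, "c1", 30.0), (4, "c3", 10.0)]  # (order_id, customer_id, amount)
CUSTOMERS = [("c1", "Acme"), ("c2", "Globex"), ("c4", "Initech")]  # (customer_id, name)


def segment_for(key, num_segments):
    """Stable hash of the join key -> owning segment."""
    return crc32(str(key).encode("utf-8")) % num_segments


def redistribute(rows, key_index, num_segments):
    """Move each row to the segment that owns its join-key hash.

    For brevity the rows start in one flat list; on a real cluster they would
    already be spread across segments (by some other key) and be shipped over
    the network.
    """
    buckets = [[] for _ in range(num_segments)]
    for row in rows:
        buckets[segment_for(row[key_index], num_segments)].append(row)
    return buckets


def local_hash_join(pair):
    """Runs on one segment: build a hash table on customers, probe it with orders."""
    orders, customers = pair
    names = {customer_id: name for customer_id, name in customers}
    return [(order_id, names[customer_id], amount)
            for order_id, customer_id, amount in orders
            if customer_id in names]


def distributed_join(orders, customers, num_segments):
    """Co-locate matching keys, join every segment in parallel, then gather."""
    order_parts = redistribute(orders, 1, num_segments)
    customer_parts = redistribute(customers, 0, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        pieces = list(pool.map(local_hash_join, zip(order_parts, customer_parts)))
    # The master only concatenates (and sorts for a deterministic answer).
    return sorted(row for piece in pieces for row in piece)


if __name__ == "__main__":
    print(distributed_join(ORDERS, CUSTOMERS, NUM_SEGMENTS))
```

Because both tables are hashed on the same key with the same function, every `c1` order meets the `c1` customer row on the same segment; the price is the network traffic of the move itself, which is why the choice of distribution key matters.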

There’s another spin-off from the architecture: you can mix different query types in one analytic operation. With Greenplum’s tools you can combine SQL with Google’s MapReduce, and even throw in the R statistical language for complex mathematical operations. Modelling is an important part of business analytics, and that R support means Greenplum’s tools can compete with high-end analytical packages like SAS. There are plenty of interesting use cases here. Perhaps you’re currently working with massive data sets that take a week to process and a day or so to feed into predictive models. With Greenplum you’ll be able to load the data in parallel and run your statistical models directly on it, giving you a considerable speed advantage.
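As a rough illustration of running a statistical model where the data lives, here is a MapReduce-style Python sketch that fits a straight line by ordinary least squares. It is not Greenplum’s interface, and plain Python stands in for R: each hypothetical segment "maps" its rows to five additive sums (n, Σx, Σy, Σx², Σxy), a "reduce" adds those sums together, and the master solves the least-squares equations from the totals without ever collecting the raw rows.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical (x, y) observations already spread across three segments.
SEGMENTS = [
    [(1.0, 3.0), (2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
    [(5.0, 11.0)],
]


def map_stats(rows):
    """Map: each segment reduces its own rows to five running sums."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return (n, sx, sy, sxx, sxy)


def reduce_stats(a, b):
    """Reduce: the sums are additive, so partial results combine element-wise."""
    return tuple(p + q for p, q in zip(a, b))


def fit_line(segments):
    """Fit y = slope * x + intercept by least squares from the combined sums."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(map_stats, segments))
    n, sx, sy, sxx, sxy = reduce(reduce_stats, partials)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values have no spread; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


if __name__ == "__main__":
    print(fit_line(SEGMENTS))
```

Only five numbers per segment cross the network, whatever the size of the underlying data, which is the kind of saving that makes in-database modelling attractive for week-long workloads.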

Moore’s Law may still be adding transistors, but the steady clock-speed gains that came with it have hit the wall. Intel’s spectacular U-turn showed as much: clock speeds dropped and the number of cores went up. That’s left software developers with something of a challenge -
