Simon Bisson & Mary Branscombe's Blog

Analytics get distributed, parallel and mathematical

By Simon Bisson & Mary Branscombe in Editorial

Posted in data warehouses, analytics, Applications, Storage on October 16, 2008 at 7:47 pm

We had a very interesting conversation today with folk from Greenplum about the next generation of business analytics. The most striking piece of their story was just how their application works with data.

With no legacy to build on, the Greenplum engineers could take a very different architectural approach. Traditional databases use a single store and a single query engine. Greenplum’s tools break data up into parcels and spread them across every machine in the data processing network. A central server keeps track of where the data is held and manages queries, which can be broken up and delivered to the appropriate servers, with the results assembled by the controller. Supercomputer aficionados will immediately spot that Greenplum are using a shared-nothing approach, where queries run in parallel on separate sections of the data - speeding things up considerably. Having a master controller handle the scheduling means you can even use mismatched hardware for your data servers.
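To make the shared-nothing idea concrete, here is a minimal Python sketch of the pattern described above: data is split into parcels, a controller sends the same query to every parcel, and the partial results are merged. This is an illustration only, not Greenplum's code or API; the function names (`distribute`, `segment_query`, `master_query`), the sample `sales` rows and the use of threads to stand in for separate machines are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, n_segments, key):
    """Hash-partition rows so each segment holds its own disjoint parcel."""
    segments = [[] for _ in range(n_segments)]
    for row in rows:
        segments[hash(row[key]) % n_segments].append(row)
    return segments


def segment_query(parcel):
    """Runs on one segment: a partial SUM(amount) GROUP BY region over local rows only."""
    partial = Counter()
    for row in parcel:
        partial[row["region"]] += row["amount"]
    return partial


def master_query(segments):
    """The controller: fan the query out to every segment, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(segment_query, segments))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values for matching keys
    return dict(total)


# Hypothetical sample data, spread over three "machines".
sales = [
    {"order_id": 1, "region": "EMEA", "amount": 120},
    {"order_id": 2, "region": "APAC", "amount": 80},
    {"order_id": 3, "region": "EMEA", "amount": 50},
    {"order_id": 4, "region": "AMER", "amount": 200},
    {"order_id": 5, "region": "APAC", "amount": 20},
    {"order_id": 6, "region": "AMER", "amount": 30},
]
segments = distribute(sales, 3, "order_id")
totals = master_query(segments)
print(totals)
```

No segment ever looks at another segment's rows, which is what lets the work run in parallel; only the small partial results travel back to the controller.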

Complex joins can be handled in a similar manner, with queries moving data between servers and assembling results on many different processors. With quad-core processors now a commodity, and six- and eight-core chips following close behind, it’s not going to be difficult to build a powerful data processing farm (and use the same hardware for other tasks when you don’t need high-level analytics).
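One common way to run a join across partitioned data is to reshuffle both tables by the join key so that matching rows land on the same server, then join locally on each one. The sketch below shows that general technique in Python; it is an assumption about how such a join could work, not a description of Greenplum's planner, and the names (`repartition`, `local_join`, `distributed_join`) and sample tables are hypothetical.

```python
def repartition(segments, key):
    """Move rows between segments so rows sharing a key value end up together."""
    n = len(segments)
    shuffled = [[] for _ in range(n)]
    for parcel in segments:
        for row in parcel:
            shuffled[hash(row[key]) % n].append(row)
    return shuffled


def local_join(left, right, key):
    """Plain hash join over the rows held by a single segment."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def distributed_join(left_segments, right_segments, key):
    """Reshuffle both sides on the join key, join each segment locally, gather results."""
    left = repartition(left_segments, key)
    right = repartition(right_segments, key)
    joined = []
    for left_parcel, right_parcel in zip(left, right):
        joined.extend(local_join(left_parcel, right_parcel, key))
    return joined


# Hypothetical tables, initially spread over two segments without regard to customer_id.
orders_segments = [
    [{"order_id": 1, "customer_id": 10, "amount": 120},
     {"order_id": 2, "customer_id": 11, "amount": 80}],
    [{"order_id": 3, "customer_id": 10, "amount": 50}],
]
customers_segments = [
    [{"customer_id": 11, "name": "Beta Ltd"}],
    [{"customer_id": 10, "name": "Acme"}],
]
result = distributed_join(orders_segments, customers_segments, "customer_id")
for row in sorted(result, key=lambda r: r["order_id"]):
    print(row["order_id"], row["name"], row["amount"])
```

The local joins are independent of each other, so in a real cluster each would run on its own machine; the loop here simply stands in for that.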

There’s another spin-off from the architecture - you can mix different query types in one analytic operation. With Greenplum’s tools you can mix SQL with Google’s MapReduce, and even throw in the R statistical language for complex mathematical operations. Modelling is an important piece of business analytics, and support for it means that Greenplum’s tools are able to compete with high-end analytical tools like SAS. There are plenty of interesting use cases here - perhaps you’re currently working with massive data sets that take a week to process and a day or so to feed into predictive models. With Greenplum you’ll be able to load the data in parallel and run your statistical models on the data where it sits - giving you a considerable speed advantage.
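As a rough picture of what mixing query types in one operation might look like, the sketch below chains three steps in plain Python: a SQL query (using the standard library's `sqlite3`), a map/reduce aggregation, and a simple least-squares fit standing in for the kind of modelling you might do in R. None of this is Greenplum's interface; the `sales` table, its numbers and the `fit_line` helper are made up for the example.

```python
import sqlite3
from functools import reduce

# Step 1 (SQL): pull the relevant rows out of a table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (week INTEGER, store TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "north", 10), (1, "south", 12),
        (2, "north", 14), (2, "south", 18),
        (3, "north", 20), (3, "south", 22),
        (4, "north", 24), (4, "south", 28),
    ],
)
rows = conn.execute(
    "SELECT week, units FROM sales WHERE units > 0 ORDER BY week"
).fetchall()


# Step 2 (map/reduce): map each row to a (week, units) pair, reduce to weekly totals.
def add_to_week(acc, pair):
    week, units = pair
    acc[week] = acc.get(week, 0) + units
    return acc


weekly = reduce(add_to_week, map(lambda r: (r[0], r[1]), rows), {})


# Step 3 (statistics): ordinary least-squares line through the weekly totals.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


weeks = sorted(weekly)
slope, intercept = fit_line(weeks, [weekly[w] for w in weeks])
forecast_week5 = slope * 5 + intercept
print(weekly, slope, intercept, forecast_week5)
```

The point is the pipeline shape: each step consumes the previous step's output without the data leaving the system, which is where the time saving over a separate export-and-model process would come from.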

Moore’s Law has hit the wall. Intel’s spectacular U-turn showed as much, as clock speeds dropped and the number of cores went up. That’s left software developers with something of a challenge -
