Simon Bisson & Mary Branscombe's Blog

CUDA - let the GPU take the strain

By Simon Bisson & Mary Branscombe in Editorial

Posted in Processors, Silicon, Applications, Business, Server on February 9, 2008 at 7:39 pm


The barracuda is the wolf of the sea, a slim silver dart that hunts in deadly packs. It’s perhaps not surprising that NVIDIA has taken part of its name for its GPU-based supercomputing tools.

On a recent trip to the US, Mary and I met up with some of the folk behind CUDA at NVIDIA’s Sunnyvale headquarters. It was a fascinating conversation - if only because I used to write scientific computing software, and something like CUDA would have sped up my work massively. When a problem takes days to solve, using something like CUDA to accelerate processing makes a lot of sense.

Prior to CUDA, NVIDIA had tried to use GPUs for compute, but had run into architectural problems. Things changed with their series 8 GPU, which was very different to anything they’d built before, being designed for compute as well as graphics. That’s led to some tradeoffs - there’s silicon on the GPUs that goes unused when they’re working as accelerators (and vice versa). However, NVIDIA makes so many chips that there’s not really any financial issue; it all comes out of the economies of scale.

CUDA is more than just a set of chips - it’s a language framework for working with GPUs that can handle both sequential and parallel code together. Developers don’t need to learn anything new, and the framework gives programmers explicit - and simple - interfaces for running parallel code on NVIDIA’s GPUs. There is a long-term goal of providing tools for automating parallelism, but at this point you still need to work out for yourself which code can be parallelised. The result is simple programs with much less code, as CUDA handles repetitive calculations for you.
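To give a flavour of what mixing sequential and parallel code looks like, here is a minimal sketch in CUDA C. It isn't taken from NVIDIA's material or from anything we were shown: the kernel name saxpy, the helper max_saxpy_error and the test data are all made up for illustration. The kernel is the parallel part, and everything else is ordinary sequential code running on the host.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

// Parallel part: each GPU thread updates one element of y (y = a*x + y).
// The name "saxpy" is an illustrative choice, not something from the article.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

// Sequential part: plain host code that prepares data, launches the kernel
// and checks the answer. Returns the largest absolute error seen, or -1.0f
// if a CUDA call failed. (Hypothetical helper, for illustration only.)
float max_saxpy_error(int n, float a)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *hx = (float *)malloc(bytes);
    float *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx = NULL, *dy = NULL;
    float max_err = -1.0f;
    if (cudaMalloc((void **)&dx, bytes) == cudaSuccess &&
        cudaMalloc((void **)&dy, bytes) == cudaSuccess) {
        cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

        // You decide how the work is split; the hardware schedules the threads.
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        saxpy<<<blocks, threads>>>(n, a, dx, dy);

        // Copying back waits for the kernel to finish.
        if (cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            max_err = 0.0f;
            float expected = a * 1.0f + 2.0f;
            for (int i = 0; i < n; ++i) {
                max_err = fmaxf(max_err, fabsf(hy[i] - expected));
            }
        }
    }
    cudaFree(dx);
    cudaFree(dy);
    free(hx);
    free(hy);
    return max_err;
}

int main(void)
{
    float err = max_saxpy_error(1 << 20, 2.0f);
    printf("max error: %f\n", err);
    return err == 0.0f ? 0 : 1;
}
```

Built with nvcc, the only unfamiliar pieces for a C programmer are the __global__ marker and the <<<blocks, threads>>> launch; the loop you would have written over i has become the thread index.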

Simplicity comes from the hardware as well, as it manages threads for you. All you need to do is define the tasks the GPU will handle, and manage their interactions. The GPU then runs the calculations over the data, with groups of processors working on different functions at the same time. As RAM is directly attached to the GPU, there’s no need to use the PC’s own memory for caching data.
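"Managing their interactions" is the part that takes some thought. As a rough sketch (again my own illustration, with made-up names such as block_sum and gpu_sum_of_ones, and assuming a block size of 256 threads), here is a sum where the threads in a block cooperate through on-chip shared memory. The __syncthreads() calls are the explicit points where threads wait for each other, and the input sits in the GPU's own memory while the kernel runs.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define THREADS 256  // threads per block; must be a power of two for the loop below

// Each block adds up its slice of the input and writes one partial sum.
__global__ void block_sum(const float *in, float *block_sums, int n)
{
    __shared__ float cache[THREADS];   // fast memory shared by one block
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    cache[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                   // wait until every thread has loaded

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            cache[tid] += cache[tid + stride];
        }
        __syncthreads();               // all threads reach this, active or not
    }
    if (tid == 0) {
        block_sums[blockIdx.x] = cache[0];
    }
}

// Hypothetical host helper: sums n ones on the GPU and returns the total,
// or -1.0f if a CUDA call failed.
float gpu_sum_of_ones(int n)
{
    int blocks = (n + THREADS - 1) / THREADS;
    size_t in_bytes = (size_t)n * sizeof(float);
    size_t out_bytes = (size_t)blocks * sizeof(float);

    float *h_in = (float *)malloc(in_bytes);
    float *h_out = (float *)malloc(out_bytes);
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    float *d_in = NULL, *d_out = NULL;
    float total = -1.0f;
    if (cudaMalloc((void **)&d_in, in_bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_out, out_bytes) == cudaSuccess) {
        // The data is copied once into GPU memory and stays there for the kernel.
        cudaMemcpy(d_in, h_in, in_bytes, cudaMemcpyHostToDevice);
        block_sum<<<blocks, THREADS>>>(d_in, d_out, n);
        if (cudaMemcpy(h_out, d_out, out_bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            total = 0.0f;
            for (int b = 0; b < blocks; ++b) total += h_out[b];  // final step on the CPU
        }
    }
    cudaFree(d_in);
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return total;
}

int main(void)
{
    printf("sum: %f\n", gpu_sum_of_ones(1000));
    return 0;
}
```

The per-block partial sums are finished off on the CPU to keep the sketch short; a real application would more likely run a second reduction pass on the GPU.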

The numbers coming out of CUDA are impressive. Working with the VMD/NAMD molecular dynamics tools, researchers at the University of Illinois have seen a 240X speed-up in the VMD ion placement tool, and an 8 to 12X speed-up in NAMD. With an eye on greener computing, they’re also finding that CUDA gives them 1W/Gflop!

If you want this sort of power for your applications (and it’s remarkably suitable for large financial applications) you can buy NVIDIA’s Tesla systems. There are workstation versions, along with deskside offload processors. However, the version we were most impressed with comes as a 1U rack mount unit containing 4 GPUs. Connected to a PC or a server via 5 Gbps PCI-Express connections, this is the way to give your data centre applications a significant speed-up, with significantly lower power requirements.

While Tesla may not yet meet NVIDIA’s aim of providing a Teraflop in a 1U unit, it certainly speeds things up. Oxford University researchers have used it to get a 149X speed-up in LIBOR risk analysis, along with an 89X improvement in performance per watt. That’s a good deal in anyone’s book - especially if you’re working with today’s fractious financial markets.
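The two Oxford figures can be combined into a rough estimate of the power involved. This is back-of-envelope arithmetic rather than a reported result, and it assumes both ratios compare the same workload against the same CPU baseline. Here S is the speed-up, R the performance-per-watt gain, P the average power draw, t the run time and E the energy per run.

```latex
\[
R = \frac{S}{P_{\mathrm{GPU}} / P_{\mathrm{CPU}}}
\quad\Longrightarrow\quad
\frac{P_{\mathrm{GPU}}}{P_{\mathrm{CPU}}} = \frac{S}{R} = \frac{149}{89} \approx 1.67
\]
\[
\frac{E_{\mathrm{GPU}}}{E_{\mathrm{CPU}}}
= \frac{P_{\mathrm{GPU}} \, t_{\mathrm{GPU}}}{P_{\mathrm{CPU}} \, t_{\mathrm{CPU}}}
= \frac{S/R}{S} = \frac{1}{R} = \frac{1}{89} \approx 0.011
\]
```

On those assumptions the Tesla setup draws roughly two-thirds more power while it runs, but finishes so much sooner that each run uses about 1% of the energy.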

Add one to my list for the IT Santa!

–Simon
