Get ready for HPC in the mainstream

One of the interesting side benefits of the famous Moore's Law, formulated by Intel's Gordon Moore, is that from now on, for a while at least, it will be possible to get a very clear and explicit 'feel' for what it actually means.

Until the arrival of the multicore processor, advances in semiconductor technology were increasingly abstruse and meaningless to the majority of people in IT, even experts in server and infrastructure architectures. Now, a simple statement of the Law's impact demonstrates its profound effects: by the end of this year we will have at our disposal single-socket, x86-architected devices sporting eight processor cores, and by the end of 2012, single-socket, x86-architected devices with 128 cores.

Such a jump raises a very simple but profound question - just what are the server and systems software vendors going to do with all that power? One thing is certain: server hardware design is going to have to change to accommodate the advances. Even more certain, systems software and infrastructure architecture will need to change drastically if the extra performance and theoretical functionality are to be made available to end users. Current systems architectures have reached maturity - which is another way of saying they are now very close to being past their sell-by date. So, whether they like it or not, users will soon be obliged to look at new systems architectures in order to exploit the available technology.

Using HPC technology to solve problems

In practice, the search is a simple one, for most of the answers already exist in the world of High Performance Computing (HPC). For years this has been a rarefied corner of IT where specialist technologies are exploited to achieve specialist results, ranging from weather forecasting and climate modelling through to plotting trajectories of deep-space probes. But over recent years specialist HPC technologies such as vector processors have been largely supplanted by commodity devices such as Intel and AMD x86 processors, though the systems architecture has been significantly different to that found in mainstream business systems.

That architecture, parallel processing, is what has allowed commodity processors to become as dominant in HPC systems as in business servers. What is more, it is now very rare for these processors to be designed into purpose-built, specialist hardware. Instead, the majority of systems in the famous 'Supercomputer Top 500' list are now constructed from standard rack servers - often thousands at a time - clustered together to create a unified systems entity.

It is that technology which is set to form the basis of the next generation of mainstream systems architectures, where everything is built around clusters of multiple nodes. The word 'node' itself requires a little explanation, as the use of a growing number of multicore processors in a single server box makes it difficult to judge what constitutes a single node. Kyril Faenov, Microsoft's general manager of High Performance Computing, defines it as "that which is managed by a single memory controller, and a cluster is any system with more than one node." So, in today's money, that means a dual-socket server with a dual-core processor in each socket is a single node. Moore's Law will determine what it represents in next week's money.
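To make the arithmetic behind that definition concrete, here is a minimal sketch in C# (the Node type and its fields are purely illustrative, not any vendor's or Microsoft's API) of how sockets, cores, nodes and clusters relate under Faenov's yardstick.

    // Purely illustrative: a toy model of the definition quoted above, in which
    // a node is whatever sits behind one memory controller and a cluster is any
    // system with more than one node. The Node type is hypothetical.
    using System;
    using System.Linq;

    class Node
    {
        public int Sockets;
        public int CoresPerSocket;
        public int TotalCores { get { return Sockets * CoresPerSocket; } }
    }

    class ClusterArithmetic
    {
        static void Main()
        {
            // "In today's money": one memory controller serving two dual-core
            // sockets is still a single node...
            var server = new[] { new Node { Sockets = 2, CoresPerSocket = 2 } };

            // ...while a rack of such servers clustered together is many nodes.
            var rack = Enumerable.Range(0, 32)
                                 .Select(i => new Node { Sockets = 2, CoresPerSocket = 2 })
                                 .ToArray();

            Console.WriteLine("Server: {0} node(s), {1} cores, cluster: {2}",
                              server.Length, server.Sum(n => n.TotalCores), server.Length > 1);
            Console.WriteLine("Rack:   {0} node(s), {1} cores, cluster: {2}",
                              rack.Length, rack.Sum(n => n.TotalCores), rack.Length > 1);
        }
    }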

The idea of a single 'processor' is the fundamental notion that will have to fade away, so using a single memory controller as the yardstick avoids the problems that will come as the number of cores rises along with the number of sockets in the server. And these numbers are likely to grow even faster than a straightforward application of Moore's Law would suggest. Although the world currently seems glued to the x86 architecture, even in the multicore space, different processor technologies and architectures are starting to appear, and it is quite possible that one of these may prove to be a better option for next-generation systems. With the coming parallel processing environments, one of the key changes will be the removal of any dependency between a particular processor architecture and any operating system.

Intel itself is not immune to such developments, already having the experimental Polaris chip architecture, which has a starting point of 80 cores per chip. This is just one example of what some in the HPC sector are already calling 'many-core' processors, with other contenders being the likes of ClearSpeed, with 96 cores, nVidia's G80 with 128 cores and Cisco's Metro with 188 cores. Currently these are all in design or development, but they give a clear indication of what is to come in mainstream server hardware - ever larger numbers of (quite possibly simpler) processor cores. This is increasingly comparable to what is already found in the development of HPC systems.

The key to exploiting such hardware technology will be, as with HPC systems, the systems software. This will be the major issue for all mainstream business systems users in the not-too-distant future, for it will represent an unavoidable and significant change to operating systems, management software and, above all, applications code. The move to large volumes of many-core processors will force a move to parallel processing and parallel applications development. The only question facing software vendors is by how much they can reduce the pain that users will have to face. And some pain is inevitable if users are to get themselves into a position where they can fully exploit the extra performance and functionality possible with a parallel processing environment.

Taking advantage of processor development

Applications software vendors are starting to realise that the coming changes in hardware offer them significant potential but that, before they can get to it, those changes also create a mountain that they and their users must first climb together. As a leading applications and operating system vendor, Microsoft probably faces the biggest example of that mountain - the huge vested interests of a vast army of existing and entrenched users. To be fair to the company, it is facing up squarely to the task, not least by hiring a well-known parallel processing 'mountaineer' in the form of ex-Cray chief scientist Burton Smith.

He outlines the company's - and its users' - fundamental problem in simple terms. Parallel processing marks a sea change in the way applications are architected and coded, so every user faces a choice between two clear approaches: one is what he calls 'the Apple Approach', while the other is based on the way parallel technology can be packaged.

The Apple Approach is, in his book, the clean and simple option of making a step-function change from one technology to the other, with little or no thought to the impact on, or consequences for, existing users. In one way it can be very effective, but in most other ways its effect on users would be predominantly negative. Its effect on the vendor community would probably be dramatic too, as many existing users would simply decide against change and stay with what they have until that position was no longer economically or operationally tenable.

The alternative is what Smith is charged with developing, particularly the ability to package up parallelism so that it can be delivered to users in a form more in keeping with their current operational experiences. Such packaging will need to accommodate the issue presented by Moore's Law - that the number of processor cores in an individual chip will continue to go up - and it must be considered together with the architectural approaches now starting to be more widely employed, such as virtualisation and the movement towards SOA-based infrastructures. This means a fundamental break with what Smith calls the Von Neumann Assumption - that there is only one program counter, which allows for the proper ordering and scheduling of operations on program variables.

Now there will be more than one counter - indeed there could be an indeterminate number, depending upon both the total and the priority demands on available system resources at any point in time. So the long-term Microsoft goal is to build a packaged environment in which parallelised applications code can run without any specific dependencies upon the number - or even the type - of processor cores being utilised.
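By way of illustration only - this is not Microsoft's packaged environment, merely a sketch using the Parallel.For construct that later shipped in .NET's Task Parallel Library - the point about core-count independence can be seen in a data-parallel loop that states what may run in parallel without ever saying how many cores to use.

    // Illustrative sketch: the code expresses the parallelism, while the runtime
    // decides how to spread it across however many cores the machine has, so
    // nothing here depends on whether there are two cores or 128.
    using System;
    using System.Threading.Tasks;

    class ScaleFreeLoop
    {
        static void Main()
        {
            var input = new double[1000000];
            var output = new double[input.Length];

            // No explicit thread count or core count appears anywhere.
            Parallel.For(0, input.Length, i =>
            {
                output[i] = Math.Sqrt(input[i] + 1.0);
            });

            Console.WriteLine("Processed {0} elements on {1} logical cores.",
                              output.Length, Environment.ProcessorCount);
        }
    }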

Such a packaged environment will, in his view, need to manage legacy applications code as well as adapted, parallelised legacy code, which may be made possible using such tools as the new C# and Visual Basic enhancements in the LINQ (Language Integrated Query) extensions to the .NET Framework. It will also need to support the much wider range of programming styles that parallelism brings with it: functional and transactional styles, data-parallel and task-parallel styles, message-passing and shared-memory styles, and implicit and explicit styles.
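As a rough illustration of two of those styles - assuming the PLINQ and Task APIs that later grew out of these .NET extensions, which may not be the exact tooling Smith has in mind - the same data can be handled in a data-parallel or a task-parallel fashion:

    // Sketch only: contrasting a data-parallel query (PLINQ's AsParallel) with a
    // task-parallel decomposition (explicit Tasks). Both leave the allocation of
    // work to cores entirely to the runtime.
    using System;
    using System.Linq;
    using System.Threading.Tasks;

    class ParallelStyles
    {
        static void Main()
        {
            int[] orders = Enumerable.Range(1, 10000).ToArray();

            // Data-parallel style: one query applied across the whole data set.
            long evenTotal = orders.AsParallel()
                                   .Where(o => o % 2 == 0)
                                   .Sum(o => (long)o);

            // Task-parallel style: distinct pieces of work run concurrently
            // and are joined explicitly.
            Task<long> sumTask = Task.Run(() => orders.Sum(o => (long)o));
            Task<int> maxTask = Task.Run(() => orders.Max());
            Task.WaitAll(sumTask, maxTask);

            Console.WriteLine("Even total: {0}, sum: {1}, max: {2}",
                              evenTotal, sumTask.Result, maxTask.Result);
        }
    }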