Cray shows the value of Secret Sauce

Cray XC30

Why would it interest you that Cray's results show steady, well above-market growth for the last three years? If there were ever a remote, ivory-tower company, it would be the one that fits Cray's surface description.

Founded in 1976, with special abilities in supercomputing that go back practically to the dawn of information technology, Cray spent some time in the acquisition-and-merger wilderness as part of middle-sized fish SGI, before coming back into the limelight in its own right over the last few years. While the ownership may have changed, and the system design has moved on a good distance from the old "computer disguised as a hotel foyer sofa" days, Cray has at least stuck to its knitting: back in 1976 the primary business was supercomputing, and so it is now.

Which is pretty peculiar, when you think about what supercomputing has been all about. The most remarkable recent advances in that field have centred on Nvidia's CUDA architecture, which exploits the extreme calculation speeds achieved in graphics cards to do (partly) non-graphics maths work at speeds far in excess of anything achievable by a general-purpose CPU burdened with all the other parts of a classical operating system. This has been a kind of game of Jenga on the part of systems designers: the initial tower of wooden blocks was built to feed ever-escalating appetites for faster and faster games, with screen sizes and frame rates pushed well beyond the limits of human perception, to the point where the graphics hardware became more useful than the supercomputers of the day.
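As a rough illustration of that offload model, and nothing Cray-specific, the sketch below is about the smallest CUDA program that hands a pile of arithmetic to the graphics card: a toy SAXPY kernel run across a million elements by thousands of lightweight threads, with the CPU doing little more than setup and housekeeping. Names and sizes are invented for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel: each GPU thread scales and adds one element of the arrays.
// Thousands of these threads run in parallel, which is the whole point of
// pushing the maths off the general-purpose CPU.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;                 // a million elements, purely illustrative
    const size_t bytes = n * sizeof(float);

    float *x, *y;
    cudaMallocManaged(&x, bytes);          // unified memory keeps the sketch short
    cudaMallocManaged(&y, bytes);
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover the whole array.
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);           // expect 5.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```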

Now the supercomputing marketplace has built that tower ever higher, with the attendant risk of it toppling over: it has become custom and practice for the compute work to be done by optimised add-on cards, generally hosted in generic rackmount server hardware. Intel has even made its own add-on compute card, which has no role as a graphics card at all.

What is the toppling pressure? The shift from predictive models, which take a short list of starting conditions and values and then recompute them over and over again within a tiny walled garden of hyperspeed hardware, to models informed by altogether too much data. Cray's example is a baseball team which owns a Cray: as case studies go, I have found this one doesn't travel especially well, but the basic premise still holds. A ton of data comes whistling in as the match progresses, and the pieces of advice the machine can prioritise depend on complete oversight of not just all that momentary data, but also the historical long tail.

This is not the kind of job that can easily be cut up between thousands of tiny compute cores on a graphics board. Flooding data past the processors in a structure amenable to the style of query being presented becomes an absolutely vital skill, as the sketch below tries to show.
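Here is a minimal CUDA sketch of that point, assuming nothing about Cray's own machinery: the same one-field query run over synthetic "match event" records, once with the fields interleaved (array-of-structures) and once with each field stored contiguously (structure-of-arrays). The field names and threshold are invented; the point is that the second layout lets neighbouring threads read neighbouring addresses, so the memory system only moves the data the query actually touches.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Array-of-structures: one record per event, fields interleaved in memory.
// (Field names are purely illustrative, not any real schema.)
struct EventAoS { float pitchSpeed, exitVelocity, spinRate, launchAngle; };

// Querying one field from the AoS layout: each thread strides past three
// fields it doesn't want, so the memory system hauls in roughly 4x the data
// the query actually uses.
__global__ void countFastAoS(const EventAoS *ev, int n, int *hits)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && ev[i].pitchSpeed > 100.0f)
        atomicAdd(hits, 1);
}

// Structure-of-arrays: the same field for every event sits contiguously, so
// neighbouring threads read neighbouring addresses and the loads coalesce.
__global__ void countFastSoA(const float *pitchSpeed, int n, int *hits)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && pitchSpeed[i] > 100.0f)
        atomicAdd(hits, 1);
}

int main()
{
    const int n = 1 << 20;
    EventAoS *aos;  float *soaSpeed;  int *hits;
    cudaMallocManaged(&aos, n * sizeof(EventAoS));
    cudaMallocManaged(&soaSpeed, n * sizeof(float));
    cudaMallocManaged(&hits, 2 * sizeof(int));

    for (int i = 0; i < n; ++i) {
        float v = 90.0f + (i % 20);          // synthetic speeds, 90..109
        aos[i].pitchSpeed = v;
        aos[i].exitVelocity = aos[i].spinRate = aos[i].launchAngle = 0.0f;
        soaSpeed[i] = v;
    }
    hits[0] = hits[1] = 0;

    countFastAoS<<<(n + 255) / 256, 256>>>(aos, n, &hits[0]);
    countFastSoA<<<(n + 255) / 256, 256>>>(soaSpeed, n, &hits[1]);
    cudaDeviceSynchronize();

    printf("AoS hits %d, SoA hits %d (same answer, very different memory traffic)\n",
           hits[0], hits[1]);
    return 0;
}
```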

This is where Cray thinks it has a distinct advantage. What other projects in the supercomputing field leave as the "secret sauce" of the implementation and operations teams, Cray brings right into the system design, just as it has done since 1976. All through that long haul in the shadows, it has worked on ways to take the secretness out of the sauce.

What is all this? First the analogy is a dinner-party game best played drunk, then all of a sudden we're on to cookery. What's the theme here?

My first couple of pointers on this topic came from two very different sources (not sauces!). The first was an absolute evangelist for the One True Way in supercomputing. He was going to build a fluid-flow modelling system using completely standard white-box machinery, lots of different lumps of Linux, and InfiniBand to glue it all together. We looked forward to a massive cardboard unboxing day and lots of screwing and plugging and so forth, only to be told there was no chance: everything had to be put together by the reseller, because if they left their "generic" components in the hands of the uninitiated, none of it would work. The second incident came while a nice professor at Cambridge was effusively thanking the team who keep the work queue going, feeding diverse data sets and problems into the multi-rack, megascale supercomputer put together for him by Dell & Intel. "These are the guys who add the secret sauce," he said.

Cray's point is: why is that secret, and why is it added invisibly and unaccountably by humans? The interconnection capability isn't magic, or for gurus only. It is meant to be pretty dull shovel-work, making sure the investment gets fed enough data to really start paying back. What this thinking reveals is how much of computing these days is handled under a "general magic" heading, with the nerds justified by what they guard, make possible, or are able to fix on a bad day.

Cray believes its approach eliminates the open-endedness common to this type of rollout. The interconnect, in Cray parlance, is as clearly defined as the compute resource, and the speed at which a whole pile of data is turned into a reference or used to skew a result is a predictable quantity, not something left to chance in the config of an unpopular or invisible low-level component. As supercomputing shifts from being a rare, prediction-oriented mathematical dance on the head of a pin to a daily grind of trying to gain market advantage over your peers, this move from unknown magic to known performance statements makes a lot of sense to a lot of buyers.

Which takes the mystery out of its results, too.