SCW_OCTNOV13

analysis and opinion

Supercomputing: the reality behind the vision

Andrew Jones explains why, when it comes to high-performance computing, science/dollar is not the same as Flops/dollar

22 SCIENTIFIC COMPUTING WORLD

A casual observer from outside the high-performance computing (HPC) community watching our events, news sites and discussions might easily conclude that HPC is about getting as much money as possible from your funding agency or board, and then buying the most Flops capacity (crudely ‘calculations per second’) possible. It’s even better if this is done by choosing a computer system that is in some way unique – we like ‘serial number 1’. We then proudly issue press releases declaring it as the biggest supercomputer in [xyz] – where the category [xyz] is carefully chosen and defined such that your supercomputer is at the top of the pile in [xyz].

This game of getting the most Flops possible has been boosted in recent years by the emergence of new processors, with a greater proportion of silicon devoted to calculating units: graphics processing units (GPUs), especially from Nvidia; and Intel’s Xeon Phi (which my brain still defaults to calling MIC because I’ve known it as that for so long). GPUs and Phi (is the plural Phis?) promise maybe an order of magnitude more Flops for a given dollar or power budget than traditional processors.

So, a big budget, a data centre full of racks with as many cores as possible, and plenty of GPU/Phi cards wedged in to get that Flops capacity [score?] as high as possible. Now what?

Well, this pile of silicon, copper, optical fibre, pipework, and other hardware makes an imposing monument that politicians can cut ribbons in front of and eager managers can give tours around. But something else is needed to make that pile of processed sand, metal and supporting gubbins into the powerful multi-science instrument that the funding agency sought, or the engineering design capability that convinced the company management. That something else is a complex ecosystem of system architecture, software, and people.

Well designed and implemented system architecture is required to make sure Flops engines (whether GPU, Phi or CPU) can do useful work. I’m not going to delve into that, only to say it is the art of balancing the desires of capacity, performance and resilience against the frustrations of power, cooling, dollars, and space. Characteristics such as having most of the Flops promise residing in GPUs or Phi co-processors, or larger than average scale, or ‘serial number 1’, all make this more interesting.

But even perfectly architected hardware is

powerless without software. Software is the magic that enables the supercomputer to do scientific and engineering simulations. Of course, it is not really magic, even if it sometimes seems that way. Software is a complex collection of applications (maths, science and engineering knowledge crafted into bits), middleware (to make the entire ecosystem chug along smoothly) and tools (to fix it when it doesn’t). In fact, whisper it loudly, software is infrastructure – yes, infrastructure. It needs investment to create and maintain, it takes time to build and usually provides capability for a multitude of use cases and hardware platforms. Software can [should] be a highly engineered asset that, in many cases, is worth far more than the lump of tin that usually attracts the ‘infrastructure’ label.

IF WE WANT THE BEST CAPABILITY OF PEOPLE... THEN WE WILL HAVE TO INVEST IN DEVELOPING AND FUNDING THEM

Application software encapsulates some

existing understanding of the relevant maths, science and engineering of a problem or system. This virtual knowledge engine is combined with an understanding of the hardware and cooperating software resources (e.g. communication libraries) into a set of methods and processes that enable a user to study and predict the behaviour of the [science/engineering] problem or system, or to test that encapsulated understanding.

Hopefully, the keen-eyed reader will have

noticed the critical word in that preceding paragraph. It was ‘user’. Delivering science insight or engineering results from this powerful tool of hardware and software requires users. In fact, it requires an ecosystem of people: the scientists/engineers who understand how to apply the tool effectively; computational scientists and HPC software engineers to develop and optimise the application software; HPC experts to design, deploy and operate the hardware and software systems; and professionals to develop an HPC strategy, match requirements with solutions, procure capabilities, and ensure a productive service.

Just as we need a roadmap for hardware

technology and a recognition that software needs long-term investment, we also need a long-term plan for the people. We need to invest in this part of the ecosystem too. The component units (that’s us lot) have a long preparation time (education) together with a plethora of exits-from-useful-service (from the predictable such as retirement, to the unpredictable and fast-acting such as a better job offer). And, because the demand for HPC and the complexity of HPC are growing, we need more people of varying skill sets. If we want the best capability of people with sufficient capacity, then we will have to invest in developing them and funding them appropriately when in place.

Getting an HPC capability to deliver the best

science or engineering is harder than just Flops/dollar or Flops/Watt – or to put it another way, science/dollar is not the same as Flops/dollar. But when the ecosystem of hardware, software and people is properly resourced and balanced, our casual outside observer might not see HPC at all – just an incredibly powerful scientific instrument or a capability-defining engineering design and validation tool.

Andrew Jones is VP of The Numerical Algorithms Group’s HPC expertise, services and consulting business. He is active on Twitter as @hpcnotes.

@scwmagazine | www.scientific-computing.com
