towards exascale


William Gropp, director of the Parallel Computing Institute, deputy director for research at the Institute for Advanced Computing Applications and Technologies, and Thomas M. Siebel Chair in Computer Science at the University of Illinois Urbana-Champaign


In order to reach exascale, a more explicit focus on programming for performance is required, and any idea that we can delegate that problem to programming models or software tools is misguided at best. We've never been able to achieve this and there is no evidence that we will be able to in the future. People have been trying this approach for a long time, and occasionally we do see demonstration cases that work, but in general it has been a very hard process, particularly when the types of adjustments that need to be made to improve the speed of codes are at odds with keeping those codes clear. Exascale systems are only going to make this situation far more complicated, because they will require more specialised data structures to be optimised.


As an industry, we've been plodding along pretending that someone else will deal with the performance issues, but we need to recognise that performance is part of correctness, rather than something we hope for. An underperforming code represents a big problem, and just as we have programming constructs to help us with correctness, we need performance constructs to help us with correctness.

This will also address many of the performance 'surprises' that people see today – whether it is OS noise and performance irregularities, or interactions between different programming systems, such as between MPI and OpenMP. In the case of MPI and OpenMP programming, there's really nothing wrong with either of these as they stand, but the tools that were supposed to work on top of them never materialised. The idea that replacing them will fix our problems is not the only way to go and, in fact, wouldn't necessarily address the real problem: performance. Apart from those of us who find it intellectually stimulating, parallel programming is never done for fun! It's done because it has become a necessity, and it doesn't make sense that greater emphasis hasn't been placed on building tools for this area.


Of course, there are tools that attack some of these problems, for example, by rewriting a lot of code. Domain-specific languages are an attempt to do this by enabling programmers to express what they want to do at a higher level of description. Because there is a narrower focus, these higher-level languages are less general-purpose than MPI. While this offers the compiler more knowledge about what's going on, it may still have trouble discovering what it needs to do. These are undoubtedly steps in the right direction, but they are only steps – few tools are interoperable or general-purpose enough.


The other problem with domain-specific languages is the word 'domain'. I prefer to look at them as data-structure-specific languages, because that's essentially what they are. Matlab, for example, does not apply to any single scientific domain; rather, it's a matrix language. Regular grid languages can likewise apply to any scientific domain that requires a regular grid.
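As a minimal sketch of what a 'matrix language' buys you (using Python with NumPy as a stand-in – neither is named above), the same computation can be expressed at the level of the data structure rather than at the level of loops:

```python
import numpy as np

# Matrix-language style: the intent -- a matrix-vector product -- is explicit,
# so a library or compiler is free to map it onto tuned, parallel code.
A = np.random.rand(1000, 1000)
x = np.random.rand(1000)
y = A @ x

# Loop style: the same computation, but the structure an optimiser needs
# is buried in index arithmetic.
y_loops = np.zeros(1000)
for i in range(1000):
    for j in range(1000):
        y_loops[i] += A[i, j] * x[j]

assert np.allclose(y, y_loops)
```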


We'll be on the right road as soon as we can learn to make these tools interoperable, and can view them as languages that handle the parts of algorithms and data structures needed for a particular part of a calculation. But we need to make performance part of programming in order to get there.


Bill Dally, chief scientist and SVP of Research at Nvidia

Power efficiency is challenging because the magnitude of the gap to be closed is large, and the amount of gain that we can expect from better semiconductor processes is much smaller than in the past. Today the most energy-efficient supercomputers – those at the top of the Green 500 list – are based on Nvidia Kepler GPUs and have a power efficiency of about 2Gflops/W. To get to exascale within 20MW (a stated goal), we must achieve 50Gflops/W, a 25-fold improvement. It's as if we had to improve the efficiency of a car that gets 20mpg to get 500mpg.
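The arithmetic behind those figures, restated (taking an exaflop as $10^{18}$ flop/s):

\[
\frac{10^{18}\ \text{flop/s}}{20\ \text{MW}} \;=\; \frac{10^{18}\ \text{flop/s}}{2\times 10^{7}\ \text{W}} \;=\; 50\ \text{Gflops/W},
\qquad
\frac{50\ \text{Gflops/W}}{2\ \text{Gflops/W}} \;=\; 25\times .
\]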


To make this 25-times gap even more difficult, the gains we are now getting from process improvements have been greatly reduced. Back in the days of voltage scaling, a new generation of process technology gave about a 60 per cent reduction in energy. Today, each new generation gives only about a 20 per cent reduction in energy. Over the three generations between now and exascale, process technology will only give about a 2.2-times improvement, leaving about 12-times to be achieved by other means.
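As a rough check on those numbers (a back-of-the-envelope sketch, assuming the per-generation energy reduction compounds):

\[
\left(\frac{1}{1-0.2}\right)^{3} \;\approx\; 1.95\times \ \text{(the 'about 2.2-times' quoted)},
\qquad
\frac{25}{2.2} \;\approx\; 11.4 \;\approx\; 12\times \ \text{to come from other means}.
\]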


Programming with massive parallelism is likewise challenging because it requires a change to how programmers think and program. Today a large supercomputer, like the Titan machine at Oak Ridge National Laboratory, requires roughly 10 million threads – independent pieces of work – to execute in parallel to keep busy. An exascale machine will require 10 billion threads to keep busy. This thousand-fold increase in parallelism requires rethinking how many applications are written. It will require what is called 'strong scaling', where we increase the parallelism more rapidly than we increase the problem size.
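To restate that distinction briefly (the notation is a sketch of mine, not the article's): for problem size $N$ run on $P$ threads, weak scaling grows the problem with the machine so that work per thread stays constant, whereas strong scaling holds the problem roughly fixed while $P$ grows a thousand-fold:

\[
\text{weak scaling: } \frac{N}{P} \approx \text{constant as } P \text{ grows},
\qquad
\text{strong scaling: } N \text{ fixed},\ P \uparrow\ \Rightarrow\ \frac{N}{P} \downarrow .
\]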


I am optimistic, however, that we will rise to the challenge of programming with massive parallelism by creating better programming tools that automate much of the task of mapping an abstract parallel program to a particular machine architecture. Such tools will use auto-tuning to find optimal mappings, enabling the thousand-fold increase in parallelism without burdening the programmer.
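A minimal sketch of the auto-tuning idea (in Python, with invented names – no particular tool is specified above): generate candidate mappings of the same abstract computation, time each one on the target machine, and keep the fastest.

```python
import time
import numpy as np

def kernel(data, block):
    # One candidate "mapping": process the array in chunks of the given
    # block size. On real hardware, block size changes cache and
    # parallelisation behaviour; here it stands in for any tunable knob.
    total = 0.0
    for start in range(0, len(data), block):
        total += float(np.sum(data[start:start + block]))
    return total

def timed(fn, *args):
    # Wall-clock time for a single run of fn(*args).
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

def autotune(data, candidate_blocks, trials=3):
    # Empirical search: time every candidate mapping, keep the fastest.
    # Real auto-tuners prune this space with models or heuristics.
    best_block, best_time = None, float("inf")
    for block in candidate_blocks:
        elapsed = min(timed(kernel, data, block) for _ in range(trials))
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block, best_time

data = np.random.rand(1_000_000)
block, seconds = autotune(data, candidate_blocks=[256, 4096, 65536])
print(f"best block size: {block} ({seconds:.4f}s per run)")
```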


Ultimately, the energy-efficiency and parallel-programming gaps will be closed by a myriad of small steps. Improved technologies are reported every year at the circuits, architecture, and parallel programming conferences.



