Supercomputing software challenges


‘Scientists don’t always apply the same scientific rigour to the behaviour of their software that they do to their academic research’ Dr David Henty, HPC Training and Support at EPCC


and in many instances, it is a combination of factors that limits performance. The level of effort required to take full advantage of multicore and especially heterogeneous computing is intimidating, Dunning's team adds. Scaling applications to tens or hundreds of thousands of compute cores often requires rethinking the algorithms and, at times, even the fundamental computing approach. Effectively using many-core heterogeneous processors poses even greater problems. Although the advantages of employing modern computing technologies have been amply demonstrated by the research community, typically only well-financed companies can afford to rewrite and revise their large base of legacy applications unless absolutely forced to by the competition.


The need to retool
For exascale software, the limiting factor is human: how quickly people can be trained on new techniques, says SGI's Aman. We must reskill programmers, and this is a generational effort; what we learn at university is generally what we use for a lifetime, and it's difficult to retool. By and large, we still write large-scale software in older-generation languages. 'When we talk to the engineering team at NASA, which runs some of the largest supercomputers, we find that they do most programming with Fortran, with some PHP around the edges.'

As you get further away from CPUs, it becomes more difficult to figure out how the software should be written to get the maximum out of all the layers: the OS, compilers and applications. And in software, we're not necessarily writing applications for the hardware five years away; we'll figure out later what to do with it.




Another problem with exascale, adds Aman, is that we must expect something to break 'every five minutes'. The MTBF of a memory chip might be a million hours, and at the desktop or laptop level an error is unlikely in the computer's lifetime, but in an exascale system with petabytes of memory, do the math: a DIMM will fail catastrophically every other hour. The software can't just stop and restart a huge job that's been running for days or longer. We must build in resiliency, so that the system can detect that a memory chip is going bad and exclude it from the pool. The software must be similarly resilient to other hardware failures; we must eliminate all single points of failure. And let's not forget errors in thousands upon thousands of disk drives.
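A rough version of that arithmetic is easy to reproduce. The DIMM count and MTBF below are illustrative assumptions rather than figures from SGI, but they show how quickly independent failure rates add up:

```c
/* Back-of-envelope aggregate failure rate for an exascale memory pool.
 * The figures are illustrative assumptions, not vendor data. */
#include <stdio.h>

int main(void)
{
    const double mtbf_hours_per_dimm = 1.0e6;  /* ~1 million hours per DIMM */
    const double dimm_count          = 5.0e5;  /* assume ~500,000 DIMMs     */

    /* Failures are roughly independent, so the rates simply add up. */
    double system_failure_rate    = dimm_count / mtbf_hours_per_dimm; /* per hour */
    double hours_between_failures = 1.0 / system_failure_rate;

    printf("System-wide DIMM failure roughly every %.1f hours\n",
           hours_between_failures);            /* prints: every 2.0 hours */
    return 0;
}
```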


and are synchronised, but this approach won’t work on exascale machines with millions of cores, so says Dr David Henty, HPC Training and Support at the EPCC (Edinburgh Parallel Computing Centre) in the UK. Obviously, we have to break problems down into many smaller parts. Furthermore, it’s not just how programs are written, as standard OS aren’t designed to rapidly switch among large numbers of tasks and also aren’t ‘parallel aware’. Tat is, they might be able to de-schedule a job if it’s waiting for data from disk, but aren’t aware of the fact it might be waiting for messages from another processor. In the past 15 years, he adds, we’ve been
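One way programmers already loosen that lockstep, at least within an application, is to replace blocking, globally synchronised message exchanges with non-blocking ones, so that each process keeps computing while messages are in flight. A hedged MPI sketch; the routine names, buffers and message sizes are invented for illustration:

```c
/* Sketch: avoid a fully synchronised, lockstep exchange by posting
 * non-blocking sends/receives and doing independent work while the
 * messages are in flight. Names and sizes are illustrative only. */
#include <mpi.h>

static void compute_interior(void) { /* work needing no neighbour data   */ }
static void compute_boundary(void) { /* work that needed the halo values */ }

void halo_exchange_overlap(double *halo_in, double *halo_out, int n,
                           int left, int right, MPI_Comm comm)
{
    MPI_Request req[4];

    /* Post receives and sends, then get on with useful work immediately. */
    MPI_Irecv(&halo_in[0],  n, MPI_DOUBLE, left,  0, comm, &req[0]);
    MPI_Irecv(&halo_in[n],  n, MPI_DOUBLE, right, 1, comm, &req[1]);
    MPI_Isend(&halo_out[0], n, MPI_DOUBLE, left,  1, comm, &req[2]);
    MPI_Isend(&halo_out[n], n, MPI_DOUBLE, right, 0, comm, &req[3]);

    compute_interior();                 /* overlap: no global barrier here */

    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

    compute_boundary();                 /* only now are the halos needed   */
}
```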


In the past 15 years, he adds, we've been programming machines in essentially the same way. A big challenge will be using multiple programming models in the same program. The community first standardised on MPI and then on threads with OpenMP, and this has allowed us to move forward during this time. Now we need new standards addressing how to program accelerators, such as with directives, but we also need standards for those directives, which are only just emerging.
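Emerging directive standards such as OpenACC take exactly this form: the programmer annotates an ordinary loop and leaves the data movement and kernel launch to the compiler. A hedged sketch of the style (the loop is an invented example, not code from any of the projects mentioned):

```c
/* Directive-style accelerator offload: the pragma asks the compiler to run
 * the loop on an accelerator and manage data transfers. Invented example. */
void smooth(int n, const double *a, double *b)
{
    #pragma acc parallel loop copyin(a[0:n]) copy(b[0:n])
    for (int i = 1; i < n - 1; i++) {
        b[i] = 0.5 * (a[i - 1] + a[i + 1]);
    }
}
```

The equivalent OpenMP construct, `#pragma omp target teams distribute parallel for`, follows the same annotate-and-offload pattern.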


Compilers try different approaches
Today's compilers typically take a conservative approach that is certain to work, and therefore don't examine multiple approaches. Compilers can already ask the user for information about special cases, such as through directives, but in a large program running on an exascale system you won't know what information the compiler needs. While techniques for programming a few thousand cores still work adequately, they won't scale up to exascale. A number of projects are addressing this, one of which is CRESTA (Collaborative Research into Exascale Systemware, Tools and Applications), based at the Edinburgh Parallel Computing Centre (EPCC).
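The 'special case' information in question is typically a promise the compiler cannot prove for itself, for example that two pointers never alias or that a loop carries no dependence. A small, hedged illustration (the function is invented) using the C `restrict` qualifier and an OpenMP `simd` directive:

```c
/* 'restrict' and the simd directive are programmer promises: no aliasing,
 * no loop-carried dependence. Without them a conservative compiler may
 * generate scalar code to be safe. Invented example. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}
```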


It is investigating intelligent compilers with program tuning built in: the compiler examines the code, tries different approaches, runs them and then picks the best results. As well as being built into a compiler, this 'intelligence' could also be implemented in a higher-level program that auto-tunes the code by compiling and running many versions.
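Even without an intelligent compiler, the empirical idea can be sketched in a few lines: parameterise a kernel, time several candidate versions on the target machine and keep the fastest. The kernel and the candidate tile sizes below are invented for illustration:

```c
/* Minimal empirical auto-tuning: time the same kernel with several candidate
 * tile sizes and keep the fastest. Compile with e.g. cc -O2 -fopenmp.
 * A real harness would repeat each run to average out cache and timer noise. */
#include <stdio.h>
#include <omp.h>

#define N 1024
static double a[N][N], b[N][N];

static void transpose_tiled(int tile)
{
    for (int ii = 0; ii < N; ii += tile)
        for (int jj = 0; jj < N; jj += tile)
            for (int i = ii; i < ii + tile && i < N; i++)
                for (int j = jj; j < jj + tile && j < N; j++)
                    b[j][i] = a[i][j];
}

int main(void)
{
    const int candidates[] = { 8, 16, 32, 64, 128 };
    const int ncand = (int)(sizeof candidates / sizeof candidates[0]);
    int best_tile = candidates[0];
    double best_time = 1.0e30;

    for (int k = 0; k < ncand; k++) {
        double t0 = omp_get_wtime();
        transpose_tiled(candidates[k]);
        double t = omp_get_wtime() - t0;
        printf("tile %4d : %.4f s\n", candidates[k], t);
        if (t < best_time) { best_time = t; best_tile = candidates[k]; }
    }
    printf("best tile size on this machine: %d\n", best_tile);
    return 0;
}
```

A production auto-tuner searches a far larger space (compiler flags, loop orderings, data layouts), but the principle is the same.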


David Henty, scientists don’t always apply the same scientific rigour to the behaviour of their soſtware that they do to their academic research. Tis opinion is backed by a Princeton study of scientific programming trends presented last November at SC11[1] which states: ‘…scientists have hypotheses on which portions of their code are hot [where considerable execution time is spent], but more oſten than not do not test these hypotheses. Consequently, scientists may not be targeting the important sections of their program.’ Tus, he continues, when developing


Thus, he continues, when developing programs, snap decisions shouldn't be made. It's very important to understand the limitations and problems in your code. What is it in your program that hinders scaling? It's often not what you think. Are there load imbalances? Synchronisation issues? Too many messages?


TODAY’S SOFTWARE


SIMPLY WON’T SCALE UP TO RUN ON MILLIONS OF CORES


You need to do experiments in the software to convince yourself of what's actually going on. Learn to use tools, such as performance analysis utilities, to get a good understanding, and take advantage of knowledge of new techniques before fully committing to a method. Then try incremental changes to make the code run faster, often with a mixture of OpenMP and MPI, or by implementing specific routines in a new model such as accelerators, or in new languages like UPC or co-array Fortran.
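One such experiment, checking for load imbalance across MPI processes, takes only a few lines: time the main compute phase on every rank and compare the slowest with the average. A hedged sketch, with a placeholder compute routine that deliberately fakes an imbalance so the example runs on its own:

```c
/* Quick load-imbalance experiment: compare the slowest rank's compute time
 * with the average. do_compute_phase() is a hypothetical placeholder that
 * busy-waits longer on higher ranks to fake an imbalance. */
#include <stdio.h>
#include <mpi.h>

static void do_compute_phase(int rank)
{
    for (volatile long i = 0; i < (rank + 1) * 20000000L; i++)
        ;                               /* stand-in for real work */
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double t0 = MPI_Wtime();
    do_compute_phase(rank);
    double my_time = MPI_Wtime() - t0;

    double max_time, sum_time;
    MPI_Reduce(&my_time, &max_time, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&my_time, &sum_time, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        double avg = sum_time / size;
        printf("avg %.3f s  max %.3f s  imbalance %.0f%%\n",
               avg, max_time, 100.0 * (max_time - avg) / avg);
    }
    MPI_Finalize();
    return 0;
}
```

If the slowest rank sits far above the average, load balancing rather than message latency is the first thing to fix.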


What’s happening today? Meanwhile, what are ISVs doing today to deal with these issues? To find out, I spoke with one of the largest suppliers of scientific soſtware, Te MathWorks. Jos Martin, principal soſtware engineer, comments: ‘We’re not utterly focused on exascale because that’s too big for our customers at this time. Instead, we’re one step back but we face the same issues.’ He adds that scientists have no interest in writing soſtware that talks directly to large clusters and that his



