towards exascale


Pete Beckman, director of the Exascale Technology and Computing Institute at Argonne National Laboratory


There are layers of challenges that stand between us and exascale, and while the industry is focused on reaching that goal, we still have a long way to go. At the very top of what needs to be addressed is programming. A great deal of uncertainty still exists about how to express billion-way parallelism, because our current models were designed for a way of programming that is, in essence, ‘equal work is equal time’. This approach means that scientists divide their applications into portions, distribute those portions across the entire machine, and then assume that each one will execute in the same amount of time as all the others. The problem is that we know exascale machines are not going to deliver that kind of uniform execution, due to factors such as power requirements, load balance, thermal issues and the sheer difficulty of expressing these massive amounts of parallelism.
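As a hedged illustration of that ‘equal work is equal time’ assumption (a sketch, not code from the article), the snippet below splits a problem evenly across MPI ranks and then meets at a collective. Because the collective completes only when the slowest rank arrives, any imbalance, whether from thermal throttling, power capping or uneven work, idles the rest of the machine.

```c
/* Minimal sketch of the static, bulk-synchronous pattern described above.
 * The work() kernel and the problem size are hypothetical stand-ins. */
#include <mpi.h>
#include <stdio.h>

static double work(long i)          /* stands in for real science code */
{
    return (double)i * 1.0e-9;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n = 100000000L;                      /* total problem size */
    long chunk = n / size;                          /* equal static split */
    long begin = rank * chunk;
    long end   = (rank == size - 1) ? n : begin + chunk;

    double local = 0.0;
    for (long i = begin; i < end; i++)
        local += work(i);       /* assumed to take equal time everywhere */

    /* the reduction finishes only when the slowest rank arrives, so the
     * whole machine runs at the pace of its slowest node */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %f\n", total);

    MPI_Finalize();
    return 0;
}
```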




Right now, OpenMP does not provide a solution to that problem and, while MPI is perfect for sending messages between nodes, it doesn’t address this paradigm either. We need to figure out how to express parallelism and at the same time build a runtime that handles the load balancing – we can’t simply assume that everything will be equally partitioned.
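A minimal sketch of what a load-balancing runtime means at small scale, assuming OpenMP (the cost() kernel and its figures are hypothetical): a dynamic schedule hands out chunks of iterations to whichever thread is free, rather than baking in a static, equal split.

```c
/* Dynamic scheduling within one node as a stand-in for a runtime that
 * balances load, instead of assuming equal partitions finish together. */
#include <omp.h>
#include <stdio.h>

static double cost(int i)           /* hypothetical kernel with uneven cost */
{
    double x = 0.0;
    for (int k = 0; k < (i % 1000) * 1000; k++)
        x += 1.0e-9 * (double)k;
    return x;
}

int main(void)
{
    const int n = 100000;
    double sum = 0.0;

    /* schedule(dynamic) lets idle threads pick up the next chunk, so the
     * uneven iterations no longer dictate when the whole loop finishes;
     * schedule(static) would encode the 'equal work is equal time' guess */
    #pragma omp parallel for schedule(dynamic, 64) reduction(+ : sum)
    for (int i = 0; i < n; i++)
        sum += cost(i);

    printf("sum = %f (max threads: %d)\n", sum, omp_get_max_threads());
    return 0;
}
```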


There are fantastic ideas floating around the industry, but one thing we’ve learned in the scientific community is that it can take 10 years or more to move to a stable environment of languages, compilers and tools. Vendors will need to come together to promote a single programming environment that can be stable across all platforms.


MPI is a good example of companies coming together to agree on a single standard that everyone is confident in. However, in this space vendors are still competing with their own unique technologies and approaches: Nvidia has the CUDA programming model, for example.


OpenACC is another option, but it is currently in a state of change, and people are attempting to roll OpenCL features into OpenMP. The ideal would be for the industry to agree on a standard, and for universities teaching courses on scientific computing to all use this same model. This would take us a step closer to exascale.
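To make the fragmentation concrete, here is a hedged sketch (the function names are hypothetical, not from the article) of the same loop annotated three different ways; a CUDA version would need a separate kernel again. Directive spellings follow OpenACC 2.x and the OpenMP 4.x accelerator extensions.

```c
/* The same scale-by-a loop expressed in three of the competing models.
 * Function names are illustrative; only the directives differ. */
#include <stddef.h>

void scale_openacc(double *x, size_t n, double a)
{
    #pragma acc parallel loop copy(x[0:n])            /* OpenACC offload */
    for (size_t i = 0; i < n; i++)
        x[i] *= a;
}

void scale_openmp_target(double *x, size_t n, double a)
{
    /* OpenMP 4.x accelerator offload */
    #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
    for (size_t i = 0; i < n; i++)
        x[i] *= a;
}

void scale_openmp_host(double *x, size_t n, double a)
{
    #pragma omp parallel for                          /* classic host OpenMP */
    for (size_t i = 0; i < n; i++)
        x[i] *= a;
}
```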


Being able to do more than just a stunt exascale run by 2018 seems like a stretch of the imagination. Someone could buy hardware and demonstrate something very large, and there’s a lot of pride that comes with that, but it wouldn’t impact scientists who want to move to the next level of their work. We’re further away from exascale than I would have hoped, and part of that delay is down to funding. A more realistic goal for reaching that level of compute is 2020. There will be the early adopters who use heroic efforts to add massive parallelism to their code but, until we have a stable programming environment that can be purchased from any vendor, we will struggle to have large communities of scientists using these platforms.


John Goodacre, director of technology and systems, CPU group at ARM


It’s easy to see the main exascale hurdle simply as the need to deliver an increase in operations per watt, and you can see a number of people in the community simply looking at the most power-efficient implementations of the technology that can replace the old components in their existing system: a new GP-GPU fabricated in the latest 28nm process; a low-power ARM processor; a low-power interconnect. Approaching the challenge through such incremental changes, however, is unlikely to achieve the target efficiencies. The main hurdle is moving the thinking outside of the current system architectures, and beyond the constraints imposed by those technologies.


We need to consider all aspects of the system: the choice of fabrication technology; nanotechnology integration; and the design of the silicon system, not just in terms of the processing, memory and I/O of a packaged part, but how that part could be optimised into a sub-system of parts to deliver the required compute density.




The reliability and manageability of such a system can’t be ignored, as the compound effect of mean time between failures (MTBF) cripples a system containing millions of parts.
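As a back-of-the-envelope sketch of that compounding (the figures are assumptions, not from the article): treating the machine as a series system of independent parts with constant failure rates, the system MTBF is simply the part MTBF divided by the number of parts.

```c
/* MTBF compounding under the simplest assumptions: independent failures,
 * constant failure rates, and any single part failure taking the system
 * down. The part MTBF and part count below are illustrative only. */
#include <stdio.h>

int main(void)
{
    const double part_mtbf_hours = 5.0 * 365.0 * 24.0;  /* ~5 years per part */
    const double parts = 1.0e6;                          /* a million parts   */

    const double system_mtbf_hours = part_mtbf_hours / parts;
    printf("system MTBF: %.4f hours (about %.1f minutes)\n",
           system_mtbf_hours, system_mtbf_hours * 60.0);
    return 0;
}
```

With those illustrative numbers the machine would interrupt every few minutes, which is the sense in which reliability cannot be an afterthought at this scale.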


Then all this hardware innovation must be designed in harmony with the software. You can’t assume an application that scales to a few thousand cores today will scale to a million and beyond, and you can’t assume that the latency and bandwidth expectations of that software will simply work if you reduce the costs of communication. In the next 12 months we’ll see the first compute devices that have adopted a low-power processor integrated design, along with integration of the latest connectivity I/O. These integrated devices will be able to demonstrate the efficiency savings that system-on-chip (SoC) design and I/O integration can bring to a system: the removal of abstract interfaces between processing and I/O, and the reduction of Watts/Op.




The realisation of the now hugely increased relative cost of this communication will drive the next phase of holistic system design, and we’ll see both the optimism of more efficient compute at increased density and the pessimism that applying all the component-level optimisation will not be enough to reach exascale.




The challenge is that the community is full of highly specialised individuals. The processor design engineer knows little about the interface characteristics of the latest connectivity I/O, and the algorithm writer knows little about the advanced memory models that could be enabled by nanotechnology integration of new memory types supported by 3D IC integration. Once a holistic view can be applied to the problem, I predict that the scalability challenges of exascale will start to fall into place; the order-of-magnitude benefits that each aspect of the system brings, added together, will overcome the challenges.



