high-performance computing


Lessons from GPUs
GPUs have been widely adopted in HPC over the past five years, and much of this is down to the large-scale grassroots effort that Nvidia has put into developing its programming language, Cuda. Driving adoption of the


technology required significant investment of both money and time, but Nvidia made smart decisions about partnering with many universities to encourage adoption of the technology by academics. This was in addition to working with research centres and university HPC centres across the globe to expand the Cuda user base. Roy Kim, group product


marketing manager at Nvidia, said: ‘There are different classes of developers within HPC; they are not all the same. There are some developers that really want to get hands-on and tune for performance on the GPU. There are others who want to get performance quickly, get on-ramp as soon as possible, and really focus more on the science of their application. ‘So we have OpenACC and Cuda.






Cuda is really targeting the first set of developers and OpenACC is targeting the second set.’ Kim said: ‘Parallel programming


is hard. It’s not easy to have someone think about hundreds or even thousands of threads in parallel. That is the challenge that the modern HPC developer has. Cuda solves a big chunk of that programming issue, which is why it became such a pervasive programming model within HPC. It made HPC programming easier.’
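To make the ‘thousands of threads’ point concrete, here is a minimal Cuda kernel for a scaled vector addition – a hypothetical sketch written for illustration, not code from the article: the developer maps each array element to one GPU thread, and writes the indexing and bounds check by hand.

    // Sketch of a Cuda kernel: one GPU thread per array element.
    // The thread indexing and the bounds check are the developer's job.
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n)                                      // guard threads past the end
            y[i] = a * x[i] + y[i];
    }

    // Host side: launch enough 256-thread blocks to cover all n elements,
    // e.g. saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);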


Kim concluded: ‘OpenACC really offloads a lot of the burden of parallel programming to the compiler, and the compiler does most of the heavy lifting for the developer. For developers that want to get acceleration on a GPU quickly, OpenACC is the right path.’
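For comparison, a sketch of the same loop written with OpenACC (again hypothetical, not taken from the article): the serial loop is kept as-is, and a directive asks the compiler to parallelise it and manage the data movement – the ‘heavy lifting’ Kim describes.

    // Sketch: the same operation with OpenACC. The pragma hands the
    // parallelisation and data transfers to the compiler; no thread
    // indexing appears in the source.
    void saxpy(int n, float a, const float *restrict x, float *restrict y)
    {
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }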


‘…that is going to be good for your application, or you are going to have to maintain that application across multiple architectures over time,’ concluded Snell. To get a sense of the scale of the problem, the US Lawrence Livermore National Laboratory uses large, integrated physics programs that contain millions of lines of source code and tens of thousands of loops, in which a wide range of complex numerical operations are performed. Although these are intended for military applications in the US nuclear weapons programme, there are civilian applications of high-performance computing – for example in the oil and gas industry – that also employ software with hundreds of thousands if not millions of lines of code. Any changes in hardware or parallel programming methods will make it very difficult to achieve high performance without disruptive platform-specific changes to this type of application software.

Beyond Cuda and OpenACC
The problem facing HPC developers today as they look at software scalability is compounded as computer technology moves closer towards exascale. While




most users can make use of tools such as Cuda and OpenACC today, if they wish and if they have the skills and knowledge, those at the most extreme end of HPC must look to methods that can take them beyond the current levels of parallelism and node performance. Whether they be the top


supercomputers at the US National Laboratories or corresponding institutions in Europe and Asia, they all face a similar challenge of scaling and supporting applications at an unprecedented scale. This is driving the development


of programming models that can support the technology and promote increased parallelism and




portability of code. Programming models are typically designed to make increasing performance easier for developers, but rapidly changing processor architectures and the increasing complexity of the platforms that will support exascale applications are significant barriers to the design of future implementations of these models. This is compounded by the


fact that the next generation of supercomputers will increasingly rely on concurrency and complex memory hierarchies while maintaining a sufficient level of interoperability with today’s applications.


Intel: preserve the legacy code
Perhaps Intel has the easiest job in maintaining its established user base. This is because it is extending languages, models, and development tools from the Xeon CPU family across to the Xeon Phi. Rajeeb Hazra, VP of Intel’s


architecture group and GM of technical computing, said: ‘There is a lot of legacy code; there is a lot of knowledge about how to write such code that has been built up in the software industry; and we did not want to lose that. It would be akin to burning your entire library and starting anew.’

Hazra also explained that, because programmers were using similar techniques, such as threading and vectorisation, to parallelise code, investigating speed-up on Xeon Phi would not be a wasted exercise: the improvements would still benefit performance even if the speed-up of the code on Xeon Phi ended up being insufficient. Hazra said: ‘That is a huge economic gain for companies that have applications and do not have the money to throw away on months of work.’ He also stressed that this meant programmers could


use the Xeon CPU as a development environment for Xeon Phi code, using AVX instructions for example. Nevertheless, application


developers still have a lot of work ahead of them if they wish to maintain codes through the next generation of HPC as it will increasingly rely on parallelism and concurrency to achieve performance gains.
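As a rough illustration of the approach Hazra describes, consider the kind of loop in question, parallelised with standard OpenMP threading and vectorisation directives (a hypothetical sketch, not Intel’s code; the function and its names are invented for the example): the same source can be built and tuned on a Xeon CPU and then recompiled for Xeon Phi.

    // Sketch: threading across cores plus SIMD vectorisation in one loop.
    // Compiled for a Xeon CPU this exercises the AVX vector units; the
    // identical source recompiles for Xeon Phi, so tuning work carries over.
    #include <stddef.h>

    void scale_add(size_t n, double a, const double *restrict x, double *restrict y)
    {
        #pragma omp parallel for simd   // threads across cores, SIMD within each core
        for (size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }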


How to scale software?
Kim said: ‘Software scalability is a key issue that the industry is grappling with.’ He explained that supercomputers are getting wider, or more parallel, rather than getting faster through increased processing power. Kim said: ‘There is scalability


within a server node and there is scalability across nodes. Within a node, having accelerators like GPUs gives the node high performance and lots of parallelism, and that is where you are using things like OpenACC or Cuda. ‘MPI has been around for a long


time, and there is some overhead in terms of both processing and memory footprint, but that is pretty lightweight, so it is the HPC developer’s tool of choice for
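To ground the across-node half of that picture, a minimal MPI program looks like the following hypothetical sketch (not from the article): each rank computes a partial result independently, and a single collective call combines them.

    /* Sketch: the across-node model. Each MPI rank works on its own
       slice, then one collective combines the partial results.
       Compile with mpicc; run with, e.g., mpirun -np 4 ./a.out */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local = (double)rank;      /* stand-in for real per-node work */
        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks = %f\n", size, total);

        MPI_Finalize();
        return 0;
    }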








