supercomputing software challenges


job is to add language features in Matlab to make life easier.


As a specific instance, he refers to the parfor parallel for-loop in the Parallel Computing Toolbox, and also points to the Princeton study of scientific computing.[1] It states: ‘The most dominant numerical computing language in use was Matlab – more than half the researchers programmed with it… Only 11 per cent of researchers utilised loop-based parallelism, where programmer-written annotations enable a parallelising system to execute a loop in parallel. The most common form of loop-based parallelism was the use of the parfor construct in Matlab, which enables execution of multiple iterations of a loop in parallel and requires that the iterations access/update disjoint memory locations.’
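
Parfor itself is specific to Matlab, but the pattern the study describes (programmer-written annotations that let a system run independent loop iterations in parallel) also underpins tools mentioned later in this article, such as OpenMP. The sketch below is a minimal analogy in C using an OpenMP annotation rather than Matlab code; the array size and the per-iteration work are illustrative assumptions:

```c
/* Analogous example (not Matlab): an OpenMP loop annotation plays the same
 * role as Matlab's parfor -- the programmer asserts that iterations are
 * independent and the runtime executes them in parallel.
 * Compile with, e.g., gcc -fopenmp. */
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double result[N];

    /* Each iteration writes a disjoint element of result[], so the loop
     * can legally be parallelised. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        result[i] = (double)i * 0.5;   /* stand-in for real per-iteration work */
    }

    printf("result[N-1] = %f\n", result[N - 1]);
    return 0;
}
```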


Martin continues by emphasising that we must learn a new way to program, and that accelerators such as GPUs have changed the model. Conventional code, for instance, relies heavily on fundamental constructs such as memory allocation, heaps and stacks; GPUs, however, lack many of these. We must also recognise that we are running on many threads, each subject to many constraints. GPU vendors have been trying to relax these constraints, but they also face hardware limitations. Everyone is looking for easy ways to write these programs.


One key member of the Numerical Algorithms Group’s training team, Dr Ian Bush, comments: ‘There is very obviously a gap between where academic institutions finish and where efficient use of supercomputing resources can begin. In essence, there are two pieces needed to bridge this gap. First, programmers need to be taught about the range of tools and techniques, such as OpenMP and MPI, that are available to make optimal use of HPC. The second is to teach them the ways of deciding when and how to apply which tool.’
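
As a hedged illustration of the MPI side of the toolbox Dr Bush mentions, the sketch below distributes a simple summation across MPI processes; the quantity being summed and its size are assumptions made purely for illustration:

```c
/* Minimal MPI sketch: each process (rank) handles a disjoint share of the
 * work and the partial results are combined on rank 0.
 * Compile with mpicc, run with mpirun. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    const long N = 100000000;        /* illustrative problem size */
    int rank, size;
    double local = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank sums a disjoint subset of the range 0..N-1. */
    for (long i = rank; i < N; i += size)
        local += (double)i;

    /* Combine the partial sums on rank 0. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %f (from %d ranks)\n", total, size);

    MPI_Finalize();
    return 0;
}
```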


There is no doubt that hardware innovations are happening at a rapid pace, but they also present configuration and management challenges that demand a different approach, says Matthijs Van Leeuwen, CEO of Bright Computing. System admins often fall back on familiar ways with their new clusters – using cluster management toolkits and heavy scripting to build and then manage their systems and these new technologies – and work their way through a learning curve in the process. Unfortunately, this practice robs them of the significant productivity gains that could be realised by using an integrated solution. As well as forgoing a drastic reduction in the time needed to set up and use their clusters, they needlessly sap their own productivity and system performance moving forward.


There is also a huge opportunity cost here: the vast amount of time they spend scripting and keeping these tools synchronised is usually at the expense of other priorities that could take greater advantage of the advances in hardware.


The hardest part is re-architecting software to use parallel algorithms instead of serial algorithms, comments Sumit Gupta, director of Tesla Product Marketing at Nvidia. This is a task the developer must undertake whether the target is CPUs, GPUs or even FPGAs. Auto-parallelising compilers can help, in the form of the recently announced OpenACC GPU-directive compilers from PGI, Cray and CAPS.


But even for these compilers to be effective, the developer has to at least re-architect his software to use data structures that expose parallelism to the compiler. This is something done by big research labs, major ISVs and companies such as oil and gas firms, where performance is critical. This is why they are also the first to adopt GPUs: their code is already prepared for the massive parallelism that GPUs offer.
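
As a rough illustration of the kind of data-structure re-architecting Gupta describes, the C sketch below contrasts an array-of-structures layout with a structure-of-arrays layout that hands a parallelising compiler (or a GPU) long, contiguous arrays to work on; the particle fields and sizes are illustrative assumptions, not taken from the article:

```c
/* Sketch of a data-layout change that exposes parallelism. */
#include <stddef.h>

#define N 1000000

/* Array of structures: x, y, z for one particle sit together in memory,
 * so a loop over x alone strides through memory. */
struct particle_aos {
    double x, y, z;
};

/* Structure of arrays: each coordinate is one contiguous array, which is
 * the layout vectorising and accelerator compilers handle best. */
struct particles_soa {
    double x[N];
    double y[N];
    double z[N];
};

/* The same update written against the SoA layout: every iteration touches
 * consecutive elements, so the loop is straightforward to parallelise. */
void shift_x(struct particles_soa *p, double dx)
{
    for (size_t i = 0; i < N; i++)
        p->x[i] += dx;
}
```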


The challenge is for the vast majority, for whom the best approach forward is to adopt OpenACC GPU directives. This method does not require major code changes, but everything the developer can do to expose parallelism in the data and in the algorithms yields further speedups.
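
A minimal sketch of this directive-based approach, assuming a SAXPY-style kernel chosen purely for illustration: a single OpenACC pragma asks a supporting compiler (such as those from PGI, Cray or CAPS) to offload the loop to a GPU without restructuring the surrounding program:

```c
/* OpenACC sketch: the pragma states that the loop iterations are
 * independent and which arrays to copy to and from the accelerator.
 * Without an OpenACC compiler the pragma is ignored and the code runs
 * serially on the host. */
#include <stdio.h>

#define N (1 << 20)

int main(void)
{
    static float x[N], y[N];
    const float a = 2.0f;

    for (int i = 0; i < N; i++) {      /* initialise on the host */
        x[i] = (float)i;
        y[i] = 1.0f;
    }

    /* Offload the independent loop iterations to the accelerator. */
    #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("y[0] = %f, y[N-1] = %f\n", y[0], y[N - 1]);
    return 0;
}
```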


With the end of frequency scaling and the rise of heterogeneous computing, much software is being left behind, with performance stagnating, says Oliver Pell, VP of Engineering at Maxeler Technologies. Increasing numbers of companies are rewriting their code to embrace heterogeneous computing, but this is a process that really requires expert understanding of both the application and the hardware being targeted.


Companies that write and run their own software can evaluate the costs of changes against the business value or TCO benefits, but for ISVs the situation can be less straightforward. It might not be possible to charge extra for a version of their software that takes advantage of heterogeneous computing, so they are left with just the cost side, which makes it unattractive to invest in that area.


For the end user, maximising performance is increasingly going to be a reason to want control over your own software, rather than relying on third-party options that don’t fully exploit the capabilities of your hardware. On the other hand, vendors who are able to supply integrated software and hardware solutions will see their customers benefit significantly from the greater performance these solutions can deliver.


Further information:


CRESTA cresta-project.eu


Bright Computing www.brightcomputing.com


Edinburgh Parallel Computing Centre www.epcc.ed.ac.uk


The MathWorks www.mathworks.co.uk


Maxeler Technologies www.maxeler.com


National Center for Supercomputing Applications www.ncsa.illinois.edu


Numerical Algorithms Group www.nag.co.uk


Nvidia www.nvidia.co.uk


SGI www.sgi.com


A temporary performance gap
The NCSA team summarises the entire matter nicely: for the next several years, we will see an increasing gap between the peak performance of a computer and the performance realised on real science and engineering applications. The number-one computer on the Top500 list will still be touted by the institutions that deploy it, but scientists and engineers may grow frustrated when trying to use it to do their work. The advances in computing power as written on paper, or as captured by simplistic measures, will not match those ‘on the ground’. Eventually, however, investments in software – especially if those investments increase as planned at the US Department of Energy and the National Science Foundation – will begin to close the gap. As this happens, the fidelity of computational models will increase dramatically, enabling computational scientists and engineers to model the complex, real-world systems of paramount importance to society.


Reference
1. Prakash Prabhu et al., ‘A Survey of the Practice of Computational Science’, Proc. 24th ACM/IEEE Conference on High Performance Computing, Networking, Storage and Analysis (SC11), November 2011.



