HPC 2013-14 | Tuning software for HPC


Staying ahead of the curve


With hardware developments continuing apace, industry experts discuss how to meet the challenge of tuning software


Pavel Ivanov and Maxim Krivov, of ttgLabs, discuss tuning for heterogeneous systems


In the development of industrial or scientific HPC packages, optimisation remains a real challenge. Its primary objective is not just a gain in application performance but a reduction in computing costs. For instance, if a computer system has not yet been deployed at the enterprise, optimising the target package allows a smaller cluster to be installed, reducing both initial and maintenance expenditure dramatically. When computations run almost continuously over an extended period, tuning the package can significantly reduce energy costs.


Efficient optimisation yields even more noticeable savings if HPC resources are consumed from clouds such as Amazon EC2. Since cloud HPC resources are paid for on a ‘per usage’ basis, accelerating computational packages by even 10 per cent is immediately converted into a reduction in total payments. The final saving differs from case to case, but in large computational projects it easily reaches hundreds of thousands of dollars.


Today, the concept of optimising computationally intensive code is frequently associated with, or even completely depends on, the transition to graphics accelerators and coprocessors, such as traditional GPU coprocessors, the new Intel Xeon Phi, or built-in graphics cores like AMD APU, Intel HD Graphics or Nvidia Tegra. The use of accelerators and coprocessors pursues the same goal as optimisation – namely an increase in the rate of computations. Therefore, customers that see no need to optimise computationally intensive modules usually pay no attention to GPU-based computing. However, if they have undertaken a move to a heterogeneous architecture (i.e. CPU+GPU), the objective was to get a substantial performance gain.
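The ‘per usage’ billing argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the project size, the hourly rate and the function names are hypothetical and are not taken from the article or from any Amazon EC2 price list. It also shows that ‘10 per cent faster’ can be read in two ways, which give slightly different savings.

```python
def run_cost(node_hours, rate_per_node_hour):
    """Pay-per-use bill: resources consumed times the unit price."""
    return node_hours * rate_per_node_hour


def saving_if_runtime_cut(node_hours, rate, cut):
    """Saving when tuning shortens the run by the fraction `cut` (0.10 = 10%)."""
    return run_cost(node_hours, rate) - run_cost(node_hours * (1.0 - cut), rate)


def saving_if_throughput_gain(node_hours, rate, gain):
    """Saving when tuning raises throughput by the fraction `gain`,
    so the same work needs node_hours / (1 + gain)."""
    return run_cost(node_hours, rate) - run_cost(node_hours / (1.0 + gain), rate)


# Hypothetical project (assumed figures): 1,000,000 node-hours
# billed at 2.00 dollars per node-hour.
BASELINE_NODE_HOURS = 1_000_000
RATE = 2.00

faster_run = saving_if_runtime_cut(BASELINE_NODE_HOURS, RATE, 0.10)
more_throughput = saving_if_throughput_gain(BASELINE_NODE_HOURS, RATE, 0.10)

print(f"10% shorter runtime:   saves ${faster_run:,.0f}")
print(f"10% higher throughput: saves ${more_throughput:,.0f}")
```

Under these assumed figures the saving is about $200,000 if the runtime falls by 10 per cent, and about $182,000 if throughput rises by 10 per cent. Either way the bill shrinks in direct proportion to the node-hours no longer consumed.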


Libraries of components


There are four main paradigms of software adaptation and optimisation for heterogeneous clusters. The simplest, yet extremely effective, approach is to use a library of off-the-shelf components that form the basis of computationally intensive modules. Typically, these libraries are domain-oriented – image and audio signal processing, linear algebra, or machine vision, for example. In such cases, the developer only has to code the algorithm using the provided functions or classes, while the burden of optimising and tuning the software for a specific accelerator is shouldered by the developers of the library.


The most striking examples of this approach are the well-known BLAS and LAPACK interfaces, which are implemented in libraries such as cuBLAS by Nvidia, CuLa by EM Photonics, MAGMA, Intel MKL, and AMD ACML. The major disadvantage of this approach is its limited applicability. If the main part of the algorithm can’t be coded using the selected library, all its advantages are lost, as the developers will have to make the adjustments for every target accelerator themselves.


The second approach also encourages the use of components, but these are customisable programming primitives rather than off-the-shelf functions. Unfortunately, they implement only basic operations, such as transform, reduce, scan and sort, and, occasionally, data containers. Thus, the programmer has to code the required method using a variety of well-optimised universal primitives, most of which can be easily adjusted to a specific task. The advantage of this approach is that, for example, major sorting algorithms are already implemented so a lot of heterogeneous programming problems such as the selection of the best computing device for a specific task, load balancing, or generation of several ‘branches’ tailored to various types of
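To illustrate how a method is assembled from such primitives, here is a minimal sketch of stream compaction (keeping only the elements that pass a test) built from transform, scan and reduce, followed by a sort. It uses Python standard-library stand-ins (`map`, `itertools.accumulate`, `functools.reduce`, `sorted`) that run sequentially on the CPU. A primitives library for accelerators would supply parallel equivalents. The function name `compact` and the sample data are hypothetical.

```python
from functools import reduce
from itertools import accumulate
from operator import add


def compact(values, keep):
    """Stream compaction composed only of generic primitives."""
    # transform: flag each element that passes the predicate (1) or not (0)
    flags = list(map(lambda v: 1 if keep(v) else 0, values))
    # scan: an exclusive prefix sum of the flags gives each survivor its output slot
    slots = [0] + list(accumulate(flags, add))[:-1]
    # reduce: the number of survivors sizes the output buffer
    count = reduce(add, flags, 0)
    out = [None] * count
    # scatter: every flagged element is written to its precomputed slot
    for value, flag, slot in zip(values, flags, slots):
        if flag:
            out[slot] = value
    return out


samples = [0.2, 1.7, 0.9, 3.1, 0.4, 2.5]    # assumed input data
hot = compact(samples, lambda v: v > 1.0)   # survivors, in original order
ranked = sorted(hot, reverse=True)          # sort: order the survivors
print(hot, ranked)
```

Because each step is a generic primitive rather than task-specific code, the same composition could in principle be retargeted by swapping in a different implementation of the primitives, which is the appeal of this paradigm.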


Agsandrew /Shutterstock.com

