heterogeneous computing

with Intel chips, but it has plans to offer the first Opteron-based solutions that combine AMD and SeaMicro technology in the second half of this year. If the trend of taking large-volume chips

from the mobile phone/tablet space and adapting them to HPC systems continues, as it likely will, we might also see some interesting developments coming up. Just two examples are ARM, which has its own GPU called the Mali, and Qualcomm, whose Snapdragon processors have their Adreno GPU. These devices are currently optimised for graphics applications, but remember that Nvidia got its start in the graphics acceleration business as well. If the HPC market grows large enough, it could entice these chip vendors to add the features needed for HPC.

processors such as the ARM, FPGAs, AMD GPUs and the Intel MIC (Many Integrated Core) architecture. In addition, Cuda 5 supports full GPUDirect, meaning that GPUs can now communicate directly not only within a single card, but also between GPUs on different cards. Finally, in this latest version of Cuda, Nvidia has ported Nsight (a plug-in for Visual Studio) to Linux and the Mac, and Cuda now allows third-party GPU library object linking.

Experience with both x86 and GPUs

AMD has also been active in the GPU space and today has APUs (Accelerated Processing Units, which are combination CPU/GPU chips) 'but they currently have no ECC and are not ideal for HPC,' notes John Fruehe, director, product marketing, server at AMD. These are client-side APUs, with the Llano being more akin to a low-end server APU. 'However, people are developing larger

systems with these chips and these APUs will start appearing in the server space once we have ECC in 2013,' adds Fruehe. What will make AMD attractive in this market? 'For Intel, the world is an x86 problem and every problem has an x86 solution; for Nvidia, every problem has a GPU solution. We have experience with both x86 and GPUs – we're the only people talking seriously about both technologies.' As for ARM, Fruehe

likewise points out that it is not yet 64-bit and that its software ecosystem exists nowhere near to the extent that the x86's does: 'It will be much easier for us to drive down the power curve and have an x86 ecosystem

than for ARM to go up the power curve as they add cores and bandwidth. It will be a better option to go with AMD devices targeted at that space and have the x86 software ecosystem.' It's also interesting that AMD recently

acquired SeaMicro, whose low-power supercompute fabric links thousands of processor cores, memory, storage and I/O in heterogeneous systems. SeaMicro's patented technology, called CPU input/output (I/O) virtualisation (IOVT), reduces the power draw of the non-CPU portion of a server by eliminating 90 per cent of the components from the motherboard, leaving only three: the CPU, DRAM and the Freedom ASIC. This latter device is needed because SeaMicro currently uses off-the-shelf Atom chips and thus cannot design its own SoC devices with integrated Atom cores; the separate ASIC handles the storage and networking virtualisation. Even so, CPU I/O

virtualisation allows SeaMicro to shrink the motherboard to the size of a credit card. For example, the SM10000 family of servers integrates 768 Atom x86 cores packaged in 64 compute cards, top-of-rack switching, load balancing and server management in a single 10 RU system. It's unusual that AMD is now selling a system

A common code base

It's worth mentioning that developing software for these heterogeneous systems will not get easier in the short term. In the past, explains Wen-Mei Hwu of the Illinois Microarchitecture Project utilising Advanced Compiler Technology (IMPACT), people had to write a different kernel for each GPU type for a specific application. It thus becomes very costly for software developers to maintain multiple versions, especially as maintenance costs blow up. He adds that there is an entire movement to












eliminate the need for developers to replicate their code base. Towards this goal, the IMPACT Project developed the Multicore Cross-platform Architecture (MXPA), an OpenCL runtime and compiler that enables cross-architecture performance from a single, unified codebase. This technology was recently sold to MulticoreWare for commercialisation, and Dr Hwu is now the firm's CTO. He notes the advantages of MXPA: it delivers multicore x86 performance comparable or superior to existing implementations of the OpenCL language; it can extend the performance of OpenCL applications to multicore platforms without depending on a client-installed OpenCL runtime or exposing uncompiled source code; and it can retarget arbitrary hardware platforms through a programmable specification that transforms C-language intermediate representation for targeted C compilers and threading libraries.

JUNE/JULY 2012 39
