heterogeneous computing


A blending of strengths


With the adoption of new device technologies and widespread use of heterogeneous systems, the nature of HPC systems and even the definition of a server are changing. Paul Schreier explains


When it comes to heterogeneous computing, in the past we’ve looked upon this primarily as adding accelerators (GPUs, FPGAs, vector processors) to an x86 system. And while there is clearly some important news on the GPU front that I’ll be reporting on in this story, another trend has emerged in the past year: efforts to include low-power processors, found in mobile phones and tablet PCs, in HPC systems. Most interest thus far has centred on ARM devices, but there is also some activity with the Intel Atom. The primary driver for adding such devices is low power consumption, which is a major concern in efforts to build larger, more powerful systems. Such devices are attractive, says Sumit Gupta, manager of Nvidia’s Tesla high-performance GPU computing business unit, because ‘ARM designs its chips starting out with the assumption that they have zero Watts available.’


SoC now also means ‘server on a chip’

How will it be possible to integrate these chips into an HPC system? One answer comes from Calxeda, whose EnergyCore architecture is based on an ARM core. The company is taking the established acronym SoC, which for most people has meant ‘System on a Chip’, and redubbing it ‘Server on a Chip’ because this device contains all the functions, except for memory, needed to work as a server. The EnergyCore ECX-1000 comes with up to four ARM Cortex-A9 cores, Neon extensions for multimedia and SIMD processing, an integrated floating-point unit, and 4MB of shared ECC L2 cache to drop energy consumption by reducing cache misses. Its server-class I/O controllers support standard interfaces such as SATA and PCI Express, and each SoC contains five fabric links that operate between 1 and 10 Gbps per channel and support a variety of fabric topologies. With node-to-node latency under 200ns, network round-trip times are considerably shorter than with a traditional top-of-rack switch.

To make it easier to integrate EnergyCore products, Calxeda sells the Quad-Node Reference Design. It holds 16 cores on four EnergyCore SoCs integrated through a network fabric, collectively forming a complete cluster that can be easily expanded with additional cards. Leveraging the integrated I/O capabilities within each processor, the reference card exposes four (of five possible) SATA ports per SoC. There is also a slot dedicated per SoC to enable diskless system designs by booting from a microSD memory card.

USERS WILL HAVE TO EXAMINE THEIR APPLICATIONS CLOSELY WHEN MAKING A CHOICE CONCERNING CPUS AND ACCELERATORS

According to Calxeda, EnergyCore is an architecture intended to dramatically cut power and space requirements for hyperscale computing environments such as scalable analytics, web serving, media streaming, infrastructure and cloud storage.

One company very enthusiastic about the EnergyCore is Hewlett-Packard, which is using that device in the first development board for its Project Moonshot. Furthermore, notes Ed Turkel, manager, business development, service providers and HPC business, such devices are redefining what a ‘server’ is. We traditionally think of a server as being packaged in a rack or something even larger, but now there’s a 5W server in a chip where each core is independently bootable and can run different software stacks. He adds that Moonshot is not tied exclusively to ARM and that HP expects to have different low-power processor types, eventually including Intel’s Atom.

While the SeaMicro SM 10000-XE is based on Intel Atom chips, the company’s new owner, AMD, plans a version with Opteron devices later this year


Purpose-built servers

How heterogeneous server configurations using such flexible boards are set up will vary with the workload. Some scientific applications, for instance, might work best on a system with just a few CPUs to handle housekeeping, but where the ‘heavy lifting’ is done primarily with GPUs. At the Moonshot launch, the design focus was decidedly not HPC; this implementation is better tailored for memory caching, Hadoop and Java applications. ‘However,’ adds Turkel, ‘when we showed our reference design to our


www.scientific-computing.com

