High-performance computing

Optimising state-funded HPC
In the first of a series of profiles on HPC centres, Robert Roe talks to Dr David Young from the Alabama Supercomputer Authority.

The main production system at the Alabama Supercomputer Authority is a 2,216-core system with 16 terabytes of distributed memory, called the Dense Memory Cluster (DMC). Each compute node in the DMC has a local disc of up to 1.9 terabytes, but the system is also connected to a 45-terabyte GPFS storage cluster that is accessible from every node. The CPUs are drawn from several generations of processor technology: of the 176 nodes that make up the DMC, 96 have 2.26 GHz quad-core Intel Nehalem processors and 24 gigabytes of memory, while 40 have 2.3 GHz eight-core AMD Opteron Magny-Cours processors and 128 gigabytes of memory. The final 40 nodes have 2.5 GHz ten-core Intel Xeon Ivy Bridge processors and 128 gigabytes of memory.
State-funded versus federally funded HPC

Dr David Young, HPC specialist for the Alabama Supercomputer Authority, explained why the DMC cluster comprises this amalgamation of HPC technology: 'The Alabama Supercomputer Authority (ASA) has taken to having an annual upgrade process. If the money is going to come in year by year, then go out and buy whatever is the best value for your money.'

There is an obvious trade-off: heterogeneous hardware brings the overhead of mapping which sections of the system should carry out which applications, but Young explained that the yearly upgrade cycle allows the centre to make the most of each investment, buying the most efficient hardware available at the time. In a perfect world it would be advantageous to run applications across a system that uses only one generation of processor, but realistically that is just not possible for many production HPC centres. The ASA, as a state-funded HPC centre, receives funding in a yearly cycle, whereas federally funded HPC centres generally
get larger amounts on a less frequent cycle. Young explained that the effort the ASA's experts put into recompiling a code depends on how heavily the application is used. 'For a lightly used application, we normally compile it for the oldest processor, and then it will not run quite as well on the newest. For our heavily used applications, we do the extra work to compile for multiple generations.'
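In practice, 'compiling for multiple generations' often means a fat binary carrying per-instruction-set clones of the hot routines. The sketch below uses GCC's function multiversioning to illustrate the idea; the routine and the choice of instruction sets are illustrative assumptions, not the ASA's actual build recipe, and Intel's compiler achieves a similar effect with its -ax multiple-code-path flags.

```c
#include <stdio.h>

/* GCC function multiversioning: the compiler emits one clone of the
 * routine per listed instruction set and, at run time, dispatches to
 * the best match for the node the job lands on (SSE4.2 on the Nehalem
 * nodes, AVX on Ivy Bridge).  Build with:  gcc -O3 multiver.c */
__attribute__((target_clones("default", "sse4.2", "avx")))
double dot(const double *a, const double *b, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i] * b[i];       /* auto-vectorised per clone */
    return s;
}

int main(void)
{
    double a[1024], b[1024];
    for (int i = 0; i < 1024; i++) { a[i] = i; b[i] = 1.0; }
    printf("dot = %f\n", dot(a, b, 1024));
    return 0;
}
```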
Delivering scientific progress

Providing HPC and computing services to the academic community of Alabama requires a measured approach to optimising application performance. Young said: 'The heaviest usage on our system has traditionally been computational
chemistry, but recently we have seen a lot more bioinformatics – crunching through DNA sequencing.' The user base also includes physics, materials science and molecular dynamics, in addition to more unusual areas of research including political science, agriculture and music theory.

'There are an awful lot of scientific applications that only run efficiently up to eight cores, so I tend to have a beefy big-memory
node with the software that exists,' said Young. This variety of users means that the ASA must
optimise its HPC hardware around a wide range of applications. Deciding how to spend annual funding also requires a considered approach, as Young explained: 'We are driven a lot by faculty requests. Once a year I go out to all the research campuses, collecting information on what they think I should do with the next round of upgrade money.'
Evaluating accelerator technology

While the current DMC cluster is primarily a CPU-based, large-memory cluster, the centre does have some nodes equipped with GPUs, which Young said were originally bought as a test bed. One possible upgrade path Young discussed was to purchase the next generation of Intel's Xeon Phi, known as 'Knights Landing', which will come in two flavours: one a typical co-processor, the other a bootable system that is effectively a highly parallelised CPU. 'We are looking at the bootable one because it takes away all of those headaches of it being a co-processor; you do not have to offload work to it and write your code in a specific way.'
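For context, the 'specific way' Young refers to looks roughly like the sketch below, using Intel's offload pragmas for the first-generation Xeon Phi co-processor cards. This is a hedged illustration, not the ASA's own code; on a self-booting Knights Landing the offload pragma simply disappears and the loop runs natively.

```c
#include <stdio.h>

#define N 1000000
float x[N], y[N];

int main(void)
{
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    float sum = 0.0f;
    /* With a co-processor card, the hot loop has to be shipped across
     * the PCIe bus explicitly: name the arrays to copy in, the scalar
     * to copy back, then offload the parallel region to device 0.
     * (Intel compiler syntax; requires an offload-capable icc.) */
    #pragma offload target(mic:0) in(x, y) inout(sum)
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += x[i] * y[i];

    /* On a bootable Knights Landing, the offload line and its
     * data-transfer clauses vanish; only the OpenMP pragma remains. */
    printf("sum = %f\n", sum);
    return 0;
}
```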
Young concluded: 'When I went from SSE to AVX – from 128- to 256-bit vectorisation – I recompiled around a dozen codes that took up our biggest usage, and in many cases I got a clean 2x speed-up out of the recompile. If the benchmarks are good, then we will probably go that way, just to get more from our investment.'
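That clean 2x is plausible arithmetic: an SSE register holds two doubles and an AVX register holds four, so a loop whose time is dominated by vector arithmetic can roughly double in throughput from the recompile alone. A minimal sketch, with GCC flags standing in for whatever compilers the centre actually used:

```c
#include <stdio.h>

/* The same source, built twice:
 *   gcc -O3 -msse2 daxpy.c   # 128-bit vectors: 2 doubles per operation
 *   gcc -O3 -mavx  daxpy.c   # 256-bit vectors: 4 doubles per operation
 * On a vector-bound loop the second build can approach 2x the first. */
void daxpy(double a, const double *restrict x, double *restrict y, int n)
{
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}

int main(void)
{
    double x[8] = {1, 2, 3, 4, 5, 6, 7, 8}, y[8] = {0};
    daxpy(2.0, x, y, 8);
    printf("y[7] = %f\n", y[7]);   /* prints 16.0 */
    return 0;
}
```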