HPC 2013-14 | HPC trends


Phi, and 2,688 cores in the case of Nvidia’s Tesla K20X). While their architectures are very different, they both deliver a peak double precision performance in excess of one teraflop per chip, which is quite stunning. The first compute accelerator I worked on had a peak performance of just 12 megaflops, and was the size of a fridge.
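To put that chip-level figure in context, here is a back-of-envelope calculation of the K20X’s peak double precision rate, using figures from Nvidia’s published specifications; treat the breakdown into units and clock rate as illustrative rather than as something taken from this article.

/* Back-of-envelope sketch: peak flops = FP units x flops per cycle
 * (fused multiply-add) x clock rate. Figures from the public K20X spec. */
#include <stdio.h>

int main(void)
{
    const double dp_units        = 896.0;  /* 14 SMX x 64 double-precision units */
    const double flops_per_cycle = 2.0;    /* one fused multiply-add per cycle   */
    const double clock_hz        = 732e6;  /* 732 MHz core clock                 */

    double peak = dp_units * flops_per_cycle * clock_hz;
    printf("K20X peak double precision: %.2f teraflops\n", peak / 1e12);

    /* Roughly 1.31 teraflops -- around 100,000 times the 12 megaflop
     * accelerator mentioned above. */
    printf("Ratio to a 12 megaflop accelerator: %.0fx\n", peak / 12e6);
    return 0;
}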


What does this mean for programs? The great performance achievements that multicore processors and compute accelerators deliver come at a cost, and that is in ease of programmability and portability. Gone are the days when you coded your algorithm in the language of your choice, compiled it with a few optimisation flags, and sat back to watch it run on your single-processor HPC system. The two issues introduced in the previous section that have a major impact on how you code your applications are massive parallelism (from both multicore processors and the fact that all HPC systems these days are clusters of servers), and heterogeneity. These have a fundamental impact on application design, and introduce two problems. First, there is not yet industry-wide agreement on a standard programming methodology that can (with minor tuning) enable an application to be built that will run efficiently on several different flavours of accelerated multicore systems. [See my analysis and opinion piece published at www.scientific-computing.com by Scientific Computing World on 19 August 2013 for more detail on this issue.] The second problem is that many application developers are domain experts, and do not have a PhD in parallel processing. As systems become more and more complex, the pool of talent available that can program them effectively is shrinking.
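As a concrete illustration of the kind of portable, directive-based approach the industry is converging towards, the sketch below uses OpenACC in C. The loop and array names are invented for illustration, and the claim that the same source can be retargeted at a GPU, a coprocessor or the multicore host by changing compiler flags should be read as the intent of the standard, not a guarantee for any particular compiler.

/* Minimal OpenACC sketch: one annotated loop that a compiler can offload
 * to an accelerator, or ignore and run serially on the host. */
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

int main(void)
{
    double *x = malloc(N * sizeof(double));
    double *y = malloc(N * sizeof(double));
    double a = 2.0;

    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* The same loop can be built for different targets depending on
     * compiler flags (e.g. PGI's -acc); without them it runs on the host. */
    #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);
    free(x);
    free(y);
    return 0;
}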


What does this mean for server and cluster design? The end result of HPC is insight through number-crunching. The number-crunching is often the easy part, with the more complex part being to make sure that the data is in the right place at the right time so that it can be ‘crunched’. A typical example of an HPC system today is a cluster of servers, each of which has two multicore processors and a compute accelerator. So a program has to ensure that data is on the right node, in the right processor core or compute accelerator when it is required. And that impacts not only the design of the program, but also data storage, the network connecting the cluster nodes and the different components of the memory hierarchy within a server. As the volume of data that can be processed increases, the amount of time and energy spent moving data around also increases, so the level of component integration within a server becomes more important.
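To make the data-movement point concrete, the hedged sketch below (an invented example, not taken from any real application) keeps an array resident on the accelerator across many time steps, so it crosses the host-to-device link only twice rather than on every iteration.

/* Sketch: one OpenACC data region around the whole time loop, so the array
 * is copied to the device once and copied back once, instead of per step. */
#include <stdlib.h>

#define N (1 << 20)
#define STEPS 100

int main(void)
{
    double *u = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) u[i] = (double)i;

    #pragma acc data copy(u[0:N])
    for (int s = 0; s < STEPS; s++) {
        /* 'present' asserts the data is already on the device. */
        #pragma acc parallel loop present(u[0:N])
        for (int i = 0; i < N; i++)
            u[i] = 0.5 * u[i] + 1.0;
    }

    free(u);
    return 0;
}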


What does this mean for networking and storage? InfiniBand and Ethernet have similar market shares in high-end HPC systems. Although Ethernet’s bandwidth is improving, InfiniBand still delivers data with lower latency, a factor that can be important to the scalability of applications on large, parallel systems. Compute accelerators put a higher stress on networking, so ideas like Mellanox’s GPUDirect RDMA technology in its InfiniBand offering (which supports Nvidia GPUs) are important.
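A hedged sketch of what that kind of integration enables: with an MPI library built to be GPU-aware (a capability of some implementations, assumed here rather than guaranteed), a device-resident buffer can be handed straight to an MPI call instead of being staged through host memory. The buffer name and size are illustrative; run with at least two ranks.

/* Sketch: passing the device address of a buffer directly to MPI.
 * Without OpenACC compilation the directives are ignored and this is
 * an ordinary host-to-host MPI exchange. */
#include <mpi.h>
#include <stdlib.h>

#define N 4096

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) buf[i] = rank;

    #pragma acc data copy(buf[0:N])
    {
        /* host_data exposes the device address of buf to the MPI calls. */
        #pragma acc host_data use_device(buf)
        {
            if (rank == 0)
                MPI_Send(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            else if (rank == 1)
                MPI_Recv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }
    }

    free(buf);
    MPI_Finalize();
    return 0;
}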




Storage has two components: the storage itself and the parallel file system that manages the data. A parallel file system is required because a serial file system is too much of a bottleneck for scalable parallel systems. The leading parallel file system used in HPC is the open source Lustre file system, with IBM’s GPFS and PanFS from Panasas also featuring. pNFS, the parallel version of the ubiquitous NFS file system, helps with smaller HPC configurations, but lacks the scalability of high-end parallel file systems.
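Applications typically exploit such parallel file systems through parallel I/O interfaces such as MPI-IO; the sketch below is an invented example in which every MPI rank writes its own slice of one shared file with a collective call, the access pattern that systems like Lustre and GPFS are built to serve.

/* Sketch: collective MPI-IO write, one shared file, one slice per rank. */
#include <mpi.h>

#define CHUNK 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double data[CHUNK];
    for (int i = 0; i < CHUNK; i++) data[i] = rank;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Rank r writes CHUNK doubles starting at its own offset in the file. */
    MPI_Offset offset = (MPI_Offset)rank * CHUNK * sizeof(double);
    MPI_File_write_at_all(fh, offset, data, CHUNK, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}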


Cray supports our contention that big data is important to HPC by structuring itself in three divisions. One covers supercomputing, the second is storage (its high-end Sonexion storage is an OEM version of Xyratex’s ClusterStor) and the third is YarcData, which specialises in big data analytics. (In case you haven’t noticed, Yarc is Cray backwards.)


Integration: is it good or bad? As mentioned above, we now have compute accelerators the size of a book that deliver a peak performance in excess of one teraflop. The first system to achieve this performance level was ASCI Red in 1996, a system based on the Paragon supercomputer designed by Intel Supercomputing Systems. Ironically, this division had been closed before the system was delivered, marking the apparent end to Intel’s foray into the world of supercomputing. However, in recent years Intel has been developing some key HPC technologies and buying up companies (or divisions of companies) that deal in technology that is important to HPC, such as Cilk Arts, Pallas and RapidMind (parallel software tools), QLogic and Cray (high-performance networking) and Whamcloud (parallel file systems). Does this signal a future return for Intel as a supercomputer vendor, or is it recognition that integration, both at a component level and between hardware and software, is required to deliver efficient applications as HPC systems become more and more complex? And is this tighter integration a good thing for HPC users, or will it limit choice – and therefore competition – if you can buy a supercomputer in any colour, as long as it’s Intel blue?


The requirement for tighter integration between hardware and software is also recognised by other vendors, with Nvidia’s recent acquisition of compiler specialist the Portland Group (PGI) raising questions about its ongoing support for Intel’s Xeon Phi. IBM is another major player in the HPC market that offers a high degree of integration through its range of hardware platforms (BlueGene, POWER and X Series) and software components (compilers, middleware from its acquisition of Platform Computing, and the GPFS parallel file system) that cover all of the bases for a potential customer.


Conclusions The headlines in HPC are often about leading-edge technology. Although this technology is often fascinating, even sexy, it is not the most important issue that the industry needs to address in order to ensure the success of HPC. That issue is applications. In a world of massive parallelism and heterogeneous systems, ISVs will only invest in redesigning applications to exploit new innovations in HPC if the tools are available, standards are in place to ensure portability from one system to the next, and their staff have the skills required to tackle the task. The HPC industry needs to ensure that these three boxes can be ticked if the trending topic of HPC through 2014 is to be a celebration of the value that HPC is delivering, rather than questions about why using HPC is so difficult.


With more than 30 years of experience in the IT industry, initially writing compilers and development tools for HPC platforms, John Barr is an independent HPC industry analyst.

