Approaches range from direct liquid cooling, which takes the coolant to the source of the hot spots, to indirect cooling, where the components are air cooled but the air then passes through a liquid-cooled door. Depending on the ambient temperature, the resultant water can then be cooled by chillers or by a cooling tower. If there are several systems that are all water cooled, that may seem to suggest a standard approach to cooling, but different equipment from different manufacturers seldom comes with the same recommended water temperature. All of which suggests that HPC data centres of the future must have lots of capacity, and lots of flexibility, in order to support a wide range of requirements.
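
As a simplified illustration of how ambient conditions drive the choice between chillers and a cooling tower, the sketch below (in Python, with hypothetical temperatures and an assumed three-degree approach margin, not figures from any real facility) picks a cooling mode for two imaginary water-cooled racks with different supply-temperature requirements.

    # Hypothetical sketch: choose a cooling mode for a water-cooled HPC system.
    # The temperatures and the 3 degC "approach" margin are assumed for
    # illustration only; real facilities use detailed thermal models.

    def cooling_mode(ambient_c: float, required_water_c: float, approach_c: float = 3.0) -> str:
        """Return 'free cooling' if outdoor air can produce water cool enough
        for this system, otherwise 'chiller'."""
        achievable_water_c = ambient_c + approach_c  # a tower cannot cool below ambient + approach
        if achievable_water_c <= required_water_c:
            return "free cooling"
        return "chiller"

    # Example: a warm-water-cooled rack (accepts 40 degC supply) versus one needing 18 degC.
    for system, required in [("warm-water rack", 40.0), ("chilled-door rack", 18.0)]:
        print(system, "->", cooling_mode(ambient_c=25.0, required_water_c=required))

Under these assumed conditions the warm-water system can run on free cooling all year, while the system demanding cold water still needs chillers, which is exactly why mixed estates need flexible plant.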


Efficient systems
It can cost as much to power and cool an HPC system as it costs to buy it in the first place, and (without significant efficiency improvements) the first exascale system will consume hundreds of MW of power, something that is not sustainable. So a number of things need to be considered. First, the power consumption of individual components should decrease (for example, by using low-power ARM cores developed for the mobile market). This implies that we must use many more of them, adding to the level of parallelism and therefore to programming complexity. Second, the use of accelerators, such as Nvidia GPUs and Intel's Many Integrated Core (MIC) Xeon Phi devices, can deliver higher compute performance in a lower power envelope. Third, alternative architectures such as IBM's BlueGene can deliver excellent compute performance more efficiently than clusters of standard servers.
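
The trade-off behind the first point can be made concrete with a back-of-the-envelope sketch. The two core types, their GFLOPS ratings and their per-core wattages below are entirely assumed for illustration; they are not measurements of any real ARM or x86 part.

    # Back-of-the-envelope comparison of two hypothetical core types.
    # All performance and power numbers are assumed for illustration only.

    TARGET_FLOPS = 1e18  # one exaFLOPS

    core_types = {
        "big x86-class core":    {"gflops": 20.0, "watts": 10.0},
        "low-power mobile core": {"gflops": 4.0,  "watts": 0.5},
    }

    for name, c in core_types.items():
        cores_needed = TARGET_FLOPS / (c["gflops"] * 1e9)
        power_mw = cores_needed * c["watts"] / 1e6
        print(f"{name}: {cores_needed:.2e} cores, ~{power_mw:.0f} MW (compute only)")

With these assumed numbers the low-power cores cut the compute power budget by a factor of four, but only by quintupling the core count, which is precisely the extra parallelism and programming complexity described above.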


When considering HPC data centres it is easy to focus on the facility and the systems, but the systems software (e.g. energy-efficient job scheduling) and the efficiency of application software also need to be considered. If applications are not optimised for the target architecture, it doesn't matter how efficient the data centre is – it will be wasting energy. The aim should be to minimise the energy required for the application to generate an answer.
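
That aim can be expressed as a simple metric: energy to solution, the average power draw multiplied by the runtime. The sketch below uses made-up power and runtime figures for two hypothetical runs of the same job to show why a slightly hungrier but well-optimised run can still be the more energy-efficient one.

    # Energy to solution = average power (W) x runtime (s), converted to kWh.
    # The two configurations and all of their numbers are hypothetical.

    def energy_to_solution_kwh(avg_power_w: float, runtime_s: float) -> float:
        return avg_power_w * runtime_s / 3.6e6  # joules -> kWh

    runs = {
        "unoptimised code on full machine": {"avg_power_w": 100_000, "runtime_s": 7200},
        "optimised code on full machine":   {"avg_power_w": 110_000, "runtime_s": 3000},
    }

    for name, r in runs.items():
        print(f"{name}: {energy_to_solution_kwh(**r):.0f} kWh")

Even though the optimised run draws slightly more power, it finishes far sooner and uses less than half the energy, and that is the quantity an energy-aware scheduler or an application tuner should actually be minimising.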


The need for energy-efficient programming
Without a doubt, the biggest issue confronting HPC data centres today is power consumption. The use of alternative technologies such as accelerators, and of efficient cooling strategies using hot water and free cooling, can reduce overall power consumption, and increased efficiency has been demonstrated by PUE figures that have improved from around 2 to much closer to 1 in recent years. While these steps are positive, they miss the point. The HPC industry has been talking for some years about building exascale systems by the end of this decade. Such a system built from today's technology would require around 100 million cores and would draw more than 500 MW. The target maximum power consumption for an exascale system is 20 MW, so staggering improvements are required if the industry is to get close to its exascale target.
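
A short worked calculation, using only the figures quoted above (1 exaFLOPS, more than 500 MW today, a 20 MW target), shows how large the gap is and why PUE improvements alone cannot close it; the PUE of 1.1 used below is an assumed example of "much closer to 1".

    # Figures quoted in the article: 1 exaFLOPS, >500 MW with today's technology,
    # 20 MW target. PUE 2.0 is the article's "around 2"; 1.1 is an assumed example.

    EXAFLOPS = 1e18
    today_mw, target_mw = 500.0, 20.0

    today_gflops_per_w = EXAFLOPS / (today_mw * 1e6) / 1e9    # ~2 GFLOPS/W
    target_gflops_per_w = EXAFLOPS / (target_mw * 1e6) / 1e9  # 50 GFLOPS/W
    print(f"efficiency needed: {today_gflops_per_w:.0f} -> {target_gflops_per_w:.0f} GFLOPS/W "
          f"(a {target_gflops_per_w / today_gflops_per_w:.0f}x improvement)")

    # Improving the facility's PUE only trims the total bill around a fixed IT load:
    for pue in (2.0, 1.1):
        print(f"PUE {pue}: total facility power for a 500 MW system = {today_mw * pue:.0f} MW")

Even with a near-perfect facility, the IT load itself still has to become roughly 25 times more energy efficient, which is why the discussion has to move from the building to the systems and the software.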


As stated at the beginning of this article, reducing the power consumed by HPC data centres has three components: building efficient data centres that house efficiently cooled HPC systems that have themselves been designed to consume less power. The first two points are already being well addressed by the industry, and there is little room for additional savings once a data centre has a PUE of close to one and deploys warm-water cooling, especially if some of the heat generated is captured. But the elephant in the room is the power consumed by the HPC systems themselves. Unless that power consumption can be reduced significantly, data centres will be able neither to cool nor to afford to power future HPC systems.

What is all of this electrical power used for? Ironically, very little of it is used to drive computation. Most of it is used to move data from one place to another. So instead of building better data centres, perhaps the industry should focus on building a very different style of HPC machine. Adding power-efficient accelerators or using liquid cooling rather than air cooling are refinements of existing technologies, but they are not the game changers required to make exascale systems and their data centres a reality.


There is a need to reduce the power used in moving so much data. How can that be achieved? Step one is to build more power-efficient components for handling data, but that will bring only small wins. Step two is to design systems with a much higher degree of integration, so that data movement can be minimised. But to bridge the gap between 500 MW and 20 MW we need to do things very differently. The current algorithms, programming models, and technology roadmaps won't get close to where we need to be, so we must aggressively explore different approaches, different algorithms, different programming models and different technologies.
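
The claim that arithmetic is cheap compared with data movement can be illustrated with rough per-operation energy costs. The picojoule figures below are assumptions chosen to be in the commonly cited ballpark for current technology, not measurements, but they show where step two's integration savings would come from.

    # Rough, assumed energy costs (picojoules) per operation; illustrative only.
    COST_PJ = {
        "double-precision flop":            20,
        "read 64 bits from on-chip cache":  50,
        "move 64 bits off-chip to DRAM":    2_000,
        "move 64 bits across the network":  10_000,
    }

    flops_per_operand_moved = 1  # a poorly blocked kernel touching DRAM for every operand

    compute_pj = COST_PJ["double-precision flop"]
    movement_pj = COST_PJ["move 64 bits off-chip to DRAM"] / flops_per_operand_moved
    print(f"energy spent computing: {compute_pj} pJ per flop")
    print(f"energy spent moving data: {movement_pj:.0f} pJ per flop "
          f"({movement_pj / compute_pj:.0f}x the arithmetic cost)")

Tighter integration attacks the large data-movement terms in this budget directly, which is why it matters far more than shaving a few per cent off the arithmetic units.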


It feels as if the HPC industry is sleepwalking its way towards a failure to deliver affordable, usable exascale systems within the target timescale of the end of the decade. Addressing issues relating to data centres in support of the next generation of HPC systems, without radically changing the way we build and program them, may be no more valuable than rearranging the lifeboats on the Titanic.


With more than 30 years’ experience in the IT industry, initially writing compilers and development tools for HPC platforms, John Barr is an independent HPC industry analyst specialising in technology transitions.

