HIGH PERFORMANCE COMPUTING




that do not support these features and generally provide much less performance per watt than today’s technologies, van Kesteren argues that there is a real financial incentive to starting over with a more efficient system. ‘In this case, maybe they should think about replacing a 200-node system that is 10 years old with something that is maybe 10 times smaller and provides just as much in terms of computing resource,’ said van Kesteren. ‘You can make a reasonable total cost of ownership (TCO) argument for ripping and replacing that entire old system; in some cases that will actually save money over the next three to five years. Sometimes replacing what you have got is the best option, but I think the least invasive way, and the first thing that we would look at with customers, is: are they being smart with their scheduling software – are there benefits they can get in terms of reducing the power consumption of idle nodes?’ he continued.

There is always a balance between gauging how comfortable users are with trying something new and how much expertise they have in-house, notes van Kesteren. ‘What we often end up doing is providing a training package for people, because there are some schedulers out there that handle power management better than others.’ OCF works with the Slurm scheduler because it provides ‘a simple but effective power management functionality’, which allows OCF or its customers to trigger a script when Slurm realises a node is no longer in use. ‘At OCF we have customised that script to power down or put nodes into a dormant state, and that works the other way as well: when it needs more nodes and starts to run out, it can be used to spin up nodes in the cloud,’ said van Kesteren. ‘That is the sort of software that we would guide customers towards, because of how flexible it is and the expertise that we have with it, because we have found that it works in lots of different environments.’ The functionality that allows these scripts comes out of the box with Slurm, but van Kesteren and his colleagues prefer to customise it to suit an individual customer’s environment and requirements. ‘There are some default scripts in Slurm, but I think it is best to modify them to an extent so that they fit your environment,’ said van Kesteren.

6 Scientific Computing World December 2019/January 2020

The wider computing market

In December, Super Micro released its second annual ‘Data Centers and the Environment’ report, based on an industry survey of more than 5,000 IT professionals. While this is not focused purely on the HPC market, the findings highlight that energy efficiency is not always a primary focus. Results demonstrated again this year that the majority of data centre leaders do not fully consider green initiatives for the growing build-out of data centre infrastructures, increasing data centre costs and impacting the environment.

Responses from IT experts in SMBs, large enterprises and recognised companies showed that the majority of businesses (86 per cent) do not consider the environmental impact of their facilities an important factor for their data centres. Data centre leaders primarily cited TCO and return on investment (ROI) as their measures of success, with less than 15 per cent saying that energy efficiency, corporate social responsibility or environmental impact were key considerations. Some 22 per cent of respondents noted ‘environmental considerations’ were too expensive. The report also found that almost nine out of 10 data centres are not designed for optimal PUE.

It seems that, while there are many novel technologies available to data centre operators, most people setting up a new cluster do not see enough ROI in deploying them unless they are working at a large scale, or have the benefit of a data centre built with the infrastructure to support them. ‘Within HPC you can pretty much split it into academic environments, which is a large part of our customer base, and commercial environments. A lot of academics – and this is changing with the stance on environmental issues in general – don’t see the cost of the electricity,’ commented van Kesteren. ‘They are not billed on it and so historically


they have been quite unconcerned. They often think about it in terms of “is this rack going to have enough power supplied to it?”, but not in terms of a maximum power budget, and at some stage it is just not cost-effective. That is a much more commercial standpoint,’ he continued. ‘In the IT industry in general, they have a power budget and they spend it, but energy efficiency is not particularly high up on their list of priorities.’ If more energy-efficient technologies are


to see widespread adoption – whether processing technologies such as Arm, or innovative cooling technologies – their cost to implement has to be justified. For example, switching to GPUs or Arm processors could save a lot of money over the total life cycle, but this is offset by the cost of porting existing applications. Similarly, cooling technologies may be more efficient, but if they require a data centre investment there are diminishing returns on that energy saving. Ultimately, it needs to be economically viable to be energy-efficient. ‘The first thing is always “can we afford


to buy it?” and then after that “can we afford to run it?”,’ said van Kesteren. ‘If you take a processor in isolation, then the most energy-efficient processor designs tend to be the ones with lots of fairly low-powered cores. But the issue with that, in addition to maybe your application being single-threaded, is that you also tend to lose out on memory bandwidth per core, because you are squeezing a lot of cores into one space. GPUs especially suffer, because they have really high-bandwidth on-card memory, but the bandwidth from those processors to the main memory is quite poor. ‘Although you have all these cores and


they do not use a lot of power, you can end up wasting cycles because processors are waiting for information stored in main memory. This has to be taken into consideration when designing a system with lots of energy-efficient cores. It may not always be the most energy-efficient solution from a holistic standpoint, when you take into account the memory utilisation profile of the application you are running,’ van Kesteren concluded.
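Van Kesteren’s bandwidth-per-core point can be made concrete with some back-of-the-envelope arithmetic. The figures below are illustrative assumptions, not measurements of any particular processor:

```python
# Back-of-the-envelope comparison of memory bandwidth per core.
# All figures are illustrative assumptions, not vendor specifications.

def bandwidth_per_core(total_bw_gbs: float, cores: int) -> float:
    """Memory bandwidth available to each core, in GB/s."""
    return total_bw_gbs / cores

# Hypothetical low-power many-core design: 128 cores share 200 GB/s.
many_core = bandwidth_per_core(200, 128)    # 1.5625 GB/s per core

# Hypothetical conventional design: 32 cores share 160 GB/s.
conventional = bandwidth_per_core(160, 32)  # 5.0 GB/s per core

# A memory-bound application needing, say, 3 GB/s per core would leave
# the many-core part stalled on main memory despite its lower power draw.
needed = 3.0
print(f"many-core:    {many_core:.2f} GB/s/core, starved={many_core < needed}")
print(f"conventional: {conventional:.2f} GB/s/core, starved={conventional < needed}")
```

On this crude model, adding low-power cores only saves energy overall while each core still receives enough bandwidth for the application’s access pattern.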


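The Slurm power-management hooks van Kesteren describes can be sketched as follows. This is a minimal illustration – the file paths, timings and the IPMI power-off mechanism are assumptions, not OCF’s customised scripts:

```shell
## slurm.conf excerpt (hypothetical values): Slurm's built-in power
## saving calls these hooks as nodes go idle or are needed again.
# SuspendProgram=/etc/slurm/suspend.sh
# ResumeProgram=/etc/slurm/resume.sh
# SuspendTime=600        # seconds idle before a node is suspended

#!/bin/bash
# /etc/slurm/suspend.sh -- Slurm passes the idle nodes as $1 in
# compact hostlist form, e.g. "node[01-04]".
for host in $(scontrol show hostnames "$1"); do
    # Power down via each node's BMC; the mechanism (and any
    # authentication options) is site-specific.
    ipmitool -H "${host}-bmc" power soft
done
```

A matching resume script powers nodes back on – or, as described above, provisions cloud nodes – and Slurm returns them to service once they respond.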

