HIGH PERFORMANCE COMPUTING

As HPC users face the challenges of exascale computing, one of the biggest stumbling blocks is fitting these colossal supercomputers into a power budget that is realistic and sustainable. So far, targets have been set at around 20MW, but even this is seen as ambitious with today's technology. The Legato project aims to deliver an order-of-magnitude increase in energy efficiency by combining hardware and software developments with a programming framework that allows CPU, GPU and FPGA resources to be used together.
Commenting on the 20MW target, Unsal notes that 'currently we are nowhere close to this target', and that to meet it software and hardware manufacturers must work together, as it will require every available optimisation 'that we can manage to throw at this problem'.

Lowering the voltage without reducing clock speed is one of Legato's avenues of research. The researchers hope to go beyond what is possible with methods such as dynamic voltage and frequency scaling (DVFS), which has been used in the past.

'DVFS worked quite well for some time, but since we are now operating at voltages close to physical limits, the gains possible from this more conservative approach are nearing their limits.

'Another thing we want to do is select the most energy-efficient hardware match for the application. Sometimes an application would be best run on a CPU, other applications would run better on a GPU and others still on FPGAs.

'It is important to complete a complex optimisation process, where you have those applications and you want to steer them to the most energy-efficient hardware that you have at hand,' added Unsal.
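As a rough sketch of the kind of steering Unsal describes, and not of the Legato runtime itself, a scheduler could hold per-kernel energy measurements for each device type and dispatch each kernel to whichever available device currently looks cheapest. The kernel names and joule figures below are invented for illustration.

```python
# A minimal sketch of energy-aware device selection (illustrative only,
# not the Legato runtime). Assumes we already hold per-device energy
# measurements, in joules per run, for each kernel.

MEASURED_ENERGY_J = {                      # hypothetical profiling data
    "fft_kernel":   {"cpu": 41.0, "gpu": 12.5, "fpga": 9.8},
    "sparse_solve": {"cpu": 18.0, "gpu": 15.0, "fpga": 22.0},
}

def pick_device(kernel, available):
    """Return the available device with the lowest measured energy for this kernel."""
    candidates = {dev: joules
                  for dev, joules in MEASURED_ENERGY_J[kernel].items()
                  if dev in available}
    return min(candidates, key=candidates.get)

if __name__ == "__main__":
    print(pick_device("fft_kernel", {"cpu", "gpu"}))     # -> gpu (no FPGA present)
    print(pick_device("sparse_solve", {"cpu", "fpga"}))  # -> cpu
```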
A place for FPGAs

It has been generally accepted that at least the first generation of exascale systems will make use of GPU technology. Evidence for this can be found in the pre-exascale systems developed for the US Department of Energy: of the top 10 systems in the Top500, six currently use GPUs, with the top two positions taken by DOE systems that both use GPU technology.

However, Unsal argues that the rise of AI in HPC systems means that FPGA technology can be used effectively, particularly for the fixed-point, integer or DSP-type applications mentioned earlier.
'There are emerging neural network applications that require a combination of training together with inference. For these applications it makes sense to run one part of the application, the training, on the GPU and another part on the FPGA,' said Unsal. Instead of running these applications entirely on GPU resources, offloading the inference portions to FPGAs could help to improve the energy efficiency of the system.
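A hedged sketch of that split might look like the following; the two device functions are placeholders standing in for whatever GPU training framework and FPGA inference offload are actually available, and the point is only that the two phases of one workload land on different devices.

```python
# Illustrative split of a neural-network workload: training runs on the GPU,
# inference requests are offloaded to an FPGA. Both device functions below
# are placeholders, not a real offload API.

def train_step_on_gpu(batch):
    # stand-in for a GPU framework's training step
    return {"loss": 0.1, "weights_version": 42}

def infer_on_fpga(sample, weights_version):
    # stand-in for invoking an inference bitstream loaded on the FPGA
    return {"prediction": 0, "weights_version": weights_version}

def serve(training_batches, inference_requests):
    state = None
    for batch in training_batches:              # heavy, floating-point: GPU
        state = train_step_on_gpu(batch)
    return [infer_on_fpga(req, state["weights_version"])
            for req in inference_requests]      # fixed-point friendly: FPGA

if __name__ == "__main__":
    print(serve(training_batches=[[1, 2, 3]], inference_requests=[[4, 5, 6]]))
```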
The beauty of the programming model

The use of FPGAs in HPC has been evaluated before. The two main criticisms have been the complexity of programming applications and the lack of floating-point performance, which makes the chips unsuitable for many traditional HPC workloads. Most HPC users are not experienced with the hardware description languages used for FPGA application development, making the technology much harder to use without a high-level programming language such as OpenCL.

However, Legato researchers may have come up with a solution in the form of the 'write once, run anywhere' software paradigm. The runtime system uses the hardware performance counters found in modern processors to see how much energy a given application dissipates.

'You run the application on the hardware platform, get the feedback about how much energy you are dissipating, and then it is the runtime's responsibility to decide what the optimal resources are at the time, based on this closed loop you get from the system.'

However, using this type of system requires that code can run on any of these hardware platforms without sacrificing performance, which would affect energy efficiency.
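The article does not say which counters the Legato runtime reads. As one hedged, concrete example of such a closed loop, Intel-based Linux systems expose a package-level energy counter through the RAPL powercap interface, and a runtime could sample it around a task and feed the measurement back into its per-device estimates; the sysfs path and the feedback step below are assumptions for illustration.

```python
# Rough sketch of the closed loop: sample an energy counter before and after
# a task, then feed the measurement back into the scheduler's estimates.
# Assumes a Linux machine exposing Intel RAPL through powercap; the path may
# differ, be absent, or require elevated permissions on other systems.
import time

RAPL_PATH = "/sys/class/powercap/intel-rapl:0/energy_uj"   # package energy, microjoules

def read_energy_uj():
    with open(RAPL_PATH) as f:
        return int(f.read())

def measure(task, *args):
    """Run task(*args) and return (result, joules, seconds)."""
    e0, t0 = read_energy_uj(), time.time()
    result = task(*args)
    e1, t1 = read_energy_uj(), time.time()
    return result, (e1 - e0) / 1e6, t1 - t0    # counter wrap-around ignored

# A runtime would then update its per-device table, closing the loop, e.g.
# MEASURED_ENERGY_J["fft_kernel"]["cpu"] = joules
```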
'The problem that we are facing is that if you want to have your application run on a heterogeneous computing platform, you have one version of your application for each hardware technology: CPU, GPU and FPGA,' said Unsal. 'From a programmability point of view, that is not the best.

'We want to provide a programming model that makes it easy to write the application once, with some hints that this application may benefit from a GPU or an FPGA if there is one available. The runtime system looks to see if the resources are available, then sends the task to the GPU or FPGA. You do not need to write a special version of your application for these devices,' added Unsal.
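A hypothetical sketch of what such hints could look like, and explicitly not Legato's actual API: the kernel is written once, annotated with the devices it may benefit from, and a small dispatcher sends it to the first hinted device that is actually present, falling back to the CPU.

```python
# Hypothetical write-once-with-hints sketch (not Legato's actual API): one
# kernel source, a device-preference hint, and a dispatcher that picks the
# first hinted device that is actually available.

AVAILABLE_DEVICES = {"cpu", "gpu"}             # discovered at start-up in a real runtime

def may_benefit_from(*devices):
    """Decorator recording a device-preference hint on the kernel."""
    def wrap(func):
        func.device_hints = devices
        return func
    return wrap

def dispatch(kernel, *args):
    for device in getattr(kernel, "device_hints", ()):
        if device in AVAILABLE_DEVICES:
            print(f"running {kernel.__name__} on {device}")
            return kernel(*args)               # same source runs on any device
    print(f"running {kernel.__name__} on cpu (fallback)")
    return kernel(*args)

@may_benefit_from("fpga", "gpu")
def stencil_kernel(grid):
    return [0.25 * x for x in grid]            # device-agnostic kernel body

if __name__ == "__main__":
    dispatch(stencil_kernel, [1.0, 2.0, 3.0])  # -> runs on gpu (no FPGA present)
```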
There have been many debates about the use of high-level programming models, especially when it comes to FPGAs. There is a belief that if you program with hardware description languages such as VHDL or Verilog, you will be more efficient. However, Unsal argues that this is similar to the old discussion on the CPU side: that if you write in assembly, it will be more efficient than writing in a high-level language such as Fortran or C.

While Unsal notes that it is important for application developers to 'develop with their application and algorithm in mind', he argues that it should not require the kind of coding used in the past. 'It is reminiscent of the times early on in the general-purpose GPU (GPGPU) discussion, when it was very difficult to run things on the GPU because you would need to write explicitly for the GPU. And there was no support for that until CUDA and the others came along.

'We are at a similar inflection point for FPGAs. Maybe it is more difficult for FPGAs because they are much closer to hardware than the GPUs would be, but we are going through a similar discussion. I would say there are good software development frameworks for FPGAs. This is one of the aims of the project,' said Unsal. 'How can we combine an efficient programming framework for FPGAs and have energy efficiency and also reliability and security?'

The Legato project will finish its activities in December this year, but the team has already developed the framework, and is now trying to demonstrate use cases for the runtime system and highlight its work to the HPC community.

'It is always difficult to interest the general programming community in, let's say, a new programming paradigm. But we want to show through our research that perhaps it is time now for hardware and software to work together, to ensure one level of energy efficiency gain for exascale and for data-centre efficiency.

'We want to tell them we have made it much easier for you to run on these technologies. Why not give it a spin? Without having to change your application a lot, you can get much better energy savings,' concluded Unsal.