HPC YEARBOOK 2021/22


Dr Sergi Girona, operations director and CIO of the Barcelona Supercomputing Center: Diversity of hardware is very challenging, but it's also very important, because then you can really select, diversify and customise your system to the applications. You can focus. We need scientists to conduct science, and we need researchers evaluating different hardware components, or evaluating the best computing models and how they can be applied to different algorithms. We need to learn how to make sure that the software stack is stable, the programming environments are maintained, the tools are maintained. At the same time, we are learning that we have to kill old applications, because they need to move to new environments that are stable. Diversity is always good. If you have access to only one system, you have a problem. If you have access to diverse systems, you have a choice.


Are there any downsides to increasing diversity?


Simon McIntosh-Smith, professor in high performance computing at the University of Bristol: I think it's widely accepted that everything is much more diverse now than in the past. There are many more options from different vendors for CPUs, and there are now even multiple vendors for GPUs. On top of that, we've got new kinds of technologies and architectures appearing, especially if you're into things like artificial intelligence or machine learning. So on the one hand, that's a great opportunity: it might mean there are new products, new hardware, new technology that can solve your problem better than ever before. But it also creates more uncertainty, and an awful lot more analysis that has to be done by almost everyone. Now you're trying to say: 'Okay, I have this particular problem. What's the best way to solve it?' In the past the answer was: 'It's an x86 Linux cluster and maybe some GPUs from one vendor.' Today it could be almost anything. How do you know what's going to be the best solution for what you need to do, when you're representing not just one problem but a whole set of people with a whole set of problems? What's the best solution? That becomes a really hard challenge.




Kalyan Kumaran: I agree that this can potentially create a fractured ecosystem. We don't want the scientists to suffer, but from a facilities point of view we also don't want any vendor lock-in. So the best thing we can do is to encourage our users to look at solutions that are standard, open and not vendor proprietary. One of the things we want to do with Aurora, our upcoming exascale system, is to support SYCL. That is the major programming model; it's an open standard, and we are very involved in the Khronos standards body in developing SYCL, or at least in defining the standard. There are people who come from other environments: they could come from an AMD GPU environment, or they could come from an Nvidia GPU environment. We would like to convince those users to convert their code to SYCL. But if they are hesitant to do that, because they've spent a lot of time developing the application, then we provide solutions where they can run their application on the current platform. We have partnerships with NERSC and another facility that has Nvidia GPUs in their production systems. We have collaborations with Oak Ridge, which supports AMD GPUs for their upcoming platforms.


The other thing we also do is to partner with Sandia National Laboratories, the organisation that develops the Kokkos library. This is considered to be very portable, because they port their library using the underlying native programming models available on the different architectures. They do the dirty work; they are the performance engineers for the scientists, so the scientists can make use of that and focus on their research and algorithm development.
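The single-source idea behind Kokkos can be illustrated with a toy analogy. This is not Kokkos code (Kokkos is a C++ template library whose `parallel_for` dispatches to backends such as CUDA, HIP or OpenMP); it is a minimal Python sketch, with invented names, of the same division of labour: the science code calls one loop abstraction and never mentions the backend, while the portability layer chooses how the loop actually runs.

```python
# Toy analogy (not Kokkos itself): application code calls one
# parallel_for() and the execution backend is selected underneath,
# the way Kokkos maps a single source onto CUDA, HIP or OpenMP.
from concurrent.futures import ThreadPoolExecutor


def parallel_for(n, body, backend="serial"):
    """Run body(i) for every i in range(n) on the chosen backend."""
    if backend == "serial":
        for i in range(n):
            body(i)
    elif backend == "threads":
        # Stand-in for a real parallel backend; the caller's code
        # is unchanged regardless of which branch executes.
        with ThreadPoolExecutor() as pool:
            list(pool.map(body, range(n)))
    else:
        raise ValueError(f"unknown backend: {backend}")


def saxpy(a, x, y, backend="serial"):
    """'Science' code: computes a*x + y elementwise, written once,
    with no knowledge of how the loop is executed."""
    out = [0.0] * len(x)

    def body(i):
        out[i] = a * x[i] + y[i]

    parallel_for(len(x), body, backend=backend)
    return out
```

Running `saxpy` with either backend gives identical results; in real Kokkos the analogous switch is the execution-space parameter, and retargeting an application to a new machine is the library's job rather than the scientist's.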


How do you evaluate new systems and technologies for HPC?


Simon McIntosh-Smith: Usability is really key. One of the very interesting things when you're looking at some of the new architectures, and especially things like GPUs, is how easy it is to actually get your code working on them, get it working well, and then get it working in a performance-portable manner. How much effort did it take to get to that stage? How usable was it? How much work did you have to put in? Did you just recompile the code, or did you have to port it and then optimise it for months and months? All of those things are really important. So there are quite a lot of different considerations when you're looking at what's going to be the best solution for the set of target codes you really care about for your users.


Sergi Girona: For me, it's an analysis of total cost, which includes not only performance but also operating costs and an understanding of what the additional costs may be. It's important to analyse the learning curve when you are doing a full evaluation of a new system. You really need to understand the failures and the good things, and then you can apply this to the next system you're building. You can influence these topics, and this is really important. That's the reason we are doing this kind of operation.


'The most important thing for us when it comes to supporting our users or supporting our scientists is to make sure they are using the latest open-source programming models'


The full discussion with the three interviewees was featured as part of a webcast by our parent publication, Scientific Computing World. The webcast focuses not just on hardware diversity but also on the challenges that emerge for scientists, how HPC centres evaluate new hardware, and the resources being created to help scientists make use of next-generation HPC systems. View the webcast at www.scientific-computing.com




