High-Performance Computing 2019-20

HPC Yearbook 19/20

Arm has a number of CPUs that are being developed in conjunction with its partners, such as the Cavium ThunderX2, which has already been announced in systems such as Sandia National Laboratories' 'Astra' system built by Hewlett Packard Enterprise (HPE). Arm and Fujitsu are also developing a processor for the RIKEN 'Post-K' supercomputer, currently known as the A64FX.

At the HPC User Forum in April this year, Satoshi Matsuoka, head of the RIKEN Center for Computational Science, gave a presentation on the Post-K computer and the new processor developed in conjunction with Arm and Fujitsu. 'Compared to other processors that are not HPC-optimised but optimised for things like web workloads, this is a processor that is totally HPC-optimised. It has 1TB/s memory bandwidth, and it has an on-die network which is essentially a 400GB network integrated onto the die,' stated Matsuoka. 'It is similar to Cascade Lake in terms of FLOPs, but it has a much higher memory bandwidth, and it also has various AI supports such as FP16 and FP8, but it is still a general-purpose CPU, it is not a GPU,' Matsuoka added. 'It runs Red Hat, for example, right out of the box; it runs Windows too.'

Matsuoka also stressed that the design is intended to be very energy efficient. As it is based on the Arm chip design this is to be expected, but Matsuoka stated that 'in some of the benchmarks, we have seen an order of magnitude improvement in per-watt performance on things like CFD applications on a real chip - this is not simulated.' This could be potentially huge in terms of sustained performance for many real HPC applications, although we will have to wait a little longer to see how it performs in the final Post-K system once it has been completed.

'We take this chip and we build the largest supercomputer in the world. I cannot disclose the exact number of nodes yet, but it will be the largest machine ever built, with more than 150,000 nodes,' comments Matsuoka. 'What is important is not so much the FLOPS. We all know that for real HPC applications it is the bandwidth that counts. The machine has a theoretical bandwidth of more than 150 petabytes per second, which is about an order of magnitude bigger than any other machine today.'

Rather than focus purely on double-precision FLOPS, the Post-K system will use the A64FX processor and the Tofu-D network to sustain extreme bandwidth on real applications such as seismic wave propagation and CFD, as well as structural codes. Post-K is expected to deliver more than 100 times the performance of the previous system for some key applications. However, the system will also include big data and AI/ML infrastructure.

Intel, meanwhile, presented details on hybrid chip packaging technology, Intel Optane DC persistent memory and chiplet technology for optical I/O.
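The per-node and system-level figures quoted above are consistent with each other; a quick arithmetic sketch, using only the numbers from the quotes and taking the node count at its stated lower bound:

```python
# Cross-check of the bandwidth figures quoted in the article.
per_node_bw_tbs = 1.0    # TB/s per node, as quoted for the A64FX
nodes = 150_000          # "more than 150,000 nodes" (stated lower bound)

aggregate_pbs = per_node_bw_tbs * nodes / 1_000  # 1 PB = 1,000 TB
print(f"Aggregate memory bandwidth: more than {aggregate_pbs:.0f} PB/s")
# → Aggregate memory bandwidth: more than 150 PB/s
```

This matches the 'more than 150 petabytes per second' aggregate figure quoted by Matsuoka, which is simply the per-node memory bandwidth multiplied across the machine.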

The push towards AI

Intel’s announcement of the Nervana systems suggests that the company is pushing to capitalise on the huge growth in AI. Te company announced two Nervana systems NNP-T (Neural Network Processor) for training networks and NNP-I for inferencing. Intel Nervana NNP-T is built to deep

learning models at scale prioritising two key real-world considerations: training a network as fast as possible and doing it within a given power budget. Te chip named ‘Spring Crest’ provides 24 tensor processors arranged in a grid with a core frequency of up to 1.1GHz and 4 x 8GB of HBM2-2400 memory and 60MB of distributed on-die memory. Intel also claimed that to account

for future deep learning needs, the Intel Nervana NNP-T is built with flexibility and programmability so it can be tailored to accelerate a wide variety of workloads – both existing ones today and new ones that will emerge in the future. Te Intel Nervana NNP-I is built

specifically for inference market and aims to introduce deep learning acceleration, leveraging Intel’s 10nm process technology with Ice Lake cores to deliver high power per watt for data centre AI inferencing workloads. Te chip named ‘Spring Hill’ is much

smaller in power usage than ‘Spring Crest’ at an estimated 10-50 watts as opposed to the 150-250 watt power envelope of Spring Crest. Te NNP-I chip provides on-die SRAM using Intel’s 10nm process technology featuring dual-core processors and 12 Inference Compute Engine (ICE) which provides high bandwidth memory access, a programmable vector processor and large internal SRAMs for power. In a blog post leading up to the Hot Chips

conference, Naveen Rao, vice president and general manager, Artificial Intelligence Products Group at Intel commented on the need to specialise architectural development to suit AI workloads. ‘Data centres and the cloud need to

have access to performant and scalable general-purpose computing and specialised acceleration for complex AI applications. In this future vision of AI everywhere, a holistic approach is needed—from hardware to
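The 4 x 8GB of HBM2-2400 quoted for Spring Crest implies a peak memory bandwidth that can be estimated directly. A minimal sketch, assuming the standard 1,024-bit interface per HBM2 stack (the bus width is not stated in the article):

```python
# Peak-bandwidth estimate for the quoted NNP-T memory: 4 stacks of HBM2-2400.
stacks = 4
transfer_rate_gts = 2.4    # HBM2-2400: 2.4 GT/s per pin
bus_width_bits = 1024      # assumption: standard 1,024-bit HBM2 stack interface

per_stack_gbs = transfer_rate_gts * bus_width_bits / 8  # GB/s per stack
total_gbs = stacks * per_stack_gbs
print(f"Per stack: {per_stack_gbs:.1f} GB/s, total: {total_gbs:.1f} GB/s")
# → Per stack: 307.2 GB/s, total: 1228.8 GB/s
```

This is only a theoretical peak under the stated interface assumption; sustained bandwidth on real training workloads will be lower in practice.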

