SCW Summer 21

HIGH PERFORMANCE COMPUTING

“The accelerator node architectures that the United States has been focused on are changing from being an Nvidia-only ecosystem to an ecosystem that has a wider variety of GPUs”

‘The sheer bandwidth of the machine is six petabytes per second of network injection bandwidth, which is 10 times the magnitude of all the data centres’ internal traffic, according to Cisco,’ Matsuoka continued. ‘System architecture-wise, it is the world’s first ultra-scale disaggregated architecture. The cores, memory and everything can act independently. For example, any memory from any part of the system can be injected into the L2 cache of any processor without any other processor’s intervention.’

Define exascale

Exascale has been a long-sought-after goal for HPC because it represents the next order of magnitude of supercomputer performance since the first petaflop systems were announced more than 10 years ago. While there have been significant and wide-ranging advances in technology since then, the elusive exaflop has remained out of reach. Over the same period, however, the types of computation conducted on these systems have also changed. Ten years ago, before the advent of AI and ML technologies, FP64 (double precision) was a ubiquitous standard for many HPC applications, but increasingly single-precision FP32 or half-precision FP16 is used, particularly for AI. This means that some of these systems can deliver an exaflop of reduced-precision performance in certain applications, even if their FP64 figure has not yet reached an exaflop.

‘People sometimes contest us when we say that we offer the first exascale supercomputer,’ notes Matsuoka. ‘But what do we mean by exascale? Well, there are several definitions. If you think exascale is FP64 performance, then an exaflop would be represented by the peak performance or achieved LINPACK performance, and of course for Fugaku this is not the case. Its Rmax is 0.44 exaflops.’

Matsuoka continued: ‘However, very few applications correlate with FP64 absolute dense-matrix linear algebra performance in this context. So this may not actually be a valid definition when you think about the capability of a supercomputer.’

‘The second possible definition is any floating-point precision performance that is bigger than an exaflop, or a metric from some credible application. In that respect, Fugaku is an exaflop machine because, for example, in HPL-AI we achieved two exaflops. However, Oak Ridge National Laboratory’s Summit machine has achieved two exaflops in the Gordon Bell-winning applications. So, although Fugaku is an exascale machine by this definition, it was not first,’ added Matsuoka.

‘I think the most important definition when we started thinking about these exascale machines was to achieve almost two orders of magnitude speed-up compared with what was the state of the art in the 2011/2012 timeframe, when we had 10- to 20-petaflop supercomputers.

‘As I have demonstrated, Fugaku is about seven times faster across the applications than the K computer, which was an 11-petaflop Rmax machine. And because of the “application first” nature of the machine, we believe this is the most important metric. We have achieved a two orders of magnitude speed-up over our last-generation machine, which you would call a 10- to 20-petaflop machine. Being application first, this was the most important metric, and in this context we have achieved what was expected of an exascale machine,’ Matsuoka concluded.

In a presentation from the recent ISC High Performance conference, Lori Diachin

Summer 2021 Scientific Computing World 17

