HPC_YEARBOOK

HPC 2017-18 | High-performance computing

is a niche too small to base major product decisions on for the big guys.’ With $2 million in funding they

have already developed a new processor architecture, created silicon chip units and the initial software. In comparison, the typical cost for a 28-nanometre node process for traditional semiconductor companies ranges from $30 million to $250 million. ‘A number of start-ups in the past decade

raised tens of millions of dollars without ever producing a working chip,’ said Sohmers. ‘While those big companies may look at posits as a risky proposition, we see opportunity in being the first to offer innovative solutions to those early adopters.’ To date, the Neo general-purpose float-based processor is achieving 128 single precision and 64 double precision gigaflops per watt in tests. ‘In comparison, that’s more than double what you see at the top of the latest energy-efficient Green500 supercomputer list,’ said Gustafson. According to Sohmers, the latest Intel

‘Knights Landing’ Xeon Phi chip, made on a 14 nanometre process, has a theoretical peak performance of about 10 double precision Gflops per watt. The current theoretical peak performance

on their Neo chip is better. On an older 28 nanometre process, the Neo performs 32 double precision Gflops per watt, 26 Gflops per watt for a DGEMM benchmark (designed to measure the sustained floating-point computational rates of a single node) and 25 Gflops per watt for an FFT – a very communication-intensive function. This will widen advantages over x86-type processors for 64-bit operations. ‘We are showing three to 25 times better

energy efficiency while we have a huge (us on 28 nanometres versus Intel’s 14 nanometres) process technology disadvantage. For our production chip, which is roughly on a par with Intel’s 14 nanometre, our numbers would just about double,’ said Sohmers. Based on these conservative estimates, a

32-bit REX-type design processor, based on posits, instead of a 64-bit processor based on floats, could achieve 60 billion real-world operations per second per watt. Scaled up, this is the energy-efficient

exascale computer that Sohmers and Gustafson envision. ‘With a 20-megawatt power budget, yes, you’re definitely beyond exascale at that point,’ said Gustafson. However, as mentioned, some in the

community have their doubts, which are based on the current dominance of the larger players in the semiconductor industry. ‘That may have been possible in the late 1980s when the industry moved to the IEEE floating point standard, but at that time, the market was much smaller and floating point arithmetic was indeed faulty and counterproductive for software development,’ said Simon.

But, according to Gustafson, IEEE 754 floats are obsolete: it’s just that the world doesn’t know it yet. ‘The small companies have early-mover advantage and the big companies have amazing resources to apply, but are always conservative. That’s where the revolutionary fun is – and always has been. Very much like the disruption of parallel computing in the 1980s,’ said Gustafson.

Large and established chip manufacturers are still squeezing as much as they can out of CMOS technology by investing in FinFET (fin field-effect transistor) and seven-nanometre scale transistors. ‘The established companies won’t lift a finger until they see their market share threatened by an upstart; and sometimes, not even then. With the belief that these initially risky ideas will gain more mainstream adoption once they are proven out as being viable... it would only be at that time that the rest of the industry would be practically forced to change.’

The innovation that REX Computing is making is to take a lot of unnecessary and complex logic out of its processor hardware design through the use of ‘scratchpads’. They have written unique code that gives exact latency guarantees for all operations and memory access, allowing a compiler to handle all of the memory management in software, not hardware. ‘While it sounds simple and obvious, the

actual algorithms and compilation techniques we are using are very unique, and up until us doing it, many said it would be impossible,’ said Sohmers. With regard to REX Computing’s IEEE-float-compliant Neo processor, evaluation units have been in use by early customers since May 2017. The company plans to sample 16-nanometre chip units in spring 2018, with larger volume availability in the last quarter of 2018. Sohmers said, ‘Depending on our results

with the posit project, we expect to have evaluation units available for a variant of our processor replacing the IEEE float unit available in spring 2018.’ Based on their current posit-based

simulations, they are very confident that they will exceed 60 Gflops per watt with their first production chip next year, which has one potential ‘peta-scale’ supercomputer installation in the pipeline for 2019. This shows the potential for a reasonably priced exascale supercomputer by 2020 using Neo chips. Back in the late 1990s, quips Gustafson, the

goal was a ‘tera-ops’ machine, staying clear of flops and Linpack. But it wasn’t long before the supercomputing community said, ‘Yeah, yeah, sure. So does it get a Tflop on Linpack?’ This cycle repeated itself in the 2000s

with the peta-scale computing goal: Pflops became the flavour of the decade. Exascale will probably meet the same fate, with the first questions being about Eflops. ‘It’s just too much fun to plot trend lines for a benchmark that is older than dirt,’ said Gustafson, who remains undaunted about the potential of posits. ‘I did a quick scan of my email and found

40 entities working on making posit arithmetic real at the hardware level. Most are start-up companies, but also national laboratories, universities and companies like IBM, Intel, Qualcomm, Samsung, Google, Microsoft and Nvidia. ‘Mostly, the feedback I’ve gotten: When

can I have it? I want it now! Frankly, I’d be surprised if people are still using IEEE 754 floating point in 2027.’ In the supercomputing chip race, perhaps

the surprise will come from a smaller country or start-up that will develop paradigm-shifting solutions first, and drag the race to a new path. Nonetheless, the past has shown that for any big idea, it takes time for change; the clock is ticking.
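
The efficiency figures quoted in this article can be sanity-checked with a few lines of arithmetic. The Python sketch below uses only numbers given in the text: 60 billion operations per second per watt for a posit-based design, a 20-megawatt power budget, and 32 versus about 10 double precision Gflops per watt for the 28 nanometre Neo and Knights Landing. The function names are hypothetical, and the calculation assumes that per-watt efficiency stays constant as a machine is scaled up, which ignores interconnect, memory and cooling overheads.

```python
# Back-of-envelope check of the efficiency figures quoted in the article.
# Assumption: efficiency per watt is constant at any scale (an idealisation).

GIGA = 1e9
MEGA = 1e6
EXA = 1e18


def ops_per_second(gops_per_watt, power_megawatts):
    """Total operations per second for a given efficiency and power budget."""
    return gops_per_watt * GIGA * power_megawatts * MEGA


def megawatts_for_exascale(gops_per_watt):
    """Power needed to sustain 10^18 operations per second."""
    return EXA / (gops_per_watt * GIGA) / MEGA


# Figures taken from the article text.
POSIT_GOPS_PER_WATT = 60      # projected 32-bit posit-based design
POWER_BUDGET_MW = 20          # power budget cited by Gustafson
NEO_28NM_DP_GFLOPS_W = 32     # Neo on 28 nanometres, double precision
KNL_DP_GFLOPS_W = 10          # Knights Landing theoretical peak, approximate

exa_ops = ops_per_second(POSIT_GOPS_PER_WATT, POWER_BUDGET_MW) / EXA
power_needed = megawatts_for_exascale(POSIT_GOPS_PER_WATT)
advantage = NEO_28NM_DP_GFLOPS_W / KNL_DP_GFLOPS_W

print(f"Throughput at 20 MW: {exa_ops:.2f} exa-operations per second")
print(f"Power for one exa-operation per second: {power_needed:.1f} MW")
print(f"Neo (28 nm) versus Knights Landing: {advantage:.1f}x per watt")
```

Under those assumptions, a 20-megawatt budget yields about 1.2 exa-operations per second, in line with Gustafson's 'beyond exascale' remark, and the 28 nanometre Neo's peak figure works out to roughly three times that of Knights Landing, the lower end of the range Sohmers quotes.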

