HIGH PERFORMANCE COMPUTING




“These features, and the tight integration of MLP blocks with embedded memory blocks, eliminate the traditional delays associated with FPGA routing”




to the Molex organisation we can take this product to a global customer base and we have got the global infrastructure and resources of Molex to let us qualify, validate and support the complete lifecycle for customers that are deploying high volumes of FPGA products.’ Petrie’s comments highlight the


company’s plans for this card to compete in high-volume, high-end markets such as HPC. The backing of parent company Molex could also make it easier to justify adopting the new technology, since it will be supported by a well-established company with a global customer base. Today’s high-bandwidth applications


can easily overwhelm the routing capacity of a conventional FPGA’s bit-oriented programmable-interconnect fabric, but the Speedster7t architecture uses a high-bandwidth, two-dimensional network on chip (NoC) that spans horizontally and vertically over the FPGA fabric, connecting to all of the FPGA’s high-speed data and memory interfaces. This high-speed network running over the programmable-logic fabric could help the card compete with other accelerator technologies such as GPUs. The Speedster7t NoC supports high-bandwidth communication between those interfaces and custom acceleration functions in the programmable-logic fabric. Each row or column in the NoC is implemented as two 256-bit, unidirectional, industry-standard AXI channels, each operating at a transfer rate of 512 Gbps.
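As a quick sanity check on those figures, the short Python sketch below works out the per-channel bandwidth from the stated bus width and an inferred channel clock; the 2 GHz clock is an inference from the width and bandwidth quoted above, not a figure given in the article.

# Back-of-the-envelope check of the NoC bandwidth described above.
# The 2 GHz channel clock is inferred from 512 Gbps / 256 bits; it is an
# assumption, not a figure quoted in the article.

BUS_WIDTH_BITS = 256      # width of each unidirectional AXI channel
CLOCK_HZ = 2e9            # assumed channel clock rate (inferred)
CHANNELS_PER_ROW = 2      # two unidirectional channels per NoC row or column

per_channel_gbps = BUS_WIDTH_BITS * CLOCK_HZ / 1e9
row_total_gbps = per_channel_gbps * CHANNELS_PER_ROW

print(f"Per-channel bandwidth: {per_channel_gbps:.0f} Gbps")          # 512 Gbps
print(f"Per row/column, both directions: {row_total_gbps:.0f} Gbps")  # 1024 Gbps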


Machine learning processors

The new card also features a large array of programmable math compute elements, organised into new machine learning processor (MLP) blocks. Each MLP is a highly configurable, compute-intensive block with up to 32 multiplier/accumulators (MACs) that support integer formats from 4 to 24 bits and various floating-point modes, including native support for TensorFlow’s bfloat16 format as well as the highly efficient block floating-point format, which dramatically increases performance for ML applications. ‘These features and the tight integration of MLP blocks with embedded memory blocks eliminate the traditional delays associated with FPGA routing, ensuring that machine learning algorithms can be run at the maximum performance of 750 MHz,’ added Petrie. ‘This combination of high-density compute and high-performance data delivery results in a processor fabric that delivers the highest usable FPGA-based tera-operations (TOps) per second.’

Critical for high-performance compute and machine learning systems is high off-chip memory bandwidth to source and buffer many high-bandwidth data streams. To achieve the needed level of bandwidth, Speedster7t devices include hard GDDR6 memory controllers to support high-bandwidth memory interfaces. With each GDDR6 memory controller capable of supporting 512 Gbps of bandwidth, the card uses up to eight GDDR6 controllers in each device, so it can support an aggregate GDDR6 bandwidth of 4 Tbps.
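The figures above allow a rough estimate of what the MLP array and the GDDR6 interfaces deliver. In the sketch below the number of MLP blocks per device is a placeholder assumption, since the article does not say how many a Speedster7t device contains; the remaining numbers are those quoted above.

# Rough arithmetic behind the throughput and bandwidth figures quoted above.
# MLPS_PER_DEVICE is a placeholder assumption; the article gives no count.

MACS_PER_MLP = 32         # up to 32 multiplier/accumulators per MLP block
OPS_PER_MAC = 2           # one multiply plus one accumulate per cycle
MLP_CLOCK_HZ = 750e6      # maximum quoted operating frequency
MLPS_PER_DEVICE = 2000    # placeholder assumption, not an article figure

peak_tops = MLPS_PER_DEVICE * MACS_PER_MLP * OPS_PER_MAC * MLP_CLOCK_HZ / 1e12
print(f"Peak MAC throughput with {MLPS_PER_DEVICE} MLPs: {peak_tops:.0f} TOps")

# GDDR6: eight controllers at 512 Gbps each gives the quoted ~4 Tbps aggregate.
GDDR6_CONTROLLERS = 8
GBPS_PER_CONTROLLER = 512
aggregate_tbps = GDDR6_CONTROLLERS * GBPS_PER_CONTROLLER / 1000
print(f"Aggregate GDDR6 bandwidth: {aggregate_tbps:.0f} Tbps")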


Mensor also addressed the concerns of users that, in the past, FPGAs were only suitable for a limited number of applications and could not provide the general usability of a technology such as a GPU: ‘I do not think it was that FPGAs were only suitable for certain applications; it was more that FPGA economics means they can be relatively pricey, and they have a power profile which meant that, if you could do something in a standard product or an ASIC, then the power could be dramatically lowered,’ he said. ‘In these types of applications today,


where you get into high compute requirements and not just simple programmability, you are getting into the type of solutions that consume hundreds of watts – and if there is any level of inefficiency, as with a CPU, you are going to be on the other end of the scale, where the acceleration factor makes the FPGA much more power-efficient,’ added Mensor.

Petrie also added to the reasons behind the broader direction for FPGA technology: ‘There have been a number of inhibitors to the adoption of FPGA technology and these are being addressed; Steve has already touched on some of them. These things were relatively expensive, and that relegated them to extremely high-performance applications. If you wanted to play with an FPGA you basically couldn’t. Making this technology much cheaper and more ubiquitous has been really important,’ said Petrie. ‘The other thing that has been an inhibitor


to adoption is programmability. There have been innovations such as the NoC, and they [Achronix] have hardened a lot of the IP that users would normally have had to write themselves. The IP is now ready to go, and that provides a major step up for new users in particular,’ Petrie concluded.
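Mensor’s ‘acceleration factor’ argument is ultimately about performance per watt: an accelerator can draw more power than a CPU and still reduce the energy used per unit of work, provided the speed-up is large enough. The sketch below illustrates that trade-off with purely hypothetical numbers; none of the figures come from the article.

# Illustrative performance-per-watt comparison; all numbers are hypothetical.

cpu_power_w = 200         # assumed CPU power draw for the workload
fpga_power_w = 300        # assumed FPGA accelerator card power draw
speedup = 10.0            # assumed acceleration factor over the CPU

# Energy per unit of work is proportional to power divided by throughput,
# so the efficiency gain is the speed-up divided by the power ratio.
efficiency_gain = speedup / (fpga_power_w / cpu_power_w)
print(f"Energy per unit of work improves by {efficiency_gain:.1f}x")  # ~6.7x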



