smaller chunks, run in parallel across a network of servers. ‘We could not get any more performance
from just increasing the frequency of CPUs,’ commented Shainer. ‘The idea of running everything on the CPU got us to the performance walls of today. Now we are moving to the exascale era, and that means another technology change, one that revolves around co-design to create synergy.’
The Mellanox solution to overcoming these performance barriers is to implement intelligent switching, where the computational overhead of managing the movement of this data is moved to the switch itself, rather than being processed on the CPU. The first commercially available products in this area are Mellanox’s Switch-IB 2 switch and the upcoming Bull Exascale Interconnect (BXI). Both aim to move MPI communication directly onto the switch silicon – freeing the CPU to focus on computation. The reasoning behind this development is
that collective operations, such as the sum of a series of calculations performed in parallel on different nodes, typically require multiple communication transactions across the network, each with its own associated latency. By having these operations managed by the interconnect rather than the CPU, interconnect developers can cut the number of transactions and so deliver a significant reduction in latency. Shainer stated: ‘Today, when you run collective operations on the server for MPI, the server needs to have multiple transactions to
complete just one collective operation. Anything you do on a server will not be able to overcome this. Moving the management and execution of collective operations to the switch enables the switch to complete MPI operations with one transaction over the network.’
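To make this concrete, the sketch below shows a single MPI collective call, MPI_Allreduce, which sums one value contributed by every rank. This is generic, illustrative MPI code rather than Mellanox or Bull software: on a conventional cluster the MPI library implements this one call as a series of point-to-point transactions (for example, a reduction tree), and it is exactly this traffic that switch-based offload aims to collapse into a single transaction handled in the network.

/* Generic MPI collective example (illustrative only, not vendor code).
 * Each rank contributes one value; MPI_Allreduce returns the global sum
 * on every rank. Host-based MPI libraries implement this single call as
 * several network transactions, which is the overhead that in-switch
 * offload is designed to remove from the CPU. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)rank;   /* this node's partial result */
    double global = 0.0;

    /* Collective sum across all ranks -- the class of operation that
     * intelligent switching moves from the host CPUs to the switch. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %f\n", size, global);

    MPI_Finalize();
    return 0;
}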
Making use of internal expertise

While Mellanox has been developing this technology for many years, the entry of the European supercomputer provider Bull into the interconnect market could be somewhat confusing. But, as its CTO Jean-Pierre Panziera explains, the company has significant experience developing the ASIC (application-specific integrated circuit) used for the hardware acceleration of its upcoming BXI interconnect. ‘We have developed ASICs in the past that
were meant for extending the capacity of SMP nodes; this was a project for both the HPC and enterprise markets,’ explained Panziera. He stressed that the main motivation
behind Bull’s decision to develop its own interconnect technology stems from the same principle as Mellanox – co-design. ‘The second point is that we have a demand and we are working very closely with our customers. This project specifically was a co-development with our main partner, CEA (the French Alternative Energies and Atomic Energy Commission). We had a partner, a use case and we had the technology, so we looked at what would make the most sense.’ While Bull has not developed interconnect
technologies prior to BXI, the company is drawing on the experience it has gained from producing ASICs in earlier projects. ASICs are custom-built chips designed for a specific application – such as offloading MPI processing. The advantage of using these custom integrated circuits is that they are designed for use in one application only. As such, the logic elements can be fine-tuned so that the computational power and energy requirements are exactly what is needed, providing high performance and very low energy consumption for a single application when compared with a more general-purpose CPU. Panziera said: ‘This is something that has
always been in the genes of Bull; it is this experience in mastering these technologies.’
The component that manages and executes MPI operations in the BXI interconnect is a custom-built ASIC that acts as an MPI engine – effectively a dedicated processor for handling MPI communications – in a similar fashion to Mellanox’s approach.
Panziera said: ‘HPC has now become synonymous with parallelism, and a high degree of parallelism. Applications will often need to use thousands or even tens of thousands of nodes. This puts a stress on the interconnect. Almost all applications are standardised around the MPI library for communications, for moving data between processes, between nodes.’
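As a point of reference, the sketch below shows the kind of MPI point-to-point exchange Panziera describes: one rank sends a buffer to another, and every such message crosses the interconnect and pays its latency cost. It is a generic, illustrative example, not BXI- or Mellanox-specific code.

/* Generic MPI point-to-point exchange (illustrative only).
 * Rank 0 sends a buffer to rank 1; each message of this kind crosses
 * the interconnect and incurs the network's per-message latency. */
#include <mpi.h>
#include <stdio.h>

#define N 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buffer[N] = {0};

    if (rank == 0) {
        /* Fill the buffer and send it to rank 1. */
        for (int i = 0; i < N; i++) buffer[i] = (double)i;
        MPI_Send(buffer, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Receive the buffer from rank 0. */
        MPI_Recv(buffer, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %f ... %f\n", buffer[0], buffer[N - 1]);
    }

    MPI_Finalize();
    return 0;
}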
When asked about the potential impact of offloading MPI operations onto the switch, Panziera explained that interconnect performance is governed by two main factors: bandwidth and latency.
Panziera said: ‘It varies a lot from one application to another. If you think about the different components of performance for your application, it might be bandwidth. That is quite easy to achieve at the technology level. It is the width of the pipe in a link between the different components. If your application is some kind of LINPACK application, you do send some information across the system, but it is relatively small compared to the computation you are doing – here the impact of the interconnect will be noticeable, but it will be small.’
He explained that when you start to scale applications to run across larger systems – thousands or tens of thousands of nodes – the importance of the interconnect increases significantly. Panziera said: ‘If you have an application that is trying to push data, the latency
and the number of messages that you are able to exchange on your network become crucial to application performance.’
‘It is really when you are pushing the application to a high level of parallelism, to a high level of performance, that you will see the most impact on your application. We think that for some applications it will be 10 to 20 per cent, and if you are really pushing it to the extreme – where you are trying to scale to the maximum – you could see a difference of up to a factor of two,’ concluded Panziera.
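Panziera’s distinction between bandwidth-bound and latency-bound behaviour is often summarised with a simple first-order cost model, sketched below. The figures are assumed purely for illustration and do not describe BXI or InfiniBand internals: the time for a message is a fixed latency term plus its size divided by the link bandwidth, so a few large transfers are limited by bandwidth while many small messages are limited by latency.

/* Illustrative first-order cost model for message transfer
 * (a common approximation, not vendor-specific):
 *   time_per_message = alpha + bytes / beta
 * where alpha is the per-message latency and beta is the link bandwidth. */
#include <stdio.h>

static double transfer_time(double alpha_s, double beta_bytes_per_s,
                            double bytes, long messages)
{
    return (double)messages * (alpha_s + bytes / beta_bytes_per_s);
}

int main(void)
{
    /* Assumed example figures, chosen only for illustration. */
    const double alpha = 1.0e-6;   /* 1 microsecond per message      */
    const double beta  = 12.5e9;   /* ~100 Gb/s link, i.e. 12.5 GB/s */

    /* One large transfer: bandwidth dominates the cost. */
    printf("1 x 1 GB      : %.6f s\n", transfer_time(alpha, beta, 1.0e9, 1));

    /* A million small messages: per-message latency dominates. */
    printf("1e6 x 8 bytes : %.6f s\n", transfer_time(alpha, beta, 8.0, 1000000));

    return 0;
}

In this picture, offloading collective operations to the switch attacks the latency term by cutting the number of messages each server has to issue.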
While Bull is developing its own interconnect technology, Panziera stressed that the company would still offer Mellanox’s InfiniBand solutions with the latest generation of its supercomputing platform, named ‘Sequana’. Panziera said: ‘It is not something that you
can dictate to customers; if customers think that InfiniBand is better, then we will offer InfiniBand to these customers.’
Reconfigurable supercomputing

While most technology development is an iterative process, sometimes a disruptive technology emerges that threatens to upend the status quo. The US-based Calient is aiming to do just that with its purely optical switches for HPC interconnects.