Networking | HPC 2012

Make a connection

Enabling users to exploit the full potential of their systems, interconnects have a vital role in HPC. We find out more

John Taylor, VP of technical marketing at Gnodal, looks at high-speed Ethernet for high-performance technical computing


Within general-purpose IT, Ethernet remains, and will continue to remain, the network of choice. New classes of high-speed Ethernet technology are being driven by the needs of high-performance data centres (HPDC) surrounding Big Data, and the comment ‘I don’t know what the interconnect of the future is, but I know it’s something-Ethernet [sic]’ is now gaining credibility in traditional large-scale high-performance technical computing (HPTC) markets, as technologies borrowed from proprietary networks are standardised and the fabric model of interconnect emerges.

HPTC can be characterised by the requirement for balance between throughput, where applications are sensitive to I/O load, and low latency, where an application is parallelised and decomposed over multiple cores and servers. Typically, organisations run an operational mix of both types of application, so systems need to apportion bandwidth appropriately to allow the system to scale within a fixed budget. Providing the entire HPC service therefore becomes a balancing act: how much of a potentially proprietary HPC interconnect is needed (ranging from 0 to 100 per cent) versus the operational requirement of providing a service for both types of user within an operational budget and service level. This is exacerbated as multiple HPC systems are deployed and data services become fragmented, especially in the presence of cluster file systems. While there is a choice of HPC interconnect, cluster systems require at least one standards-based network for cluster administration and/or storage.

During the past eight years, proprietary interconnects used specifically for HPTC (as measured by the Top500 list, http://top500.org) have witnessed a rise and fall on quite short timescales while, surprisingly some would say, Ethernet has remained constant. This has tracked a high flux of performance enhancements in commodity CPUs; a standardisation and convergence of commodity I/O interfaces, promoting scale-out storage and compute; and enhancements to operating systems supporting the low-level packet processing used to deliver low latency with low overhead.

With the rise of high-speed Ethernet technologies in HPDC, two important points require highlighting. The first is the emergence of Remote Direct Memory Access (RDMA). This has been used by a number of proprietary interconnects as a mechanism to avoid excessive processing on the host CPU and to lower latency. Such techniques remove the host operating system from the communication path to varying degrees, for example by eliminating copies among multiple buffers and by bypassing the kernel. RDMA is now being supported on Ethernet, under new lossless standards, to provide significant gains in latency. For example, MPI implementations used as a parallel abstraction for applications can now make use of iWARP and RoCE, which are now generally supported by operating-system stacks and offer significant opportunities around convergence.
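One way to see what such an RDMA-capable transport buys an MPI application is a simple ping-pong latency test. The sketch below is deliberately generic: which transport actually carries the bytes (plain TCP, iWARP or RoCE) depends entirely on how the MPI library and the network beneath it are configured, and the file name, message size and iteration count are arbitrary illustrative choices.

/* ping_pong.c -- minimal MPI ping-pong latency sketch (illustrative only).
 * Build: mpicc ping_pong.c -o ping_pong ; run with exactly two ranks.    */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 10000;      /* arbitrary iteration count            */
    const int msg   = 8;          /* small message, so latency dominates  */
    char buf[8] = {0};
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (nprocs != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {          /* send, then wait for the echo          */
            MPI_Send(buf, msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else {                  /* echo everything straight back         */
            MPI_Recv(buf, msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)                /* half the round trip = one-way latency */
        printf("one-way latency: %.2f microseconds\n",
               (t1 - t0) / (2.0 * iters) * 1e6);

    MPI_Finalize();
    return 0;
}

Launched with one rank on each of two servers, the same binary run first over a TCP transport and then over an iWARP- or RoCE-capable one gives a first-order view of the latency gains described above.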


“Ethernet remains, and will continue to remain, the network of choice”

At the Ethernet switch edge, current switches can impose severe latency penalties on individual packets and, when compared with the latency at the edge, add significantly to overall round-trips in large systems. While latency is an easy metric to measure at the micro level, the macroscopic effect of congestion within switches, incurred by hot-spots in the network, can cause a catastrophic drop-off in overall bandwidth.
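That macroscopic effect is harder to capture with a two-node test. One way to observe it, assuming nothing about the switch hardware, is an all-to-all exchange: it pushes traffic through many switch ports simultaneously, and the per-rank bandwidth it reports drops sharply once congestion hot-spots appear, even when every individual link is nominally fast. Message size and iteration count below are again arbitrary.

/* alltoall_bw.c -- sketch of a macro-level congestion test (illustrative only). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    const int    iters = 50;
    const size_t chunk = 1 << 20;          /* 1 MiB destined for every other rank */
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char *sendbuf = malloc(chunk * nprocs);
    char *recvbuf = malloc(chunk * nprocs);
    memset(sendbuf, 0, chunk * nprocs);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        MPI_Alltoall(sendbuf, (int)chunk, MPI_CHAR,
                     recvbuf, (int)chunk, MPI_CHAR, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0) {
        /* bytes each rank injects into the network per run */
        double bytes = (double)chunk * (nprocs - 1) * iters;
        printf("per-rank injection bandwidth: %.2f GB/s\n",
               bytes / 1e9 / (t1 - t0));
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Run across racks that share over-subscribed uplinks, the figure this prints typically falls far below what the ping-pong test would suggest, which is exactly the drop-off described above.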


The second point, therefore, is that bridging the gap between the latency of proprietary server adaptors and standard Ethernet comprises only one aspect of delivering the performance necessary for wider exploitation of Ethernet in the HPTC sector. Ethernet switches have not previously been sympathetic to exploitation at large scale, as heavy-weight algorithms such as the Spanning Tree Protocol, used to avoid deadlock, have encumbered the use of ‘flat’ networks and dictated complex tiers of layer 2 and layer 3 switches to support scale-out architectures. This imposes significant latency and bandwidth penalties in operation and also necessitates significant over-subscription, causing insurmountable bottlenecks.
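To put a number on what over-subscription means in practice, the small sketch below works through an entirely hypothetical leaf switch; the port counts and speeds are invented for illustration and not taken from any particular product.

/* oversub.c -- worked over-subscription arithmetic for a hypothetical leaf switch. */
#include <stdio.h>

int main(void)
{
    const double down_gbps = 48 * 10.0;   /* 48 server-facing 10GbE ports  */
    const double up_gbps   =  4 * 40.0;   /* 4 uplink 40GbE ports          */

    printf("over-subscription ratio: %.1f:1\n", down_gbps / up_gbps);
    printf("per-server share of uplink capacity under full load: %.2f Gb/s\n",
           up_gbps / 48);
    return 0;
}

A 3:1 ratio means that, whenever traffic has to leave the switch, each server can count on little more than 3Gb/s of its nominal 10Gb/s, which is the kind of bottleneck referred to above.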


Our approach was to develop the Gnodal PETA ASIC and, while the speeds and feeds of PETA are impressive, it is when multiple ASICs are combined that a real transformation happens, as a self-consistent Ethernet fabric is formed that conforms to Ethernet standards at the edge.

Importantly, there is an upside in respect of Ethernet and its ubiquity. This highlights the area of convergence: a single wire to rule the cluster, reflecting the ability to support device-level communication with a single protocol encompassing inter-processor communication as well as network storage. This reduction in complexity will transform the use of HPC.



