high-performance computing


➤ developers do not face as steep a learning curve as they adapt to the new technology. AMD has opted for OpenCL, an open framework designed to support many different heterogeneous computing platforms, from GPUs to FPGAs and DSPs. Baratault explained that he thought the adoption of OpenCL would be a big benefit to AMD.

'Code portability is a key topic; it means that a user is not tied down to one vendor; it is also the best way to leverage programmers' expertise,' said Baratault. 'We see more than 1,000 ongoing OpenCL projects; it is growing because it is an ecosystem supported by various hardware vendors that have seen the benefits in shooting for such a programming framework.'

Intel has perhaps the easiest job in this regard, as it is extending languages, models, and development tools from the Xeon CPU family across to the Xeon Phi. Hazra said: 'There is a lot of legacy code; there is a lot of knowledge about how to write such code that has been built up in the software industry; and we did not want to lose that. It would be akin to burning your entire library and starting anew.'

Hazra also explained that, because programmers were using similar techniques, such as threading and vectorisation, to parallelise code, investigating speed-up on Xeon Phi would never be a wasted exercise, because the improvements would still benefit CPU performance. Hazra said: 'That is a huge economic gain for companies that have huge applications and do not have the money to throw away on months of work.' He also stressed that this meant programmers could use the Xeon CPU as a development environment for Xeon Phi code, using AVX instructions for example.


The question of PCIe

The question raised by Jean-Christophe Baratault of AMD about the long-term future of the PCIe bus has troubled the HPC community for as long as it has been facing the challenges of exascale computing. As applications increase in scale and complexity, there is an ever-increasing need to drive more data to the processors, whether CPU, GPU, or accelerator. But data movement has its own energy cost. This has led accelerator manufacturers to come up with new strategies to integrate accelerators more efficiently into the compute architecture – in essence, moving the accelerators closer to the data, or at least giving them more access to it.


[Image: Intel's Xeon Phi coprocessor 5110P/3000 series, based on the Intel Many Integrated Core architecture]

Nvidia was the first to announce its own version of this technology, called NVLink, a fast interconnect between the CPU and GPUs that allows them to move data more efficiently. Kim said: 'NVLink is going to give you five to 12 times more bandwidth than what is available today through PCI Express, and so that again gives you the ability to move lots of data to where the computing engines are.'

He went on to explain that the next generation of IBM Power processors would also integrate with NVLink, which was a big part of the IBM bid for the CORAL project. Kim said: 'The next generation of IBM's Power processors are going to integrate with the technology in their CPU, and so if a customer deploys a Power system with GPUs in the next generation of systems, you will get a high-performance interconnect between CPU and GPU.'

Intel has also been working on its own interconnect technology, having acquired Cray's interconnect business in 2012. So far, very little is known about the new interconnect, other than that it will be called Omni-Path and that it will be optimised for HPC deployments. Intel's Raj Hazra explained that Intel would be announcing more on this technology at ISC this year, and he set out some of the rationale behind investing in it. Hazra said: 'What we are working on is Omni-Path, an interconnect tuned for HPC, and integrating that with our processor itself. Interconnects are extremely important in hyperscale systems; they take traffic back and forth, and that is how you get aggregate compute or aggregated parallelism.'

He explained that this provides benefits including increased energy efficiency, higher memory bandwidth, and increased compute density, thanks to the more tightly integrated components, and he pointed out that it enables a much more efficient use of resources. Hazra said: 'Today the PCIe card has to have its own memory, but when you integrate that into the CPU then it can use system memory, and it is much more efficient to schedule traffic. This means that you can architecturally innovate with many more degrees of freedom when you are closer to the CPU than when you are just an I/O device.'

Hazra concluded: 'We believe it is a real game changer integrating a high-performance interconnect fabric in with the CPU, and we can do this because we have the ability based on our Moore's Law advantage.'

Never before has the HPC community had such a choice in the types of processor and technology platform available on this scale. It is impossible to tell which will ultimately see the most widespread adoption – indeed, the future may be one where several different architectures co-exist – but this competition helps to generate the innovation needed to reach exascale.


@scwmagazine | www.scientific-computing.com



