high-performance computing
Jean-Christophe Baratault, senior business development manager for HPC GPU Computing at AMD, explained that AMD has made significant improvements in energy efficiency and memory bandwidth in recent years. This can be seen in AMD's position at the top of the Green500 in the most recent list, published last year. The Green500 is a list of the top supercomputers, rated by energy efficiency rather than pure computational power.
APU rather than GPU? Baratault said: 'Another benefit is performance: double-precision performance, which is very important for HPC; as is the memory bandwidth, because I would say that eight out of ten applications are memory bandwidth-limited. On top of that, we have very large frame buffers of up to 16GB, so there is no equivalent today per GPU.' One thing unique to AMD is its full
OpenGL acceleration. Although OpenGL is mainly used for professional 3D graphics rendering, there are a growing number of HPC applications, both for number crunching and for 3D graphics rendering, that can make use of this technology. 'This is unique; it is only with our boards,'
said Baratault. 'It is not applicable to all of the HPC applications for sure, but I think
with this unique functionality we can address some specific workloads.’ One thing that Baratault was keen to stress
was AMD's commitment to HPC, which has been revitalised in recent years. 'We showed the world last year with the Green500 that we have the most energy-efficient GPU; we have not yet talked about potential future products, but it was highlighted during the AMD Financial Analyst Day in New York a few weeks ago that, for the new GPU architecture, the focus will be on energy efficiency.' At that meeting, the company announced
its roadmap for new CPUs and GPUs, but it also announced that a new 64-bit accelerated processing unit (APU) would be coming to commercial laptops this year. 'The big question is more on how do
you save energy? Will the PCIe bus be a solution to address the challenges of exaflop computing – because you have to send the data, and we are talking about terabytes of information?’ said Baratault.
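Baratault's question about moving terabytes over the PCIe bus can be made concrete with back-of-envelope arithmetic. The sketch below compares an idealised transfer over PCIe with one from on-package memory of the kind an APU or system-on-chip design places next to the compute; both bandwidth figures are illustrative assumptions, not numbers from the interview:

```python
# Back-of-envelope: time to move a terabyte-scale dataset over PCIe
# versus reading it from on-package memory, as on an APU/system-on-chip.
# Bandwidth figures below are illustrative assumptions, not vendor specs.

def transfer_seconds(data_bytes, bandwidth_bytes_per_s):
    """Ideal (overhead-free) time to move data_bytes at the given bandwidth."""
    return data_bytes / bandwidth_bytes_per_s

TB = 1e12
pcie_gen3_x16 = 16e9      # ~16 GB/s: theoretical peak for a PCIe 3.0 x16 link
on_package_mem = 500e9    # ~500 GB/s: rough order of magnitude for stacked memory

print(f"1 TB over PCIe 3.0 x16:     ~{transfer_seconds(1 * TB, pcie_gen3_x16):.1f} s")
print(f"1 TB from on-package memory: ~{transfer_seconds(1 * TB, on_package_mem):.1f} s")
```

Even at theoretical peak, shuttling a single terabyte across the bus costs about a minute, which is why keeping data close to the processor matters for exaflop-class machines.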
40 SCIENTIFIC COMPUTING WORLD

'We think that the future is based on
system-on-chip like the APU that AMD has,’ he said. Baratault went on to explain that the first
APUs were designed strictly to address the consumer gaming market, but the newly announced Carrizo APU is the first 64-bit x86 APU to be released for consumer laptops later this year. Baratault stated: 'The reason why it is going
to be interesting is because you will be able to take your existing code in OpenCL and start preparing your code using this Carrizo APU as the test vehicle. So that, whether you are an academic or an ISV, you can be prepared for what I think is going to be the big revolution at AMD that we announced during our Financial Analyst Day, a multi-teraflops HPC APU.'
Open frameworks spur technology adoption All manufacturers of processors, not just accelerators, are facing similar challenges. As the industry meets the hard limits of materials science, it cannot continue to shrink semiconductor technology at the same rate as before. Companies have thus been forced to look for more innovative solutions. Rajeeb Hazra, VP of Intel's architecture
group and GM of technical computing, said: 'We are starting to plateau, not as a company with a product but as an industry, on how quickly we can build performance with the old techniques, by just increasing the frequency.' The increases in clock speed over previous
CPU generations, which gave software developers a free performance increase just by running their software on the latest CPUs, are now coming to an end. In order to continue to increase performance, processor manufacturers must exploit parallelism within application code. Hazra stressed that the key to increasing
performance is first to clearly identify which sections of code can be parallelised and which cannot. Intel is developing a suite of products that can address both the highly parallel code with its Xeon Phi, and the serial code with its Xeon CPUs. Hazra went on to give an example of ray
tracing used in graphics rendering and visualisation: 'Each ray does not rely on information from another, so each can be computed in parallel,' he said. Hazra added: 'Amdahl
said the world is not all highly parallel or highly serial, there is a spectrum and the amount of performance gain you can get through
[Image: Nvidia K80 dual GPU accelerator]
parallelism is gated by the proportion of how much is parallel and how much is serial in an application.’ Hazra concluded by highlighting that
the years of work that have been put into the development of its Xeon CPUs directly address the serial portion of code; now Intel has developed the many-core architecture to address the more parallel sections of code. ‘What we have done for years with Xeon,
is provide a fantastic engine for the serial and the moderately parallel workloads. What we have done with Xeon Phi is extend that surface to the very highly parallel workloads.' Hazra said: 'There are many applications
that are highly parallel and there are some that cannot easily be parallelised. So what we need is a family of solutions that covers this entire spectrum of applications, and this is what we call the Amdahl spectrum.'
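The 'Amdahl spectrum' Hazra refers to follows from Amdahl's law: if a fraction p of a program can be parallelised, the speedup on n processors is 1 / ((1 - p) + p/n), so the serial remainder sets a hard ceiling no matter how many cores are added. A minimal sketch (the fractions and core count are illustrative):

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Amdahl's law: speedup when a fraction p of the work is parallelised
    across n processors and the remaining (1 - p) stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# Even with many processors, the serial portion dominates the ceiling:
# a 10% serial share caps speedup at 10x regardless of core count.
for p in (0.50, 0.90, 0.99):
    print(f"parallel fraction {p:.2f}: "
          f"speedup on 1024 cores = {amdahl_speedup(p, 1024):.1f}x")
```

A code that is 90 per cent parallel can never run more than 10 times faster, however many cores are thrown at it; hence Intel's argument for pairing fast serial cores (Xeon) with many-core parts (Xeon Phi) to cover the whole spectrum.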
The need for a skill base No matter how competent the technology, if it is not adopted by the broadest spectrum of users then it will fail to gain traction in a fiercely competitive market. This is made more difficult because, in the case of new processor technology, applications must be re-written and a skill base needs to develop around the architecture, and this takes time. While Nvidia has a head start with its
own programming language and tools, AMD and Intel have decided to look at open programming frameworks. As Baratault explained, this gives some freedom to application developers, as they are not necessarily locked in to one hardware platform. But the overwhelming reason is that it helps to speed up adoption of the new technology: the hardware manufacturers are tapping into an already established base of skilled programmers, and into the techniques those programmers have developed to optimise code. It means that new