high-performance computing and then we have seen the gradual adoption of GPUs,’ said Yip. ‘If you think about it, we have had to do a phase shift in our thinking because our compute cores have 128GB or 256GB of memory in the compute node, whereas the graphics card itself has only got 6GB or 12GB.

‘This throws up a new challenge for HPC developers: how can they get all of this data across into the GPU or accelerator without causing major bottlenecks for the system? It is a paradigm shift in terms of programming, because we are talking about going from coarse-grain parallelism into ultra-fine grain within the GPU on the same node,’ explained Yip.
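As a rough illustration of the data-movement problem Yip describes – a host dataset far larger than the 6GB or 12GB available on the card – the sketch below streams a large host array to the GPU in chunks, using pinned memory and CUDA streams so that transfers overlap with computation. It is a minimal sketch under assumed sizes, not code from any vendor quoted here; the kernel, array sizes and names are illustrative.

// A minimal sketch (not vendor code): stream a host array larger than GPU
// memory to the device in chunks, using pinned memory and two CUDA streams
// so copies overlap with kernel work. Sizes and names are illustrative.
#include <cuda_runtime.h>
#include <cstddef>

__global__ void process_chunk(double* data, std::size_t n) {
    std::size_t i = blockIdx.x * (std::size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0;             // placeholder fine-grained work
}

int main() {
    const std::size_t N     = 1ull << 30;  // ~8GB of doubles in host memory
    const std::size_t CHUNK = 1ull << 26;  // ~512MB resident on the GPU at once

    double* host = nullptr;                // pinned memory enables async copies
    cudaMallocHost(&host, N * sizeof(double));

    double* dev[2];
    cudaStream_t stream[2];
    for (int b = 0; b < 2; ++b) {          // double-buffer on the device
        cudaMalloc(&dev[b], CHUNK * sizeof(double));
        cudaStreamCreate(&stream[b]);
    }

    for (std::size_t off = 0, b = 0; off < N; off += CHUNK, b ^= 1) {
        std::size_t n = (N - off < CHUNK) ? (N - off) : CHUNK;
        int blocks = (int)((n + 255) / 256);
        cudaMemcpyAsync(dev[b], host + off, n * sizeof(double),
                        cudaMemcpyHostToDevice, stream[b]);
        process_chunk<<<blocks, 256, 0, stream[b]>>>(dev[b], n);
        cudaMemcpyAsync(host + off, dev[b], n * sizeof(double),
                        cudaMemcpyDeviceToHost, stream[b]);
    }
    for (int b = 0; b < 2; ++b) cudaStreamSynchronize(stream[b]);
    // cleanup (cudaFree, cudaFreeHost, cudaStreamDestroy) omitted for brevity
    return 0;
}

Chunking of this kind is one common answer to the bottleneck Yip raises: only a slice of the dataset ever resides on the card, while the next slice is already in flight over the PCIe or NVLink connection.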
New processing technologies for HPC

In 2016 it was announced that the follow-up to Riken’s flagship K supercomputer would feature ARM processors delivered by Fujitsu, which is managing the development of the ‘Post-K’ supercomputer. Toshiyuki Shimizu – vice president of the System Development Division, Next Generation Technical Computing Unit – explained that, in order to reach exascale systems early, the HPC industry needs to address certain challenges in addition to developing new processing technologies. ‘Generally speaking, budget limitations and power consumption are serious issues to reach exascale,’ stated Shimizu. ‘Advantages found in the latest silicon technologies and accelerators for domain-specific computing will ease these difficulties somewhat.’

Regarding the Post-K system, Shimizu commented: ‘One major enhancement on the Post-K that is currently published is a wider SIMD, at 512 bits. With the Post-K, Fujitsu chose to stay with a general-purpose architecture for compatibility reasons and to support a wide variety of application areas.’ Shimizu stressed that these are important features for this flagship Japanese supercomputer: ‘In terms of reaching exascale, we need to conduct research projects to discover new architectures, and research and development to support specific applications.’

In addition to development of the Post-K computer, Shimizu commented that ‘interconnect technology and system software will become more important, as well as the design of CPUs’. He also mentioned that, for many applications, node scalability and efficiency will be critical to system performance.
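To put the 512-bit figure in context: a 512-bit vector register holds eight double-precision (or sixteen single-precision) values, so one vector instruction can apply a fused multiply-add to eight doubles at once. The loop below is a generic, illustrative kernel of the kind a vectorising compiler can map onto such registers; it is not Post-K code, and the pragma is only a portable hint.

// Illustrative only: a loop that a vectorising compiler can map onto
// 512-bit SIMD registers. At 512 bits, each vector instruction covers
// 8 doubles (512 / 64), or 16 single-precision values.
#include <cstddef>

void triad(double* a, const double* b, const double* c,
           double alpha, std::size_t n) {
    #pragma omp simd                  // portable vectorisation hint
    for (std::size_t i = 0; i < n; ++i)
        a[i] = b[i] + alpha * c[i];   // one fused multiply-add per element
}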
The future is parallel

Today the clearest path to exascale is through the use of increasingly parallel computing architectures.
Partly this is due to the energy savings that come from using large numbers of low-power, energy-efficient processors – but also to the performance gains introduced by accelerators such as GPUs and Intel’s ‘Knights Landing’.

Accelerators have continued to grow in popularity in recent years – but one company, Nvidia, has made significant progress when compared to its rivals. According to Yip, this is due to the promotional efforts of Nvidia, as they have not only raised awareness of GPU technology, but also spent considerable resources ensuring that as many people as possible have access to this technology through education, training and partnerships with research institutes and universities.

‘Nvidia has done the most fantastic job over the last 10 years,’ said Yip. ‘What Nvidia has done is get cards out there to people; they have given training, run workshops and provided education on all of their platforms. They have blanketed the market with their technology and that is what it takes – because, if you think about it, we are only just brushing the surface of what is possible with GPU technology.’

Towards the end of 2016 Nvidia announced that the second generation of its proprietary interconnect technology, NVLink 2.0, would be available in 2017. NVLink provides a 160GB/s link between GPU and Power9 CPU; for the second iteration this will be increased to 200GB/s, drastically increasing the potential for data movement inside an HPC node.

However, Yip was quick to point out that it is not just the highest-performing technology that will see widespread adoption by the HPC industry. ‘It all takes time; there is a lot of research that needs to be done to extract the most performance out of these systems,’ said Yip. ‘It’s not just hardware, it’s the software – and we also need education for all the people that want to take advantage of these systems.’

Securing the future of storage

One challenge created by the increasingly parallel nature of processor architectures is the growing number of threads, or sets of instructions, that must be carried out by a cluster. As the number of processors and threads increases, so must the performance of the data storage system, as it has to feed data to each processing element. In the past it was flops or memory bandwidth that limited HPC applications – but, as parallelism increases, input/output (I/O) performance becomes ever more important to sustained application performance. To solve this issue, storage companies are developing several new methods and technologies for data storage. These range from the use of flash memory for caching, to in-data processing, to improvements in the way that parallel file systems handle complex I/O.

Storage provider DDN has been working on its own solution to tricky I/O challenges with its IME appliance – which, at its most basic level, is a scale-out, native, high-performance cache. Robert Triendl, DDN senior vice president of global sales, marketing and field services, explained that IME takes advantage of the convergence of several important factors: flash memory technologies such as 3D NAND and 3D XPoint, the decreasing cost of flash, and ‘a strong demand from the supercomputing community around doing something different for exascale’. ‘We see huge technological advances, which we need to take advantage of. The economics are right in terms of implementing flash at large scale, and there is obviously significant demand for doing I/O in a different way,’ he said.

One of the biggest challenges facing the development of storage technology in HPC is that increasing parallelisation has introduced large numbers of processors with growing numbers of cores. This creates a problem of concurrency for I/O and storage systems, as each processor or thread needs its own stream of data; if data does not reach the processing elements quickly enough, application performance will deteriorate.
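As a simplified sketch of the caching idea described above – not DDN’s IME interface, whose API is not covered here – the code below has many threads write their own data streams to a hypothetical fast flash tier, which is then drained to the parallel file system. All paths and names are invented for illustration.

// Generic burst-buffer-style sketch (not DDN IME's API): many compute threads
// dump their data to a fast flash tier, and a drain step migrates the files to
// the parallel file system. Paths and names here are hypothetical.
#include <filesystem>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

const fs::path FLASH_TIER = "/flash/cache";    // hypothetical fast tier mount
const fs::path PFS        = "/lustre/project"; // hypothetical parallel FS

// Each thread writes its own stream, mirroring the per-thread I/O concurrency
// that the storage system must now absorb.
void write_checkpoint(int rank, const std::vector<double>& data) {
    std::ofstream out(FLASH_TIER / ("ckpt_rank" + std::to_string(rank)),
                      std::ios::binary);
    out.write(reinterpret_cast<const char*>(data.data()),
              data.size() * sizeof(double));
}

// Drain: move staged files to the parallel file system so the flash tier
// stays free to absorb the next I/O burst.
void drain_to_pfs() {
    for (const auto& entry : fs::directory_iterator(FLASH_TIER))
        fs::copy(entry.path(), PFS / entry.path().filename(),
                 fs::copy_options::overwrite_existing);
}

int main() {
    std::vector<std::thread> writers;
    for (int rank = 0; rank < 8; ++rank)
        writers.emplace_back(write_checkpoint, rank,
                             std::vector<double>(1 << 20, 1.0));
    for (auto& t : writers) t.join();
    drain_to_pfs();   // in practice this runs asynchronously with compute
}

In a real cache or burst-buffer layer the drain would run in the background alongside compute, with the flash tier absorbing the many concurrent streams that would otherwise hit the parallel file system directly.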