
limited by how fast the system is able to read and write data. This represents a significant loss of time and energy in the system. Being able to widen, and ultimately eliminate, this bottleneck would increase the performance and efficiency of HPC systems. NEXTGenIO partners Intel and Fujitsu are developing a prototype system based on Intel’s Optane DC Persistent Memory non-volatile memory technology. The prototype system will be used to explore how to make best use of this new technology in the world of I/O-intensive high-performance scientific computing. In the initial stages of the project, partners worked together on defining the architecture of the prototype system, with the aim of creating a platform that would be transformational not only for HPC, but across the entire computing spectrum.

Memory architecture

One of the early project goals was to capture the requirements of the system being developed. These requirements are then used to define the architectures for the different components in the prototype that will eventually be built by EPCC. In total, there are three different architectures that the project will need to develop: hardware, software and data. The hardware architecture will define the hardware components required, and how they will be connected and packaged. This is more complex than simply defining the compute nodes for the prototype, as it must cover any login nodes, management resources, networks, and I/O systems.

EPCC will define a complete software architecture for enabling the use of NVRAM in scientific applications. The project will utilise existing software components that are already under development for NVRAM usage, as well as implementing any new software solutions that will be needed to effectively use this technology. The data architecture will provide users with an understanding of how data can move and where it can be located in the hardware and software architectures, enabling users and developers to understand how to optimise performance on the system for their applications.

Generating research

Dr Michèle Weiland, project manager for NEXTGenIO at the Edinburgh Parallel Computing Centre (EPCC) at the University of Edinburgh, explained the objectives of the project. ‘NEXTGenIO is working on improved I/O performance for HPC and data analytics workloads. The project is building a prototype hardware system with byte-addressable persistent memory on the compute nodes, as well as developing the systemware stack that will enable the transparent use of this memory,’ said Weiland.

‘Another key outcome from the project’s research is the publication of a series of white papers, covering topics ranging from the architecture requirements and design to the systemware layer and applications. Each white paper addresses one of the core challenges or developments that were addressed as part of NEXTGenIO,’ added Weiland. The project has been running since 2015 and has already delivered experimental results, the white papers and the design for a motherboard, which is being released as a commercial product by project partner Fujitsu.

The next stage involves installing a prototype system at the University of Edinburgh. This will then be used to test the new memory, which the project researchers hope will help to blur the line between memory and storage. ‘The difference is the memory can be plugged in right next to the processor, just like a DRAM DIMM. The processor will see them as a single space,’ said Weiland.

‘We focused on this because it offers two things. It offers a large amount of capacity, in the region of several terabytes per node, and it also offers performance. In terms of capacity, you can get many terabytes and, in terms of performance, although it is slower than DRAM, it is much faster than flash. Some people have been putting SSDs onto compute nodes and using that as a buffer to accelerate applications,’ added Weiland.

She noted that, in today’s systems, if a simulation has to read a lot of data at the start, this takes up time at the beginning of the run. ‘During this time your processors are not necessarily doing anything useful; they just sit reading data and, depending on the size of the simulation, this can take a long time. We are looking at supporting techniques to pre-load the data before the job runs,’ stated Weiland.

‘When the simulation starts, the data is immediately ready for it, so you are using the processors for what they are best at, while everything else is done in the background. You can use it both as memory and as storage, so it has different modes of operation. You can either say “I want this to be memory”, and your processor will see a very large memory region, or you can say “I want this to be storage”, and you have to manage this space yourself,’ concluded Weiland.
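The pre-loading idea Weiland describes can be illustrated with a small Python sketch. It is not the NEXTGenIO systemware, and every name in it (`stage_in`, `checksum_mapped`, the file names) is hypothetical. An ordinary temporary file stands in for node-local persistent memory, which is an assumption made so the example runs on any machine. A background thread copies the input while the main thread is free to do other work, and the "simulation" then maps the staged file and reads it with plain byte-level access.

```python
import mmap
import os
import shutil
import tempfile
import threading


def stage_in(src_path, dst_path, done):
    """Copy input data to the node-local file, then signal completion."""
    shutil.copyfile(src_path, dst_path)
    done.set()


def checksum_mapped(path):
    """Map the staged file into the address space and sum its bytes."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as region:
            # Slicing the mapping returns a bytes object; summing it adds
            # up the individual byte values.
            return sum(region[:])


def main():
    workdir = tempfile.mkdtemp()
    # Hypothetical paths: "slow" plays the role of the shared file system,
    # "fast" the role of storage on the compute node (an assumption).
    slow = os.path.join(workdir, "input_on_shared_fs.dat")
    fast = os.path.join(workdir, "input_on_node.dat")

    payload = bytes(range(256)) * 4
    with open(slow, "wb") as f:
        f.write(payload)

    # Start the pre-load in the background.
    done = threading.Event()
    worker = threading.Thread(target=stage_in, args=(slow, fast, done))
    worker.start()

    # ... a previous job or other useful work could run here ...

    # The simulation starts only once its data is in place.
    done.wait()
    worker.join()
    total = checksum_mapped(fast)

    shutil.rmtree(workdir)
    return total, sum(payload)


if __name__ == "__main__":
    staged_sum, expected_sum = main()
    print(staged_sum == expected_sum)
```

The sketch only shows the ordering of events: staging happens off the critical path, and the compute step touches the data through a memory mapping rather than explicit read calls. How the real prototype schedules staging, and which interface applications use to reach the persistent memory, is not described in the article.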

Spring 2020 Scientific Computing World 11

