HPC TECHNOLOGY: STORAGE
The 10-PB Spider file system at Oak Ridge National Laboratory provides a centre-wide file system for all major systems within the Oak Ridge Leadership Computing Facility (OLCF).
Today’s biggest and fastest?
In configuring the storage for this computer, explains Arthur ‘Buddy’ Bland, project director of the OLCF, the team needed something well balanced enough to cover the breadth of scientific applications the lab runs, so that it could ‘take all comers’. Just a few examples include studies of climate change, ultracapacitors, superconductivity and fusion energy, along with basic science such as simulations of supernova explosions. As in most HPC systems, a key requirement was to get data in and out fast. The team determined it wanted 200GB/s of bandwidth between the discs and main memory of the Cray XT5 system, with its 224,256 cores, a specification that was pushing technical limits at the time. In fact, the Lustre-based file system has achieved in excess of 240GB/s on 10PB of storage with RAID6 protection.
Besides bandwidth, other key criteria were density and floor space. The ‘Spider’ file system consists of 13,440 1TB drives housed in 32 racks, and its architecture is made up of 192 building blocks, each with an object storage server that runs the file-services part of Lustre and serves seven RAID6 sets (70 disk drives). All of this fits into a few hundred square feet of valuable computer-room space, which was possible with storage from DataDirect Networks, which provided twice the density of any competitor at the time. ‘We don’t know of another computer in the scientific world right now with this capacity and bandwidth,’ boasts Bland. Meanwhile, that amount of floor space could be reduced even further, because DDN is leading a trend to move the functionality of the object storage servers directly onto the drive controllers. In this case, eliminating 192 servers reduces not only cost, but also space and the power/cooling budget.
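Those building-block figures are internally consistent, and a quick back-of-envelope check makes the architecture easier to picture. The sketch below is not from Oak Ridge; it simply assumes each 10-drive RAID6 set is an 8+2 configuration and divides the published aggregate bandwidth evenly across the 192 object storage servers.

```python
# Back-of-envelope check of the published Spider figures.
BUILDING_BLOCKS = 192        # object storage servers
RAID6_SETS_PER_BLOCK = 7
DRIVES_PER_SET = 10          # 70 drives per block / 7 sets
DRIVE_CAPACITY_TB = 1.0
DATA_DRIVES_PER_SET = 8      # assumption: each RAID6 set is 8 data + 2 parity drives

total_drives = BUILDING_BLOCKS * RAID6_SETS_PER_BLOCK * DRIVES_PER_SET
raw_tb = total_drives * DRIVE_CAPACITY_TB
usable_tb = BUILDING_BLOCKS * RAID6_SETS_PER_BLOCK * DATA_DRIVES_PER_SET * DRIVE_CAPACITY_TB

print(f"drives: {total_drives}")                       # 13440, matching the article
print(f"raw capacity: {raw_tb / 1000:.1f} PB")         # ~13.4 PB raw
print(f"usable capacity: {usable_tb / 1000:.1f} PB")   # ~10.8 PB, i.e. the quoted 10PB class
print(f"share of 240 GB/s per server: {240 / BUILDING_BLOCKS:.2f} GB/s")  # ~1.25 GB/s
```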
Commenting on the choice of Lustre for the parallel file system, Bland notes that ORNL collaborates with many other laboratories around the world using it. He notes that there are only a handful of serious parallel file systems and that it is important to support Lustre for both business and technology reasons. Oak Ridge is today working hard to make sure Lustre continues to stay around, and the lab is involved with the OpenSFS (scalable file system) consortium, which has the goal of providing a viable ecosystem for Lustre.
What’s in store
If this all sounds impressive, look at what’s in store: the IBM Blue Waters system at the National Center for Supercomputing Applications, explains technical programme manager Michelle Butler, is expected to be the most powerful supercomputer in the world for open scientific research when it comes online next year. With more than 300,000 cores on eight-core Power7 processors, Blue Waters will achieve peak performance of approximately 10 PF and will deliver sustained performance of at least 1 PF. The system will have a peak memory bandwidth of nearly 5 PB/sec, more than 1 PB of memory, and 18 PB of disk storage with an aggregate peak I/O rate exceeding 1.5 TB/sec. Redundant servers and paths will be configured, and advanced RAID will be used in conjunction with IBM’s General Parallel File System (GPFS). A large near-line tape subsystem, eventually scaling to 500 PB, is also directly attached to the Blue Waters system. The tape archive will run IBM’s HPSS (High Performance Storage System); while appearing to the user as a disk file system, HPSS moves inactive data to tape and retrieves it the next time it is referenced. It will also work with the new GPFS-HPSS interface (GHI) to integrate the tape archive into the GPFS namespace. GHI also allows for the transparent migration of data between disk and tape.
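HPSS and GHI are proprietary middleware, so the listing below is only a minimal sketch of the hierarchical-storage idea the paragraph describes: files untouched for a while are migrated to (here, simulated) tape, leaving a zero-length stub in the namespace, and a read triggers a transparent recall. The threshold, the TapeStore class and the stub layout are illustrative assumptions, not the HPSS or GPFS APIs.

```python
import os, time

MIGRATE_AFTER_SECONDS = 30 * 24 * 3600   # assumed "inactive" threshold (30 days)

class TapeStore:
    """Stand-in for a tape archive: holds migrated file contents keyed by path."""
    def __init__(self):
        self._archive = {}

    def write(self, path, data):
        self._archive[path] = data

    def read(self, path):
        return self._archive[path]

class HSMNamespace:
    """Files appear to live on disk; cold data is silently parked on 'tape'."""
    def __init__(self, tape):
        self.tape = tape
        self.stubs = set()            # paths whose data currently lives on tape

    def migrate_cold_files(self, root):
        for name in os.listdir(root):
            path = os.path.join(root, name)
            if path in self.stubs or not os.path.isfile(path):
                continue
            if time.time() - os.path.getatime(path) > MIGRATE_AFTER_SECONDS:
                with open(path, "rb") as f:
                    self.tape.write(path, f.read())
                open(path, "wb").close()   # truncate to a zero-length stub on disk
                self.stubs.add(path)

    def read(self, path):
        if path in self.stubs:             # transparent recall on first reference
            with open(path, "wb") as f:
                f.write(self.tape.read(path))
            self.stubs.discard(path)
        with open(path, "rb") as f:
            return f.read()
```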
A close cousin to Lustre
A third major player in parallel file systems is Panasas with its PanFS, which has common roots with Lustre: both were designed at roughly the same time, with a similar architecture, at Carnegie Mellon University (CMU). The original problem statement for parallel NFS access was written by Garth Gibson, a professor at CMU and co-founder and CTO of Panasas. Gibson was already a renowned figure, being one of the authors of the original 1988 paper on the RAID architecture.
Panasas recently announced that it will provide the file system for Los Alamos National Laboratory’s CIELO, which is expected to have more than 10 times the capacity of the Lawrence Livermore Purple supercomputer it is replacing. CIELO will be based on Cray’s just-announced XE6 architecture with a new interconnect named ‘Gemini’. In phase one, the platform will consist of 6,704 compute nodes, 107,264 compute cores and 221.5 TB of memory; in phase two, the system will be expanded to nearly 9,000 compute nodes and approximately 300 TB of memory. The file system will support users at all three National Nuclear Security Administration (NNSA) laboratories: Lawrence Livermore, Los Alamos and Sandia.
pNFS on the horizon
Panasas is also deeply involved in the development of pNFS (parallel NFS), a standard currently under development by the Internet Engineering Task Force. The client code is expected to be included in Linux distributions such as Red Hat 6.0 by mid to late next year, at which point users will have the easy option of selecting either NFS or pNFS out of the box when setting up systems. According to Larry Jones, this will eventually replace Panasas’ DirectFlow client, but he sees it as a positive trend, because Linux distributions with pNFS will expand the market for parallel file systems. That, in turn, means more business for the company’s server-side PanFS and its complete hardware/software storage solutions. ‘The hardware and software must work closely together or else you’re going to drop data,’ says Jones, ‘and although we build blade systems, 80 per cent of our engineers do software development.’
Hardware acceleration in file systems
There seem to be three camps when it comes to storage, adds Jones: those who concentrate primarily on software, those who sell complete systems, and those for whom it’s all about hardware. An interesting player in the third category is BlueArc, whose SiliconFS file system offloads specific file-system operations to field-programmable gate arrays (FPGAs). Memory objects are manipulated by logic residing in the FPGAs; for instance, when a host requests a particular file or directory by name, the FPGAs execute a binary search of numeric values (as opposed to having to perform computationally expensive string comparisons).
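The article does not spell out BlueArc’s on-chip data structures, so the fragment below is only a software sketch of the general technique being described: each entry name is reduced to a fixed-width numeric key, entries are kept sorted by that key, and a lookup becomes a binary search over integers with a single string comparison at the end to guard against collisions. The hash choice and layout here are assumptions for illustration.

```python
import bisect
import zlib

def name_key(name: str) -> int:
    """Reduce a file/directory name to a fixed-width numeric key (CRC32, purely illustrative)."""
    return zlib.crc32(name.encode("utf-8"))

class Directory:
    """Directory entries kept sorted by numeric key, so lookups are binary searches over integers."""
    def __init__(self):
        self._keys = []      # sorted numeric keys
        self._entries = []   # (name, inode) pairs in the same order

    def add(self, name: str, inode: int) -> None:
        k = name_key(name)
        i = bisect.bisect_left(self._keys, k)
        self._keys.insert(i, k)
        self._entries.insert(i, (name, inode))

    def lookup(self, name: str):
        k = name_key(name)
        i = bisect.bisect_left(self._keys, k)
        # Walk any equal keys (hash collisions) and confirm with one string compare.
        while i < len(self._keys) and self._keys[i] == k:
            entry_name, inode = self._entries[i]
            if entry_name == name:
                return inode
            i += 1
        return None

d = Directory()
d.add("results.dat", 1001)
d.add("run.log", 1002)
print(d.lookup("results.dat"))   # 1001
print(d.lookup("missing.txt"))   # None
```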