
HPC TECHNOLOGY: STORAGE

Feeds and speeds

The potential bottleneck in many HPC systems is moving data between disks and memory. Great progress has been made in hardware and parallel file systems, and Paul Schreier takes a look at what we can expect before long

The unrelenting increase in the number of cores in HPC systems carries many implications, and a key one is storage. These many cores have a voracious appetite for data, and it’s not cost effective for them to sit idle waiting for it. In addition, they’re generating unfathomable amounts of data, and we apparently want to store all of it. The total HPC market in 2009 for all products and services was $15bn, says Addison Snell at market research company Intersect 360, and storage accounted for about 16.5 per cent of that total. At 9.2 per cent CAGR, it’s the fastest-growing segment of all categories. The problem isn’t so much storing the data (large server farms can handle plenty of it) but rather moving data back and forth within computational job flows at speeds that won’t throttle processing tasks.

A data superhighway

The original NFS (network file system), developed by Sun Microsystems, has long been the de facto standard for providing a global namespace in networked computing. A central master node acts as an NFS server, storing input, interim and results data on its local file system and exporting all of it to the other cluster nodes. The bottleneck arises with a large number of nodes, because the single NFS server can’t receive or deliver data quickly enough to keep up. Today’s HPC systems also often work with extremely large files, which compounds the problem.
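To get a feel for the scale of that bottleneck, here is a back-of-the-envelope sketch in Python. All the figures in it (1,000 MB/s per server, 500 nodes, 10 GB files) are hypothetical, and it assumes the servers' aggregate bandwidth is shared evenly among all nodes, which real systems only approximate.

```python
def per_client_bandwidth(server_mb_s: float, num_servers: int, num_clients: int) -> float:
    """Return the MB/s each client gets when server bandwidth is shared evenly.

    Simplifying assumption: every server delivers `server_mb_s` and all
    clients are active at once, so the aggregate is split equally.
    """
    if num_servers < 1 or num_clients < 1:
        raise ValueError("need at least one server and one client")
    return server_mb_s * num_servers / num_clients


def transfer_seconds(file_mb: float, bandwidth_mb_s: float) -> float:
    """Time in seconds to move a file of `file_mb` megabytes at a given bandwidth."""
    return file_mb / bandwidth_mb_s


# Hypothetical figures: 1,000 MB/s per server, 500 compute nodes, one 10 GB file per node.
single = per_client_bandwidth(1000.0, num_servers=1, num_clients=500)
spread = per_client_bandwidth(1000.0, num_servers=50, num_clients=500)

print(f"1 server:   {single:.0f} MB/s per node, {transfer_seconds(10_000, single):.0f} s per file")
print(f"50 servers: {spread:.0f} MB/s per node, {transfer_seconds(10_000, spread):.0f} s per file")
```

The model ignores network limits, caching and disk behaviour, but it shows why adding compute nodes behind one server makes every node wait longer.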

The solution has come in the form of the parallel file system. With it, a large file is ‘striped’, meaning it is divided into many smaller pieces that are sent simultaneously to a number of physical disk drives. Because the pieces are transferred in parallel, this approach reduces the time needed to read or write large files. ‘It’s the difference between a two-lane road and a multilane highway with as many lanes as you like and as wide as you like,’ explains Larry Jones, VP of marketing at Panasas.
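Striping is easy to illustrate in a few lines of Python. The sketch below assumes the simplest possible scheme, round-robin placement of fixed-size pieces, and uses in-memory lists to stand in for disks. The function names and the stripe size are illustrative only, and none of the file systems discussed here is claimed to work exactly this way.

```python
from typing import List


def stripe(data: bytes, num_disks: int, stripe_size: int) -> List[List[bytes]]:
    """Split `data` into fixed-size pieces and deal them round-robin across disks."""
    if num_disks < 1 or stripe_size < 1:
        raise ValueError("num_disks and stripe_size must be positive")
    disks: List[List[bytes]] = [[] for _ in range(num_disks)]
    for start in range(0, len(data), stripe_size):
        piece_index = start // stripe_size
        disks[piece_index % num_disks].append(data[start:start + stripe_size])
    return disks


def reassemble(disks: List[List[bytes]]) -> bytes:
    """Rebuild the file by reading one piece from each disk in turn."""
    pieces = []
    rows = max((len(disk) for disk in disks), default=0)
    for row in range(rows):
        for disk in disks:
            if row < len(disk):  # later disks may hold one piece fewer
                pieces.append(disk[row])
    return b"".join(pieces)


data = bytes(range(10))
disks = stripe(data, num_disks=3, stripe_size=2)
assert reassemble(disks) == data
```

In a real system each list would be a separate drive or server, so the pieces in one row could be read or written at the same time rather than one after another.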

A parallel file system provides multiple paths from clients to storage (courtesy of www.pnfs.com)

A key piece of the puzzle is a metadata server: it keeps track of which data is stored on which physical device and how to access the files (the layout). Besides these striping parameters, this server also manages other metadata, including access rights. In this discussion you’ll also see references to object storage servers, which hold the constituent pieces of a file: the objects that make up the file. Writing those objects to many object storage servers is what allows parallelism.
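The division of labour between the metadata server and the object storage servers can be sketched as a small data structure. This is a toy model in Python, not the API of Lustre, GPFS, PanFS or pNFS: the class names, the server names (`oss0` and so on) and the round-robin layout are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Layout:
    """Striping parameters plus access rights for one file (hypothetical model)."""
    stripe_size: int
    servers: Tuple[str, ...]   # object storage servers, in stripe order
    readers: FrozenSet[str]    # users allowed to read the file

    def locate(self, offset: int) -> Tuple[str, int]:
        """Map a byte offset in the file to (server, offset inside that server's object)."""
        stripe_index = offset // self.stripe_size
        server = self.servers[stripe_index % len(self.servers)]
        local_stripe = stripe_index // len(self.servers)
        return server, local_stripe * self.stripe_size + offset % self.stripe_size


class MetadataServer:
    """Hands out layouts; the file data itself never passes through it."""

    def __init__(self) -> None:
        self._layouts: Dict[str, Layout] = {}

    def create(self, path: str, layout: Layout) -> None:
        self._layouts[path] = layout

    def open(self, path: str, user: str) -> Layout:
        layout = self._layouts[path]
        if user not in layout.readers:
            raise PermissionError(f"{user} may not read {path}")
        return layout


mds = MetadataServer()
mds.create("/scratch/run1.dat", Layout(4, ("oss0", "oss1", "oss2"), frozenset({"alice"})))

layout = mds.open("/scratch/run1.dat", "alice")
print(layout.locate(13))  # byte 13 lives in the fourth stripe, which wraps back to oss0
```

The point of the design is that a client contacts the metadata server once to obtain the layout, then talks to the object storage servers directly and in parallel.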

Dominant parallel file systems

Another key piece is the parallel file server software that handles all these tasks. Today there are three main contenders, with a variety of proprietary schemes also being sold. The dominant names are Lustre (originally developed by Cluster File Systems (CFS), which was acquired by Sun Microsystems, which in turn was acquired by Oracle), GPFS (IBM) and PanFS (Panasas), with pNFS (an extension of the NFS standard proposed by the Internet Engineering Task Force) soon to become available.

Of these, Lustre is the most pervasive on large-scale systems: seven out of the top 10 and 60 out of the top 100 systems run it, and it’s the only one that is open source. Because of that open source aspect, however, Lustre is also surrounded by many questions about the continued availability of support. In fact, one supplier recently ran an online seminar asking the provocative question, ‘Feeling exposed by Oracle’s lack of support for Lustre?’, a question some other vendors and users feel is totally unjustified. Specifically, in the past, CFS (and later Sun) provided support for Lustre on virtually any hardware platform. Oracle has indicated that this will change, starting with Lustre 2.0, where only Oracle and certified non-Oracle hardware will be supported.

At a Lustre User Group meeting last April, Oracle made these statements: ‘The Core Lustre file system technology, including both Lustre 1.8 and Lustre 2, will remain open source… Oracle intends to continue to release Lustre 1.8 as a software package as it is today, tested, qualified, and packaged for all the currently qualified client and server platforms. Oracle intends to support Lustre 2 on its future integrated storage products… Oracle intends to provide a Lustre 2 qualification program for Lustre 2 support on non-Oracle hardware… Oracle intends to build and test a software-only release of Lustre 2, but there is no plan to support this open source version.’

Such questions weren’t a concern for Oak Ridge National Laboratory several years ago when it was designing the Jaguar supercomputer, which currently leads the Top 500 list and uses Lustre for its parallel file system. However, the ability of Lustre to scale and to provide the required performance on this platform was a concern. Through close work with CFS and later Sun, the Lustre file system was scaled to more than 26,000 file system clients and, through the development of an advanced overlay network mechanism, Lustre now

SCIENTIFIC COMPUTING WORLD AUGUST/SEPTEMBER 2010 27
