HPC TECHNOLOGY: STORAGE





expensive string comparisons of names) to locate the position within the directory name table object at which to begin the search for the required name.
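As a rough software illustration of that lookup scheme (a sketch only, with illustrative class and bucket names, not the vendor's actual hardware implementation), a hash of the file name can select the position in the name table at which the search begins, so full string comparisons are confined to the few candidate entries in that bucket:

```python
# Illustrative sketch, not the actual hardware logic: a directory name table
# where a hash of the file name selects the bucket at which the search begins,
# so full string comparisons are only needed within that bucket.
from zlib import crc32


class DirectoryNameTable:
    def __init__(self, num_buckets=4096):
        # Each bucket holds (name, inode_number) entries that hash to it.
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket_index(self, name: str) -> int:
        # Hashing the name once replaces scanning the whole directory.
        return crc32(name.encode()) % len(self.buckets)

    def insert(self, name: str, inode: int) -> None:
        self.buckets[self._bucket_index(name)].append((name, inode))

    def lookup(self, name: str):
        # Only entries in the selected bucket need an exact string comparison.
        for entry_name, inode in self.buckets[self._bucket_index(name)]:
            if entry_name == name:
                return inode
        return None


table = DirectoryNameTable()
table.insert("results_2010.dat", 1042)
print(table.lookup("results_2010.dat"))  # -> 1042
```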


Pipelining is achieved when multiple file system operations are simultaneously overlapped in their execution sequence. For a NAS system, pipelining means multiple data requests, usually from a number of independent hosts concurrently. SiliconFS achieves data pipelining by routing data operations to independent sets of FPGAs for accelerated processing. The operations are independent and have neither shared-memory nor message-passing dependencies. Finally, unlike general-purpose CPUs, FPGAs enable massive parallelism and therefore don’t require the extreme clock rates of CPUs to process large amounts of data. The relatively low 90 MHz clock on the FPGAs means lower power and cooling requirements.
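What the FPGAs do in hardware can be pictured, very loosely, in software. The sketch below (hypothetical request handler, with a thread pool standing in for the independent FPGA sets) simply overlaps self-contained requests from several hosts, none of which share state:

```python
# Software analogy only: SiliconFS pipelines independent file-system operations
# across sets of FPGAs in hardware. This sketch mimics the idea with a thread
# pool: each request is self-contained, so requests from many hosts can be in
# flight at once with no shared memory or message passing between them.
from concurrent.futures import ThreadPoolExecutor
import time


def handle_request(host: str, operation: str, path: str) -> str:
    # Simulate an independent data operation.
    time.sleep(0.01)
    return f"{host}: {operation} {path} done"


requests = [
    ("host-a", "READ", "/data/run1.bin"),
    ("host-b", "WRITE", "/data/run2.bin"),
    ("host-c", "READ", "/logs/day42.txt"),
]

# Overlapping the requests' execution is what the text calls pipelining.
with ThreadPoolExecutor(max_workers=8) as pool:
    for result in pool.map(lambda r: handle_request(*r), requests):
        print(result)
```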


Tape as ‘active archiving’

The advanced systems described in this article rely on tape to archive the huge amounts of data they generate. In general, says Molly Rector, VP of product marketing at Spectra Logic, modern tape systems cost about 10 per cent as much as high-density, lower-cost disk solutions and provide 6x to 8x the storage in the same footprint. She says: ‘If you can wait a minute, you can save six figures in power and cooling.’ The company’s T-Finity has an entry base capacity of 150 TB, but it scales up to 91 PB of compressed storage in a single library, and multiple libraries can easily be connected. As for transfer speeds, the firm specs 140 MB/s.

Perhaps the biggest knock against tape has been the delay while waiting for a tape robot to find and mount the tape before the data can even be read. Even just ten years ago it could take hours to get a large file set. That has changed, says Rector, in what she calls ‘active archiving’: today it takes between 55 and 75 seconds (the lag time to spin up the tape) to get data, and the data is fully accessible without manual intervention. Access is even faster if you place data groups in a single drawer, such as all data from one experiment, because the robots don’t need to go searching all over for the right tapes.

Integrity is also improving. Before data is written, the tape mechanisms are verified to be functioning properly, and during streaming, software checks that what you get out is what you expected. It’s also possible to implement RAID on tapes.
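A back-of-the-envelope calculation using the figures quoted above (the 500 GB file size is purely an assumption for illustration) shows that for large retrievals the mount lag is now a small fraction of the total time:

```python
# Rough retrieval time for a file on tape, using the figures quoted above:
# roughly 55-75 s to locate/mount/spin up, then ~140 MB/s streaming.
# The 500 GB file size is an illustrative assumption.
MOUNT_LAG_S = (55, 75)        # seconds, per the 'active archiving' figures
TRANSFER_MB_PER_S = 140       # MB/s, per the quoted drive spec
FILE_SIZE_GB = 500            # hypothetical file size

stream_s = FILE_SIZE_GB * 1000 / TRANSFER_MB_PER_S
for lag in MOUNT_LAG_S:
    total_min = (lag + stream_s) / 60
    print(f"mount lag {lag}s -> total ~{total_min:.1f} minutes")
# Streaming bandwidth dominates; the mount lag adds only about a minute.
```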




Parallelism not needed for offline analysis


The focus in this story has been on large parallel file systems, but in some science a parallel file system isn’t mandatory. Consider the case of molecular biology, where just in the last two or three years there’s been an explosion of data driven by instruments such as spectrometers. A typical lab preparing samples might produce 300 GB of binary data per week, and a larger lab could have 10 times that amount. There’s a need to retain all this data because scientists don’t always know what will be important in the future. ‘They’re looking for the billion-dollar needle in a haystack,’ says David Chiang, founder of Sage-N Research.

T-Finity’s dual robotics and redundant systems provide robust data availability, supporting active-archive access to all data

Until recently, explains Chiang, scientists thought they could get by with a bigger PC and a larger USB drive, but that’s clearly no longer the case. There’s now a transition from PC-style IT to server-class machines. At the same time, these biologists aren’t interested in cores and gigabytes; they just want to get their work done. That’s why the company sells turnkey systems including analysis software and storage. A recommended system has 8.5 TB with a RAID 5+1 hot spare and a separate RAID 5+1 in a separate building. Transfer speed isn’t the issue here: life scientists often deal with large data files in offline analysis, so there’s no need for parallel file systems.
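Some rough arithmetic from the numbers above (illustrative only, ignoring retention policy and compression) shows why a bigger PC and a USB drive no longer suffice:

```python
# Rough sizing from the figures quoted above: 300 GB of binary data per week
# for a typical lab (10x for a large one) against the 8.5 TB recommended
# system. Purely illustrative arithmetic.
WEEKLY_GB_TYPICAL = 300
WEEKLY_GB_LARGE = 10 * WEEKLY_GB_TYPICAL
SYSTEM_TB = 8.5

for label, weekly_gb in [("typical lab", WEEKLY_GB_TYPICAL),
                         ("large lab", WEEKLY_GB_LARGE)]:
    yearly_tb = weekly_gb * 52 / 1000
    weeks_to_fill = SYSTEM_TB * 1000 / weekly_gb
    print(f"{label}: ~{yearly_tb:.1f} TB/year; "
          f"8.5 TB fills in ~{weeks_to_fill:.0f} weeks")
```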


Solid-state in the future

One technology not yet mentioned is solid-state memory, whose advances in density and price are starting to make it attractive in large storage systems. But where? ‘Flash makes sense as an intermediate step between fast RAM and disk,’ says ORNL’s Buddy Bland. In many labs, the highest bandwidth requirements are for checkpointing, so why not put that job on flash or phase-change memory? ‘How can we best use solid-state memories? That’s going to be a big deal going forward, and it’s something we’ve been looking at for some time now,’ adds Bland.

According to Jamon Bowen, director of sales engineering for Texas Memory Systems, SSDs (solid-state disks) have always had a niche for metadata use, but now there are two new use cases. First, with a shared namespace and SSDs, storage can be leveraged for small-block I/O and used more like a high-latency shared memory space with tremendous capacity. Very few HPC jobs are written with this I/O pattern, so this is not really targeting existing applications; rather, it aims to move people from mainframes or huge-memory NUMA (Non-Uniform Memory Access) systems to an HPC setup utilising SSDs in the namespace, making the switch possible by meeting random-I/O requirements that disks cannot supply. Second, says Bowen, SSDs have recently reached the point where they can be less expensive per GB/s than disk. They are more expensive per GB, but for HPC workloads bandwidth is often the factor that limits performance. TMS, for instance, is introducing QDR InfiniBand interfaces on its SSD systems to deliver 8 GB/s per 3U chassis.

Mike Lance, director at NextIO, comments that, although parallel file systems have historically been weak in the area of media redundancy, many new solutions have emerged to fill that requirement. To help HPC administrators enhance the performance of their clustered systems, flash storage arrays can provide greater performance and lower latency than traditional server hard disks. Not only can flash replace local hard disks, it can also be used in a direct-attach storage array to give servers access to more than 5 TB of local storage per node. Because the array attaches directly to each server node over PCI Express, no latency is introduced by extra disk controllers or software.
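As a rough, hypothetical sizing exercise for the checkpointing case Bland describes (the aggregate memory size and checkpoint window below are assumptions; only the 8 GB/s per 3U chassis figure is the one quoted above), the required aggregate bandwidth, and hence the number of SSD chassis, follows directly:

```python
# Hedged sizing sketch for checkpointing: if a machine must dump its memory
# image within a fixed window, the aggregate bandwidth requirement follows.
# MEMORY_TB and WINDOW_MIN are hypothetical; CHASSIS_GB_PER_S is the quoted
# 8 GB/s per 3U SSD chassis.
import math

MEMORY_TB = 100          # hypothetical aggregate system memory
WINDOW_MIN = 10          # hypothetical checkpoint window
CHASSIS_GB_PER_S = 8     # per 3U SSD chassis, as quoted for TMS

required_gb_per_s = MEMORY_TB * 1000 / (WINDOW_MIN * 60)
chassis_needed = math.ceil(required_gb_per_s / CHASSIS_GB_PER_S)
print(f"required bandwidth: ~{required_gb_per_s:.0f} GB/s")
print(f"3U SSD chassis at {CHASSIS_GB_PER_S} GB/s each: {chassis_needed}")
```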





