Volume, velocity, and variety


In the era of big data, the challenge for storage technology lies not in massive data files but in the smaller ones, as Tom Wilkie discovers. On the following three pages we round up some recent releases representing this trend.


In high-performance computing, as John Barr's article on page 18 of this issue notes, software is being outstripped by the pace of developments in hardware. There are still not enough applications capable of running efficiently in the world of massive parallelism, accelerators, and exotic processors such as those from ARM. But although science, industry and commerce are all generating 'big data', and thus an increasing demand for storage, the hardware has not shown the kind of development so evident on the computational side. 'Storage has been on a hard-drive kick for the past 20 years or so,' as Geoffrey Noer, senior director of product marketing at Panasas, put it.

Although the capacities of hard disks are increasing, the interfaces and the drives themselves have been getting only slightly faster. As a result, increased capacity comes at a price: performance in terms of I/O and throughput is not keeping pace, to the point where some types of hard drive are being relegated to less demanding tiers such as archival storage.

The challenge facing all storage vendors is to meet, or exceed, their customers' requirements not only for the amount of data under storage, but also for the speed at which it can be accessed and for its security, while keeping the whole system cost-effective. For Laura Shepard, director of HPC markets at Data Direct Networks (DDN), the task is 'optimising among four criteria: scalability; availability; performance; and cost-effectiveness'.

Flash technology holds the promise of overcoming some of the limitations of hard drives but, unlike parallel processors on the compute side, it is relatively expensive. The onus for optimising storage solutions therefore falls on those who create the architectures, and they need to understand the structure of the data itself.


According to Noer: 'The challenge for storage vendors is to use hard drives and flash technology together in an efficient way that delivers most value for the customer.'

Customers are interested in 'volume, velocity, and variety', according to Shepard. They want to store large amounts of data and they want high-speed access. But it is the third 'v', the variety of files, that holds the key to developing storage systems that can meet the requirements without breaking the bank.

According to Shepard: 'Variety is a fun one, because the mix of I/O is becoming one of the most challenging components of HPC workflows. Big data is not just volume, but variety: larger and larger files with lots and lots more descriptive data – metadata – which, of course, is tiny. So those [storage] systems that have spent the past decade trying to fix the big-file problem are suddenly faced with the situation where we have six to ten times as much metadata and it's all smaller. The problem has moved: even if we did not have increasing amounts of metadata, the small file would become the problem.'


Noer points out that Panasas has done many surveys to understand what its customers' data looks like: 'Even with technical computing workloads, it is very common for 70 to 80 per cent of the files to be smaller than 64 Kbytes. But these files take up less than 0.5 to 1 per cent of the capacity, because the large files dominate.' Yet those small files put a big load on the system and interfere with responsiveness for the end user. 'So if you store all the small files, as well as all of the metadata, on flash – and that is something you can do with very little cost impact on the system, because you only need one per cent of the capacity to be flash – the characteristics of the storage change dramatically.' Panasas' current product, ActiveStor 14, works in just this way.
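To make the arithmetic concrete, the short Python sketch below walks a directory tree and reports what fraction of the files fall at or below the 64 Kbyte threshold Noer cites, and how little of the total capacity they occupy. It is an illustration only: the 64 Kbyte figure comes from the article, but the mount point, scan logic and report format are assumptions, not drawn from any Panasas or DDN product.

    import os

    # Count files at or below a 64 Kbyte threshold and measure how much of
    # the total capacity they occupy. The threshold comes from the article;
    # the path and report format are illustrative assumptions.
    SMALL_FILE_THRESHOLD = 64 * 1024  # bytes

    def small_file_report(root):
        small_count = small_bytes = total_count = total_bytes = 0
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                try:
                    size = os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    continue  # skip files that vanish or cannot be read
                total_count += 1
                total_bytes += size
                if size <= SMALL_FILE_THRESHOLD:
                    small_count += 1
                    small_bytes += size
        if total_count and total_bytes:
            print(f"{small_count / total_count:.1%} of files are small, "
                  f"holding {small_bytes / total_bytes:.2%} of total capacity")

    small_file_report("/data")  # hypothetical mount point

On a file system matching Noer's survey figures, such a scan would show roughly 70 to 80 per cent of the files in the small bucket but well under one per cent of the bytes, which is why placing them, along with the metadata, on flash adds so little to the overall cost of the system.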


For Shepard too, such mixed I/O workflows have become increasingly important in HPC, on the commercial side as well as in research, and DDN's Storage Fusion Architecture (SFA) is designed to accommodate them. 'Of the Top 100 fastest sites in the world, more than two-thirds are powered by SFA technology,' she said.

When the company redesigned its architecture some five years ago and built SFA, one of the main considerations was a mixed I/O approach. 'This was the beginning of our journey as a company in recognising the importance of cache-centric storage as a component of high-performance computing,' she said.


'Everyone has cache, but leveraging intelligent cache to reduce cost while also focusing on mixed I/O extends into the current era, where we're talking about integrating flash into a lot of different layers in the architecture – that's another form of cache centrism.'

Noer expects that 'as flash moves to NVMe [non-volatile memory express] based on PCI Express, that will make for very high-throughput devices, so we expect to see flash devices play a broader role in the future'. He pointed out that Samsung is the world's largest manufacturer of flash memory and that its investment arm, Samsung Ventures, has invested in Panasas.

