Storage (HPC 2012)
David Barker of 4D Data Centres looks at the storage issues created by large datasets

With the rise of Big Data, the world of high-performance computing is being reversed. The focus is no longer primarily on the computation part of the platform, but on the data itself and how to manage the storage of such large datasets. Although this data is now an organisation's most important asset, unfortunately little consideration is generally given to managing it, with data being shovelled into the public cloud. This may save money in the short term, but in the longer term the data is being held on unknown storage, in an unknown facility, sometimes on the other side of the world.

Besides the obvious security risks of public cloud storage, there are other factors to consider. These include higher bandwidth costs to access that data for analysis; the efficiency of computations if an organisation needs to access the remote data; and vendor lock-in (some cloud providers only allow access via their own proprietary set of APIs, and it can be very difficult to get the data back if you wish to move providers).

If you have infrequently accessed data that isn't a security risk, then making use of a large-scale public cloud probably makes sense. The best solution here would be to team the cloud storage with a local cache (known as a 'cloud storage controller'). This keeps a local copy of the most popular data in order to keep processing times down. The approach only works with infrequently accessed data, as moving large quantities of information, especially in the terabyte or petabyte range, across the internet to the cloud and back again in a short period of time is impractical, even with a large bandwidth leased line (a petabyte takes more than nine days to move even at a sustained 10Gbit/s).

Where the processing of the data takes place is an important factor; once all the data is in the cloud, then utilising it with a cloud compute farm from the same provider is a very compelling idea, especially when users have elastic demands for their processing. However, if you already have a local compute farm, then a private cloud storage platform, hosted on-site or within the data centre of a colocation provider, makes a lot more sense, as it removes the bottleneck of transferring large quantities of data over the internet.

Public cloud storage should offer a simple solution for HPC, but it is currently let down by the network, a lack of maturity around data security, and uncertainty around managing the growth of the storage when paying monthly on a 'per GB' or 'per TB' basis. At the moment, a local private cloud that can scale with additional storage without downtime, and which can be accessed over relatively cheap iSCSI or Fibre Channel, is the best compromise. However, as the costs of leased lines come down and public clouds continue to mature, we can expect to see more data being pushed out over the next five to ten years.
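A cloud storage controller of the kind described above is, at its core, a bounded local cache sitting in front of a remote object store. The sketch below illustrates the idea with a least-recently-used eviction policy; `remote_fetch` and `CloudStorageCache` are hypothetical names standing in for whatever download call a given cloud provider's API actually exposes.

```python
from collections import OrderedDict

class CloudStorageCache:
    """Minimal LRU cache in front of a remote object store (illustrative
    sketch only). `remote_fetch` stands in for a provider's download call;
    `capacity_bytes` bounds how much data is kept locally."""

    def __init__(self, remote_fetch, capacity_bytes):
        self.remote_fetch = remote_fetch
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.cache = OrderedDict()  # key -> bytes, least recently used first

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # fast path: mark as recently used
            return self.cache[key]
        data = self.remote_fetch(key)    # slow path: pay the bandwidth cost
        self.cache[key] = data
        self.used_bytes += len(data)
        # Evict least recently used objects until we fit the local budget.
        while self.used_bytes > self.capacity_bytes and len(self.cache) > 1:
            _, evicted = self.cache.popitem(last=False)
            self.used_bytes -= len(evicted)
        return data
```

Popular objects are served from local disk at local speeds; only cache misses cross the internet, which is why the approach suits infrequently accessed data but breaks down when the working set churns through terabytes.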

Barbara Murphy, chief marketing officer at Panasas, discusses Big Data

Big Data is not a data type, per se, but rather a label that describes the phenomenon of using data to derive business value. Its recent growth has occurred thanks to massive advances in technology, including faster, cheaper processing power, affordable clustered computing, lower-cost SATA storage, and networking performance improvements, allowing almost any corporation or research laboratory to perform computing tasks that required advanced systems only a few years ago.

Business customers have traditionally looked for storage with enterprise-class features, reliability, and fast response times (IOPS). With

Further information
4D
EMC Isilon
Raid inc.
Spectra Logic
Virtual Instruments
Western Digital

the growth of Big Data, requirements for larger and larger data volumes and higher throughput have increased. At the same time, technical computing customers (those traditionally thought of as HPC), who in the past looked for high-performance storage solutions that support high throughput, now look for more enterprise-class storage features and capabilities.

Historically, many of these systems were based on open-source software components that depended on technical expertise to piece together and maintain. In recent years, however, an increasing number of HPC customers have looked to vendors to provide high-performance, parallel storage systems that are also reliable and easy to install and manage; that is, bringing enterprise-class reliability and features to high-performance computing.

While high throughput is essential for HPC and technical computing customers, there is increasing interest in being able to support mixed workloads on the primary storage infrastructure, and customers want to use storage for more than just technical computing needs. Panasas performed an extensive analysis of file-size data from a sample of its current customers and prospects across different market verticals. Even at the most large-file-oriented customers, datasets consisted of a very high percentage of small files that took up a very small share of the total disk capacity. As Big Data storage reaps the benefits of what the HPC community has already learned, 2013 looks set to be an exciting year of growth.
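The kind of file-size survey described above can be reproduced on any file system with a short script. This sketch (the function name and 64KB small-file threshold are illustrative choices, not anything Panasas specifies) walks a directory tree and compares the share of files that are small with the share of capacity those files actually occupy.

```python
import os

def small_file_share(root, threshold=64 * 1024):
    """Walk a directory tree and return two ratios: the fraction of
    files smaller than `threshold` bytes, and the fraction of total
    capacity those small files occupy."""
    small_count = total_count = 0
    small_bytes = total_bytes = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or unreadable; skip it
            total_count += 1
            total_bytes += size
            if size < threshold:
                small_count += 1
                small_bytes += size
    if total_count == 0:
        return 0.0, 0.0
    return small_count / total_count, small_bytes / max(total_bytes, 1)
```

On datasets of the shape the analysis describes, the first ratio comes out close to 1 while the second stays close to 0: small files dominate the file count but barely register in capacity, which is why mixed-workload support matters alongside raw throughput.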
