high-performance computing

to know where the data resides, and the team here does not have to manage that. Since I have a small team, I do not have to worry about data placement – that is all taken care of automatically by the DDN WOS Bridge technology.
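The single-namespace behaviour described above can be pictured with a small sketch. This is a toy model with invented names, not the DDN WOS Bridge API – the real bridge operates inside the file system, invisibly to users – but it shows the idea: callers read one path, and the namespace resolves it against the primary tier first, then falls back to the object tier.

```python
class TieredNamespace:
    """Toy model of one namespace spanning a primary (file) tier and a
    secondary (object) tier. Hypothetical sketch only – the real bridge
    works inside the file system, not in application code."""

    def __init__(self):
        self.primary = {}    # path -> bytes (fast, expensive tier)
        self.secondary = {}  # path -> bytes (object tier, cheaper)

    def write(self, path, data):
        self.primary[path] = data

    def demote(self, path):
        """Move a file to the object tier; its path stays the same."""
        self.secondary[path] = self.primary.pop(path)

    def read(self, path):
        """Callers use one path regardless of which tier holds the data."""
        if path in self.primary:
            return self.primary[path]
        return self.secondary[path]

ns = TieredNamespace()
ns.write("/data/scan001.tif", b"image bytes")
ns.demote("/data/scan001.tif")           # moved to the object tier...
restored = ns.read("/data/scan001.tif")  # ...but the same path still works
```

The point of the sketch is the `read` method: because lookup falls through transparently, users never need to know which tier holds the bytes.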

Object storage in scientific computing This project was a consolidation of disparate storage systems onto one central platform that is accessible to everybody equally, reliably, and in a robust fashion. As I mentioned earlier, our workflows are not very different from other institutions', and, as at other large institutions, storage systems grow over time. As they grow, they become harder to manage. This issue of managing data is somewhat

automation and image acquisition, and designing algorithms for image analysis. In this programme, our faculty has research projects that use microscopy exclusively to test hypotheses. We distribute the technology, the methods

and reagents we develop to the broader research community – we have a big role in educating the wider industry, not just our students.

Our challenge With such a broad range of research and outputs, it is interesting to note that our central IT department does not have a mandate to support our central research data systems. In fact, our IT support is decentralised and department based – that in itself carries a significant administrative overhead. Additionally, data has been growing, and continues to grow, at an unprecedented rate – just a couple of years ago, our data growth was around one terabyte per week. These two challenges, coupled with an

independent HPC resource, multiple storage systems and a plethora of Lightweight Directory Access Protocol (LDAP) directories for authenticating users, made it very difficult for us to collaborate internally, let alone make our findings available to other institutes and researchers. There was also a lack of high-availability systems – if we ever needed to perform maintenance, or had a network outage to deal with, our systems had to go down, and important research had to be scheduled in advance or potentially stopped altogether – and that is unacceptable. Our workflows may not be very different from other biomedical research facilities; we


collect raw data on microscope systems – we collect the images and, using metadata, can catalogue them. Our researchers study the data sets using homegrown and commercial software systems on many different hardware platforms, which all need to ‘speak’ with each other seamlessly. In short, our challenge was that we

needed the ability to collaborate within the institution and with colleagues at other institutes – we needed to maintain that fluid conversation that involves data, not just the hypotheses and methods.

Solving the problem Having identified the challenges, we knew that our new system needed to be easy to manage. The team and I needed a storage solution that we could support internally – it had to be sustainable, and easy to scale when needed. Other stipulations were around having

a copy of all data in a secondary location – for disaster recovery and archive. We also wanted to store data in a secondary tier that was somewhat lower in performance, and certainly lower in cost. So, we wanted a system that was easy for users to access, with everything in a single namespace. The primary building block of our

infrastructure now is DDN’s GRIDScaler running IBM’s GPFS parallel file system – an appliance that delivers the performance and scalability needed for high data-rate capture and future data growth. We also use a technology that ‘bridges’ our

main storage solution to the secondary tier of object storage – DDN WOS. In a single namespace, we can copy or move files across the tiers – our user community did not need

mitigated with object storage – there are numerous case studies, articles and live environments where object storage systems throw out the existing file system approach. The hierarchies that become complex as they grow make way for the flat structure of object storage. The top-line benefits of object storage are near-unlimited scalability, custom metadata, and access to other storage solutions through bridging technologies. What stands out for me, though, is the use

of object storage as the secondary storage tier: it gave us many features I was not expecting when we first thought about the system we wanted. One is having data stored in an immutable fashion; since we store metadata anyhow, we can add Object Identifiers (OIDs) as additional fields. Our experience of object storage as the

secondary tier makes a lot of sense. Our parallel file system connects into our HPC resource and, once data is no longer needed on primary storage, it can be moved to this secondary tier. Thanks to custom rulesets (such as age of data, size, and use), it is automatically moved to the object storage layer – which we can then grow as and when we need it.
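The rule-driven migration described above can be sketched in a few lines. The rule names, field names, and thresholds here are invented for illustration – in practice such policies live in the file system’s policy engine rather than in application code – but the shape is the same: each rule is a predicate over a file’s attributes, and a file moves to the object tier when any rule matches.

```python
# Hypothetical tiering rules modelled on the criteria mentioned above
# (age of data, size, recent use); names and thresholds are invented.
RULES = [
    ("older than 90 days", lambda f: f["age_days"] > 90),
    ("large and idle", lambda f: f["size_gib"] > 10 and f["days_since_access"] > 30),
]

def should_migrate(file_info):
    """True if any rule says this file belongs on the object-storage tier."""
    return any(rule(file_info) for _name, rule in RULES)

hot = {"age_days": 5, "size_gib": 2, "days_since_access": 1}
cold = {"age_days": 120, "size_gib": 1, "days_since_access": 120}
# should_migrate(hot) -> False; should_migrate(cold) -> True
```

Keeping each rule as an independent predicate mirrors how such rulesets grow in practice: new criteria are appended without touching the migration logic itself.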

Shailesh Shenoy is director of the office of research computing at the Albert Einstein College of Medicine, New York

