Contributed content

HPC Yearbook 19/20

High-Performance Computing 2019-20


Powering the most exciting research on Earth

Harry Richardson highlights the value of Ceph in scientific applications, in this guest article from SoftIron


As researchers seek scalable, high performance methods for storing data, Ceph is a powerful technology that needs to be at the top of their list. Ceph is an open source software-defined storage platform. While it’s not often in the spotlight, it’s working hard behind the scenes, playing a crucial role in enabling ambitious, world-renowned projects such as CERN’s particle physics research, Immunity Bio’s cancer research, The Human Brain Project, MeerKat radio telescope, and more. These ventures are propelling the collective understanding of our planet and the human race beyond imaginable realms, and the outcomes will forever change how we perceive our existence and potential. It’s high time Ceph receives the praise it deserves for powering some of the most exciting research projects on Earth.

Ceph is flexible, inexpensive, fault tolerant, hardware neutral, and infinitely scalable, which makes it an excellent choice for research institutions of any size. ‘Ceph has the capability to support research at any level,’ says Phil Straw, CEO at SoftIron. ‘Many research organisations have unique, complex storage requirements and don’t want to be locked in to a particular hardware vendor. Ceph is a great fit for them.’

Ceph’s benefits for researchers include:

Support for multiple storage types: object, block, and file systems. Regardless of the type of research being conducted, the resulting files, blocks and/or objects can all live in harmony in Ceph.

Hybrid cloud-ready: Ceph natively supports hybrid cloud environments, which makes it easy for remote researchers – who might be located anywhere in the world – to upload their data in different storage formats.

Hardware-neutral: Ceph doesn’t require highly performant hardware, which lowers equipment costs and eliminates vendor lock-in.

Resilient: there’s no need to buy redundant hardware in case a component fails, because Ceph’s self-healing functionality quickly re-replicates the data from the failed node, ensuring data redundancy and higher availability.

In this article, we’ll examine how four organisations with vastly different research projects and unique data storage requirements are using Ceph.


CERN

Scientists from around the globe use CERN’s particle accelerators to explore questions such as ‘What is the nature of our universe?’ CERN’s super-sized data centre executes more than 500,000 physics jobs daily¹ and its current storage requirements are estimated to be 70 petabytes per year². CERN selected Ceph because of its ability to build block storage for OpenStack, and the fact that remote servers can easily be added with no downtime³.

Immunity Bio

Genomics research requires the manipulation of massive amounts of data. Immunity Bio, a leader in molecular testing and personalised cancer treatments, processes enormous amounts of data, including one terabyte per genetic test, so it’s important that storage should not become a bottleneck. It takes one month to process raw data on an 800-core cluster, and the workload can vary from 2.5 million small random files to a handful of giant, sequential files. To make its storage requirements even more complex, Immunity Bio’s data is ‘infinitely useful’, meaning it will be stored forever for future research or reprocessing.

Immunity Bio chose Ceph as it is very good at processing and storing large amounts of data cost-effectively. The fact Ceph supports unified storage of object, block and file types, and
