HPC in pharma

Research


Dr Gianni De Fabritiis, group leader at the Computational Biophysics Laboratory of the Research Programme on Biomedical Informatics (GRIB) within the Barcelona Biomedical Research Park


The ability we have to run complex molecular simulations is largely due to GPUs; before them, computers were far too slow to be of use. The work we are currently conducting into the maturation of the AIDS virus is exactly the same as we would run on a supercomputer, but by using a distributed computing platform, GPUGRID.net, we are able to achieve around 50 or 60 microseconds of aggregate data per day for a system of roughly 100,000 atoms. The level of detail we are getting is unprecedented, and it provides us with a new perspective as we are able to study ligand binding in terms of kinetics.
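As a rough, illustrative check on that aggregate figure, the short Python sketch below multiplies an assumed per-GPU simulation rate by an assumed number of active volunteer GPUs; both numbers are placeholders for illustration, not GPUGRID.net statistics.

```python
# Back-of-the-envelope estimate of aggregate sampling on a distributed GPU
# platform. The per-GPU rate and GPU count are illustrative assumptions,
# not measured GPUGRID.net figures.

NS_PER_DAY_PER_GPU = 100.0   # assumed throughput for a ~100,000-atom system
ACTIVE_GPUS = 550            # assumed number of volunteer GPUs running at once

aggregate_ns_per_day = NS_PER_DAY_PER_GPU * ACTIVE_GPUS
aggregate_us_per_day = aggregate_ns_per_day / 1000.0  # 1 microsecond = 1,000 ns

print(f"Aggregate sampling: {aggregate_us_per_day:.0f} microseconds per day")
# With these assumptions: 100 ns/day x 550 GPUs = 55,000 ns/day = 55 us/day,
# i.e. in the 50-60 microseconds per day range quoted above.
```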


The program we use for molecular simulation, ACEMD, is designed to run on GPUs. Acellera developed it in 2008 and I would say that it’s the fastest of its type. We started to create software in-house and built the distributed system because we needed more power; we then developed a cloud system and have gone so far as to design and manufacture GPU cases. The advantage of this approach is that we have full control over the solution and can guarantee that any patch will work in future generations. The disadvantage is that we have had to undertake all the work ourselves, which takes time away from running simulations.


Molecular dynamics’ impact has so far been limited because there is no systematic use of simulation within the industry. The tools have been around for a while, but only now, with GPUs, can they be used with some level of interest. If simulation is adopted on a wider scale, its impact will be considerable, given that it will practically replace part of the pipeline. There are problems with this, however. We have 200TB of storage and are producing 30TB per year of data, and we think that in five years, at the current rate of growth, we will be close to 1PB per year. Even handling our analyses is incredibly difficult. Our best solution is to get the data out of the server, use standard scripting to filter it down to smaller amounts of around 700GB, and then analyse that. Data is arguably the biggest challenge, and one we have yet to solve.
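A minimal sketch of the kind of "standard scripting" reduction described above, assuming (purely for illustration) that each trajectory is stored as a large CSV file with one frame per line; it keeps one frame in every hundred and writes the result to a much smaller file. The file layout, format, and stride are hypothetical, and a real pipeline would more likely read binary trajectories with a molecular-dynamics analysis library.

```python
# Minimal sketch of reducing raw simulation output before analysis.
# Assumes (hypothetically) that each trajectory is a large CSV file with
# one frame per line.

import csv
import glob
import os

STRIDE = 100  # keep 1 frame in 100 -- an assumed reduction factor

def reduce_trajectory(src_path: str, dst_path: str, stride: int = STRIDE) -> None:
    """Copy every `stride`-th frame (one CSV line) from src_path to dst_path."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for frame_index, row in enumerate(reader):
            if frame_index % stride == 0:
                writer.writerow(row)

if __name__ == "__main__":
    os.makedirs("reduced_trajectories", exist_ok=True)
    for path in glob.glob("raw_trajectories/*.csv"):  # hypothetical layout
        out = os.path.join("reduced_trajectories", os.path.basename(path))
        reduce_trajectory(path, out)
```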


Storage

Jeff Denworth, VP of marketing at DataDirect Networks


With the advent of high-throughput sequencing, there is now a tight alignment between analytics and HPC. The generation of huge amounts of genomic data means that pharmaceutical companies are evolving beyond their traditional file storage capabilities and are looking to scale-out NAS to accelerate their drug discovery pipelines. Storage costs are a necessary part of the strategy for these companies, but the challenge is scaling up capacity to support growing sequence analysis farms. Everyone wants to deploy HPC-style technologies because, once they are on that scalability curve, they can make business decisions without needing to make possibly disruptive IT decisions.

Ultimately, the industry is moving away from a standard infrastructure and towards parallel file systems. From an IT perspective, there is a reconciliation that has to happen in order for organisations to learn how to use these tools and how to scale them up effectively.

In terms of where investments are being deployed, it’s no longer standard IT initiatives: budget cuts are forcing organisations to make smarter decisions on optimising and consolidating their infrastructures. New technologies, such as Apache’s Hadoop software stack, are being accepted by industries sooner rather than later, and one of those is pharma, where parallel computing and enterprise business intelligence are coming together for Big Data analytics.

Interviews by Beth Harlen


Research


Jerome Baudry, assistant professor at the University of Tennessee and group leader at the UT/ORNL Center for Molecular Biophysics


HPC has allowed us to scale up the complexity of the problems we can tackle, and in the world of drug discovery we are now able to calculate not only the behaviour of a drug candidate in a test tube, but how it behaves in a cell. While statistical approaches exist to predict the toxicity of drug candidates, we can now systematically reproduce on the computer what happens in the cell of the patient. In addition to aiding drug discovery, this facilitates the repurposing of drugs and enables us to find new applications for existing compounds. In a few years, when we have reached the next level of computing, we will be able to reproduce what happens not only within a cell but in the entire body of a patient, and within a group of patients. This computational power will be used to calculate what will happen, rather than simply extrapolating possibilities from what we observe in a test tube. Computing beyond what would actually happen in a test tube is a revolution in the way we approach drug discovery.

The challenge is that, in order to make efficient use of these resources, each discipline has to embrace the cultures of the others and essentially translate the foreign languages of science. Being able to explain the biology to a computer scientist, or the computational difficulties to a biologist, for example, is fundamental if we are to catalyse the use of HPC. I’m a computational biophysicist, not a computational scientist, and I came to HPC because it was clear that it would allow us to make a quantum leap in the kind of scientific questions we are able to answer. I somewhat naively thought that when moving our work to supercomputers we would simply upload the program and run it, but of course that’s not what happened. It’s very important that the code is optimised to run on these incredible machines, and to my surprise I found myself spending the first few years of my research optimising the program with computer scientists and mathematicians who could translate the equations into efficient computer programs. The fundamental science is here, the hardware is here, but we had to do a lot of work on the software to parallelise it efficiently.
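As a deliberately small illustration of what such parallelisation involves (not Baudry’s actual code), the sketch below takes a workload whose items are independent, scoring many ligand poses with a placeholder function, and moves it from a serial loop to a pool of worker processes. The scoring function and data are stand-ins chosen only to keep the example self-contained.

```python
# Toy illustration of parallelising an embarrassingly parallel workload:
# scoring many independent ligand poses. The scoring function below is a
# placeholder, not a real force field.

from multiprocessing import Pool

def score_pose(pose: list[float]) -> float:
    """Placeholder 'energy': sum of squared coordinates."""
    return sum(x * x for x in pose)

def score_serial(poses):
    return [score_pose(p) for p in poses]      # one core, one pose at a time

def score_parallel(poses, workers: int = 4):
    with Pool(processes=workers) as pool:      # same work spread over processes
        return pool.map(score_pose, poses)

if __name__ == "__main__":
    poses = [[float(i), float(i) + 1.0, float(i) + 2.0] for i in range(1000)]
    assert score_serial(poses) == score_parallel(poses)
    print("serial and parallel results agree")
```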



