SCW_AUGSEP11

INTERVIEW Statistically speaking

Statistical models play a fundamental role in scientific discovery, as Professor Mark Girolami, chair of statistics at University College London, explains

The research I’m involved with is highly multi-disciplinary in the sense that I work with biologists, life scientists and medical scientists who are trying to understand the complexities of living systems from the perspective of their specialist field. There are no general underlying theories or principles regarding those complexities as yet, and so we rely predominantly on observations of data; and that’s where I come in. I try to make sense of the data scientists produce by developing mathematical and statistical models, to put it all on a sound statistical and evidential footing.

As a statistician, I initially found that a lot of the problems I dealt with were very interesting from a statistical perspective, irrespective of the underlying question. Since working more closely with life scientists, and cancer biologists in particular, the underlying science has become even more interesting and in many ways it guides my own research. For example, I’m currently working with cancer biologists who are trying to understand how tumours develop. They know that tumours begin because the control systems for cell division and growth somehow break, and so they’re trying to study the mechanisms of these control systems that ensure cells proliferate and differentiate in the correct way. If biologists know how cells are controlled normally, it gives them an opportunity to understand what happens when things go wrong, and then consider targeted therapies, and so on.

My colleagues in biology may have data regarding how certain proteins and genes will interact or become active, and I then transfer that into a mathematical model that offers a formal description of the working hypothesis. Or there can be a series of models that provide plausible ways in which the cell is controlled. Given certain experiments that the biologists undertake we can, from a statistics perspective, start to assess the evidence that supports each of these models. This allows us to advise the biologists which of the working hypotheses being considered is best supported by their experimental data. That then allows them to focus in on specific aspects of the systems they’re studying rather than taking a guess or using a gut feeling. Statistics provides a rational and systematic way of exploring the complexity of systems – in this case, cellular. These models are not trivial. The data is not absolutely perfect and so, from a statistician’s perspective, we face enormous challenges that motivate us to develop novel, cutting-edge statistical methods. This makes it a really exciting area to work in because it not only presents mathematical and statistical challenges to work on, but it also has a direct impact on science. It’s not as though it’s a very elegant but perhaps not terribly important mathematical or statistical problem; this is enabling fundamental science that is trying to understand the genes of some of these diseases. It really gives a level of importance and significance to the work I do.

Of course, if we didn’t have the software and hardware at our disposal we couldn’t do the job; it’s as simple as that. The work we do relies heavily on computer simulation, massive computing power and highly efficient algorithms, and a lot of our time is spent in the development and implementation of these algorithms. It’s very challenging because we first have to come up with the mathematical statistics required to solve a particular problem. We then have to ensure its correctness and validity, and consider how this becomes a statistical method that could actually be transformed into an algorithm. We then address the issue of how to design the software to make it work in an efficient way.

At every step we’re always asking questions and searching for errors – we don’t have the solutions at the back of a textbook, which makes it challenging from that perspective alone. Now that we’re going into the age of exabyte computing, and certainly for my area of work, which could be described as computational statistics, we have to be aware of what’s coming. Some of my time is spent talking with various computer scientists about the development of algorithmics and hardware, for example GPUs, and what goes beyond this. It’s an important part of what I need to keep up with in order to do my job efficiently.

Deciding upon resources is very much problem-led. Because of their size, our simulations can be very time-consuming and we need the results, and possibly multiple results, as quickly as possible. In terms of computing platforms, this typically means using parallel, large-scale clusters. For us, the big revolution came when a general methodology called Markov chain Monte Carlo emerged, which allowed us to really think about a type of inference called Bayesian statistics in a new way that would never have been possible before. There’s no question that computing has dramatically changed the field of statistics.
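For readers unfamiliar with Markov chain Monte Carlo, the idea can be illustrated with a very small sketch. It is not taken from Professor Girolami's work; it is a toy random-walk Metropolis sampler, written in Python, for a deliberately simple problem: estimating an unknown success rate from a handful of trials, assuming a uniform prior. The function names, the step size and the example data are all assumptions chosen for illustration.

```python
import math
import random


def log_posterior(theta, successes, trials):
    """Unnormalised log posterior of a binomial rate under a uniform prior (illustrative model)."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior mass outside (0, 1)
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def metropolis(successes, trials, n_samples=20000, step=0.1, seed=0):
    """Random-walk Metropolis sampler; step size and seed are arbitrary choices."""
    rng = random.Random(seed)
    theta = 0.5  # starting point inside the support
    current = log_posterior(theta, successes, trials)
    samples = []
    for _ in range(n_samples):
        # Propose a nearby value by adding Gaussian noise.
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


def posterior_mean(samples, burn_in=2000):
    """Average the chain after discarding an initial burn-in period."""
    kept = samples[burn_in:]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Hypothetical data: 7 successes in 10 trials.
    chain = metropolis(successes=7, trials=10)
    print("Estimated posterior mean:", posterior_mean(chain))
```

In this toy case the exact answer is known (the posterior is a Beta distribution with mean 8/12), so the simulated estimate can be checked against it. The models described in the interview have no such closed form, which is why simulation, and the computing power behind it, matters.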

Interview by Beth Sharp
