SCW_AUGSEP11

Statistics special

INTERVIEW

Welcome to the revolution

The computer has revolutionised statistics, says David Hand, professor of statistics at Imperial College London

If we go back to the dawn of the electronic computing age and look at statistics at that time, we see it was constrained by what could be done by hand. Statistics in those days was, dare I say, a rather tedious subject because you needed arithmetic faculty to be able to add up the numbers, and do so correctly. All of that’s been swept away by the computer. Computers have shifted things to a higher level so that we can now concentrate on the deeper meanings of what we’re doing, and if we want to fit our models to the data or extract some signals from it we can do it at the press of a button.

I have always liked to describe the pre-computer era of statistics in terms of the image of a Victorian clerk sitting scribbling away at a column of figures with a quill pen. That’s gone now, and it has been replaced by gleaming software tools that are analogous to the telescope and microscope – instruments which enable us to see objects that are either so distant or so small they aren’t visible to the naked eye. Modern statistics and modern statistical software allow us to see things that are so complex they remain invisible otherwise. They enable us to squeeze illumination from the numbers.

Computers have revolutionised the field in a number of ways. On one level, we can do things that would have taken a considerable amount of time to do by hand, such as inverting big matrices and, as computers have progressed, we have got to the stage where we couldn’t even think of doing things by hand. We are now also able to consider things that would have been inconceivable before; some modern Bayesian models and tools fall into this class, as it wouldn’t have crossed anyone’s mind that we could do it. Other things such as interactive and dynamic graphics are broadening the range of statistics now and opening up the field to non-statisticians and, indeed, non-numerate people because while they might not have understood the maths, they can certainly understand the modern graphical displays. By removing the tedium, computers have lifted statistics to a higher conceptual plane – for example, I’m able to explain a structure in data in terms of underlying latent variables, and do so without writing down the equations or doing the arithmetic.

Adapting to this has been interesting. Broadly speaking, statisticians have been delighted by this progression because it has enabled us to do so much more. However, there have been reservations, and certainly in the early days of big software packages like SPSS and SAS there was concern that too much power was being put in the hands of users who didn’t truly understand what they were doing. Of course, there was also the worry that statisticians would no longer be needed. That’s proven not to be the case because as the conceptual level has risen, statisticians have been needed even more. People using these software packages would do an analysis and ask for all the descriptive statistics relating to a particular thing they were doing, which meant they would get reams and reams of output. They would then take that output to their consultant statistician and ask where the number was that they needed. In reality, I don’t believe any of these anxieties ever truly materialised. Statistics has advanced and gained recognition for the tremendous power it has to extract meaning from data in order to understand what’s happening in the world.

The data itself is also changing as we are faced with increasingly larger datasets. This poses new challenges for statisticians. One of my favourite examples is to produce a scatter plot of two variables where one is plotted against the other with 100 data points, and you can clearly see where there is a relationship. When I take that up to 1,000 data points you can see that relationship more clearly, but when I go up to 10 million data points we are left with a black rectangle. That’s a trivial example, but it does make us think about how we can make sense of all that data. Above all, however, these large datasets are providing us with opportunities for discovery.

Statistics is advancing on so many fronts. For the past 50 years it has been driven by the development of computers, but the other broad driver has been new application areas. Once tools have been developed for one area, they tend to pervade others. Factor analysis, for instance, began in psychology but is now applied in geostatistics and astrostatistics. Astrostatistics, in particular, will be a very exciting area over the next 20 years, influencing the development of statistics itself. Databases with hundreds of millions of objects are fairly commonplace now, and when dealing with things on that scale you have to use a statistical technique to understand how the universe evolved and its structure. Statistics is used quite heavily in this area, and if you look back to the origins of the field you find Gauss using the least-squares method to fit models to astronomical data. So, in a sense, the dawn of statistics came out of astronomy. Things have come full circle but are continuing to move forward and evolve. You really have to keep running to stay where you are – but it’s a very exciting field to be in.

Interview by Beth Sharp
