The growth in genomics data

Some years ago, scientists would routinely download Sanger datasets, but this is not always possible now because of their sheer size. Tim Cutts, head of scientific computing at the Institute, explains: ‘In an extreme scenario, it would take a user a year and cost a fortune to download all our data, even over the fastest commodity networks in the world.’ This is clearly impractical, so the default now is for researchers to log in to a server and analyse the data onsite or in the cloud.

Another approach is to perform an initial analysis as soon as data is generated, save the results and discard the raw data. ‘Data analysis on the fly is commonly used in some disciplines, including particle physics and crystallography, but it is only recently becoming common in bioinformatics, which is a younger science’, adds Cutts.

Cutts expects that the sequencing capacity of the Sanger Institute will grow further over the next few years, putting further demands on its data storage and analysis capacity. However, the most significant growth is likely to be in another of big data’s Vs: variety. Increasingly, genomic data will be integrated with physiological and pathological data from individuals’ health records, accelerating the growth of personalised medicine. ‘We are developing datasets and analysis tools for integrating these diverse data types, so they can be used for drug discovery’, says Cutts. ‘But we have no ambition to become involved in this ourselves.’ BenevolentAI, headquartered in London and New York, is one of a new type of bioinformatics company that is using machine learning to look for patterns in such diverse data and to discover or repurpose ‘the right drug for the right patient at the right time’.

The unique reputation of the Sanger Institute and, by association, the University of Cambridge in genetics and genomics is matched at the ‘other place’ by a similarly high one in other data-rich biomedical sciences: epidemiology and population health.

The pioneering British Doctors’ Smoking Study began in Oxford in 1951 when academics Richard Doll and Austin Bradford Hill, both later knighted, sent a survey of smoking habits to almost 60,000 registered British doctors. Over two-thirds of the doctors returned questionnaires, and the data gathered was of sufficient statistical power to demonstrate smokers’ increased risk of death from lung cancer and from heart and lung disease within five and seven years respectively. Over 60 years later, research in these disciplines has been integrated with the

October/November 2018 Scientific Computing World 19

