ANALYSIS


MINING FOR DATA, THE NEW RAW MATERIAL


Alastair Horne reports on discussions about open data and data mining at the recent UKSG One-Day Conference


Although open access was the main focus of November's UKSG One-Day Conference, open data remained an important secondary topic. Adam Tickell, provost and vice principal at the UK's University of Birmingham, suggested that, in his role as a university manager, open data was actually a greater concern for him than open access. In the 2012 UK government White Paper on Open Data, cabinet minister Francis Maude describes data as 'the 21st Century's new raw material'. Tickell noted that because the UK government's drive towards greater openness now explicitly includes data, universities and research institutions must address the challenges and opportunities this presents.


Making the data behind research outputs openly available raises logistical challenges – how might one easily and effectively make available the terabytes of data being generated at CERN, for instance? However, it also offers considerable possibilities, not least enhanced scientific capacity and greater opportunities for exposing research misconduct.


Peter Murray-Rust, reader in molecular informatics at the University of Cambridge and the final speaker of the afternoon session, took a different approach to these challenges and opportunities. An ardent and vocal campaigner for the benefits of data mining, Murray-Rust began by pointing out that the vast majority of the scientific data that we spend billions of dollars creating is thrown away. Deemed superfluous by publishers, this data is actually of enormous value: it is not only vital to any attempt to validate or reproduce results, but can also be re-used elsewhere. Key to extracting its full value, according to Murray-Rust, is data mining: automated, mechanical processing that could, he claimed, extract a hundred million scientific facts from the data, build reusable objects from them, and even create new businesses that might earn the UK alone £500 million annually. Text and data mining, he suggested, quoting John McNaught, could even save lives. The only obstacles to all this added value, Murray-Rust told his audience, are publishers.


Though some publishers had, it seemed, recently experienced a Damascene conversion to the open data cause, many publisher contracts explicitly prohibit mining of their content, and institutions that ignore this prohibition can find themselves cut off from accessing papers. Murray-Rust rejected publishers' concerns that allowing such mining might overload their servers with automated requests, or lead to the resulting data being distributed freely without their consent, and dismissed their attempts to regulate the process through licensing as akin to taxing spectacles. 'The right to read', he insisted, 'is the right to mine'. His Ami project [1], for example, would liberate the data held within PDF articles and convert it into usable HTML and CSV files. Discussions with interested parties outside publishing – including the British Library and Mozilla – are ongoing.


Murray-Rust finished his talk by urging libraries not to sign contracts that prohibit data mining. It was left to Gemma Hersh, head of public affairs for the Publishers Association, to offer an alternative perspective in the short question-and-answer session that followed. She assured the audience of publishers' efforts to find a workable solution to what they felt were genuine problems raised by data mining, and reminded them that Murray-Rust had recently walked out of talks aimed at resolving those issues. Sadly, that discussion was cut short, but data mining is likely to remain a controversial issue for some time to come.


FURTHER INFORMATION

[1] Ami - The chemist's amanuensis. Journal of Cheminformatics 2011, 3:45. doi:10.1186/1758-2946-3-45 www.jcheminf.com/content/3/1/45


6 Research Information FEBRUARY/MARCH 2014 @researchinfo www.researchinformation.info



