Inside view


Drowning in data: a European perspective


Our ability to digitise the world around us now readily outstrips our ability to process the data we generate.


Rob Baxter reports

From seismometers that measure every sigh and tremble of the Earth to high-definition scans of mediaeval manuscripts, researchers are applying digital techniques to create data pools of unprecedented depth. Our potential for understanding our world and ourselves has never been greater, thanks to digital data. But this digital bonanza raises challenges – not least that, far from spreading and joining to form an ocean, our pools of data are instead growing deeper. Researchers and digital engineers are becoming so busy shoring up the ‘wells’ of data, ensuring they don’t collapse under their own weight, that we run the risk of losing a broader perspective. Like the scientist who over a lifetime of specialisation learns more and more about less and less, we risk ending up knowing everything about nothing.

Hyperbole, perhaps, but the challenges of managing exponentially growing data volumes while maintaining our ability to cross-reference subjects from different domains are real enough. Fortunately, researchers are on the case. In Europe, the last decade has seen great efforts to collect, catalogue, curate and preserve our new digital heritage. European research policy has emphasised the importance of ‘research infrastructures’ – trans-national digital laboratories to underpin this new era of data-driven discovery – and initiatives like the European Strategy Forum on Research Infrastructures (ESFRI) have resulted in significant and successful endeavours to shore up the wells across a range of disciplines.

Around 10 years ago, big data was the preserve of high-energy physics and astronomy. It is testament to our digital ingenuity that the problems of managing and sharing vast data sets can now be found in any and every modern field of research. That the study of the Earth’s climate is one of these is probably no surprise. The European Network for Earth System Modelling (ENES) is a European research infrastructure that brings together around 20 climate research and modelling centres to better understand the climate and our impact upon it. ENES has created a standardised environment for the preservation and exchange of tens of petabytes of simulation and satellite observation data.

What ENES attempts for the sky, the European Plate Observing System (EPOS) aims to do for the ground beneath our feet. EPOS is integrating the activities of a large number of Earth research infrastructures across Europe and faces particular challenges in the assimilation of data from the ever-increasing network of high-capacity sensors – so-called broadband seismometers – across geologically active parts of the continent. International data standards help, of course, but the problem is still one of assimilating and managing tens of millions of individual data files in a dynamic environment.

Perhaps the most interesting impact of the digital research revolution is in the humanities. The digitisation of speech and the scanning and digital interpretation of texts and manuscripts over the last decade have created a wealth of data and research opportunities. CLARIN is the European Common LAnguage Resources and Technology Infrastructure (the ‘T’ is silent), an organisation spanning nine countries that preserves and provides access to digital language data collections. Twenty years ago, there could have been no CLARIN; now there are more than 30 centres together managing petabytes of rare spoken language data. One of CLARIN’s biggest challenges lies in the assimilation and cross-referencing not of data but of metadata, ‘the data about the data’. Language is one of the richest, most diverse dimensions of human culture, and capturing and describing it in ways that can be harmonised, correlated and reasoned about is no easy matter.

Though they cover three different domains, the above projects have one thing in common: they are keystones in the European Data project, EUDAT. EUDAT is the largest, most significant ‘horizontal’ infrastructure project in the ESFRI roadmap, an infrastructure of infrastructures that aims to bring together the discipline-specific activities of initiatives like ENES, EPOS and CLARIN and find common ground among them. EUDAT’s first goal is to tap into the physical infrastructure of some of Europe’s leading computing and data centres to create a digital preservation network of connected disk and tape that counteracts the increasing risk of losing something important. With that done, the question becomes: how many of the software services offered by different disciplines are actually common, and could be provided in a generic way? Answering it is EUDAT’s next goal – and the benefits aren’t just economic, of course. If we can standardise the underlying infrastructure and services, sharing the content becomes so much easier. The World Wide Web tells us that.

And Europe is not an island. All these activities, whether generic or discipline-specific, are international. Science today is a global endeavour, in the broadest possible sense of science as ‘systematised knowledge’, and the challenges of digital science are too great to be tackled in corners. The Research Data Alliance (RDA), a new coordination body for global research data, hopes to emulate the very best of the Internet Engineering Task Force (IETF), doing for the nitty-gritty of global research data sharing what the IETF has done for the internet: ‘wouldn’t it be better if this just worked the same way here, here and here?’ The RDA wants to bring together data practitioners from across the spectrum to sit down with problems like this and solve them, one small step at a time, steadily reducing the barriers between our wells of data.


Rob Baxter is software development group manager at EPCC, University of Edinburgh.


@scwmagazine | www.scientific-computing.com

