inside view

Building a European data infrastructure

An EU project aims to tackle Europe’s data deluge. Kimmo Koski, EUDAT project coordinator, explains


Significant investments by the European Commission and European member states have been made in recent years to create a pan-European e-infrastructure supporting multiple research communities. However, we are facing new challenges resulting from the accelerated proliferation of data, newly available from powerful scientific instruments, simulations and digitisation of library resources. This so-called ‘data deluge’ has created a new impetus for increasing efforts and investments in order to tackle the specific challenges of data management, and to ensure a coherent approach to research data access and preservation.

Traditionally, Europe has put effort into collaborative projects for computing, such as HPC and grid initiatives and networks. In addition, promoting collaboration between the various projects and stakeholders in the e-infrastructure domain has been high on the European agenda. Although there are many important topics where investments continue, today probably the most difficult challenges are related to the need for more efficient management and exploration of data. Understanding data derived from scientific experiments, detecting the essentials from huge datasets or ensuring persistent storage of data for centuries are examples of typical requirements.

By some estimates, about 300 different research infrastructures currently exist in Europe, some small and some large, such as CERN, ITER or EMBL. All of them utilise ICT in one form or another. Many utilise computing resources, all require high-speed networks and most need to manage data in some way. Imagine a situation in which all these 300 decide to run their own data management systems, independent from (and probably incompatible with) each other. As a result we would probably run out of competent people to maintain the ICT services, not to mention the work wasted on constant reinvention of the wheel.

We need to build common services spanning across multiple disciplines and


Kimmo Koski, CSC managing director and EUDAT project coordinator

develop a division of work between the different parties. European collaboration is required at all levels: scientists creating and utilising the data, ICT people providing services for data management and authorities funding persistent storage of data, just to name a few.

EUDAT is a new, horizontal three-year EU project that aims to build a collaborative pan-European data infrastructure (CDI). Figure 1 illustrates the different layers and roles of the CDI. Ideally, users and service providers at different levels will work together in developing and maintaining

Figure 1: The Collaborative Data Infrastructure, a framework for the future, as presented in the EU High Level Expert Group report Riding the Wave, published in 2010. Layers, from top to bottom:

Data Generators and Users: user functionalities, data capture & transfer, virtual research environments
Community Support Services: data discovery & navigation, workflow generation, annotation, interoperability
Common Data Services: persistent storage, identification, authenticity, workflow execution, mining

the required services. EUDAT focuses on the lowest layer, common data services, with the ability to support communities on the next level, and with established links to the users and data generators on the higher level, in order to understand the requirements for service development.

The main focus of EUDAT will be on building a common layer of generic cross-disciplinary data services. To this end we have formed a unique consortium that brings together national data centres, technology providers, researchers, and funding agencies that invest in e-infrastructures, representing 25 partners from 13 European countries.

There is one principal requirement for a successful sustainable service: it has to be user driven. To build a sustainable data infrastructure upon which common services can be deployed for use by diverse communities, a comprehensive approach is required, including several activity strands. As a first strand, EUDAT is currently investigating user requirements, starting with research communities in linguistics (CLARIN), earth sciences (EPOS), climate sciences (ENES), environmental sciences (LIFEWATCH), and biological and medical sciences (VPH), which have been allocated project resources to help specify their requirements and co-design related services. This investigation will be extended to additional communities in 2012.

A second activity strand concerns the appraisal of technologies and service candidates, which involves identifying, designing and constructing appropriate services, using existing solutions where possible. The third activity strand primarily involves the data centres and deals with the operation of the collaborative infrastructure, particularly the provisioning of secure, reliable (generic) services in a production environment, with interfaces for cross-site and cross-community operation. The operation of the infrastructure should provide full life cycle data management services, ensuring the authenticity, integrity, retention and preservation of data, especially those marked for long-term archiving.

Challenges in managing the data will not become less demanding in the future. The amount of data is already huge and keeps growing fast. Complexity will also increase with requirements for sustainable solutions for storing and utilising the data. EUDAT aims to be one player in this field, trying to provide a working solution for at least a part of the problem. We will wait to see how successful this is.

Figure 1 side labels: Trust; Data Curation
