Analysis and news
The art of the possible

We need a national infrastructure for research data so that we can apply AI and machine learning, writes Nathan Cunningham
It was in the Arctic that I came to understand the importance of open science and the infrastructure needed to support it. I was working at the Polar Data Centre to ensure that polar and cryospheric scientific data was available to various research teams, both in the UK and in the field. It struck me that access to this rich data was limited to the research teams directly involved and didn't go beyond the biological and environmental research communities.
Breaking down Chinese walls

Research funding is still allocated in siloed ways, and we need to look at how we can break down those barriers. An example of where that is working well is the Big Data Network. Since 2013, the Economic and Social Research Council (ESRC) has invested more than £64m in this project, bringing together data that helps to inform government policymakers. The Big Data Network connects comprehensive data on people, their behaviours, attitudes and motivations at the national level.

All scientists require a national research data infrastructure to capture, store, process and access data, and to enable collaboration across disciplines. However, we see a lot of money invested in some superb research assets that are not all joined up. Some research communities, such as the humanities, are often not yet integrated into a national data infrastructure, even though rich peta-scale datasets have become the norm for most research projects. When I was at the British Antarctic Survey 10 years ago, I already had about six petabytes of data to manage, most of it remote-sensing and modelling data. None of that research fed into a national infrastructure, and a lot of scientific data is still stored outside one. This is not a criticism; we just need to review where we are, and see how we can link up these rich sources of data.
The problem we've got is that research communities are granted big pots of money to support their own areas of research. The UK has some fantastic research facilities, such as the Met Office and the Square Kilometre Array, but investment in these fabulous projects does not necessarily create a national infrastructure. The lesson I take from the £64m Big Data Network is that a lot of its assets are standing on their own.
“Now is the time to create a greater understanding of this national federated asset piece”
This is where we're missing a piece. We're trying to stand up a lot of information, but it needs a lot of computational support, and we're getting to a point where we can no longer move the data around so easily. If you look at the government's industrial strategy, or the grand research challenges of 'healthy ageing' or 'sustainable food', we need requirements for overlaying or wrapping data so that we can connect the various data sources. Now is the time to create a greater understanding of this national federated asset piece.

My vision is that, for every bit of money from UKRI or any of the research councils, a small proportion will be allocated to the creation of a national infrastructure for research data. Currently, research projects need to present a research data management plan and a place of deposit, but this practice is fairly lax, and a lot of research communities are still without robust data management and storage plans. As someone who has worked within Russell Group institutions, I've supported a lot of effort just capturing research data, and I estimate that around 70 per cent of research still isn't part of a larger data infrastructure.

I propose that we look at the UKRI investment funds and say that all research communities now have the same requirements that the large ones, such as CERN and the Met Office, had 10 years ago. Large computational needs are now the norm, because projects tracking the industrial strategy or the grand challenges agenda all work in an interdisciplinary, multi-institutional way. We want to capture that baseline to support research. We can't ignore the lack of that interlocking piece of engineering: it goes unfunded, and it gets harder and harder to build as datasets grow in complexity.

If every research grant given out by the research councils stipulated a place to deposit research data that is already paid for, where people can do research computing at a base level, Jisc could offer that as a service, much as you would normally sign up to one when working on a specific project. This base-level connectedness will promote inter-project and interdisciplinary working. It will create a knowledge base where we can apply all the new technologies, from AI to machine learning.

We need to leverage this baseline of investment, and bring in linkages between different research communities, so that we can couple ocean and atmosphere research, and also bring medical research to other communities by extending the trusted environment they currently work in. It is the art of the possible. I believe this piece of infrastructure will allow us to lay down a more coherent capability across the academic landscape, and I genuinely think it will allow us to continue to compete on the global academic stage. RI
Nathan Cunningham is head of research computing at the Norwich Bioscience Institutes Partnership