RI August 2018

Analysis and news

Messy data: managing it really matters

Jude Towers and David Ellis, from Lancaster University, discuss practices in 24/7 data generation and what this means for research

When asked, most people could have a good go at explaining ‘data’. Ask a researcher and the answer (nearly) always starts with ‘well, it’s complicated’. In the arts, humanities and social sciences we face a particular challenge when we ask ‘what are, or what count as, data?’, because of the persistent assumption that data are quantitative, rather than (any) information used to progress research, whether collected by survey, scraped from the web or gathered through interviews, or simply a way of conceptualising existing knowledge.

But whether quantitative or qualitative, ‘data science’ or ‘social science’, cross-disciplinary conversations are desperately needed. We need to talk about the collection, use and management of data; to systematically explicate and take account of the strengths and limitations of data; and to develop strategies to ensure that research underpinned by data is ethically and intellectually robust. The siloing of different forms of data within specific academic disciplines is increasingly problematic, especially as data and their use (or abuse) become increasingly central to our social lives and to societal evolution. One only has to think about the potential impact of the Cambridge Analytica scandal on democracy, or the radical changes to the concept of ‘privacy’ being instigated by the Internet of Things.

Messy data

Some research data are relatively less messy (those generated by clinical trials or experimental physics, for example), while other forms are extremely complex. Any measurement or data source contains some element of ‘noise’, but this is often well understood and acknowledged in advance. However, data originally collected for non-research purposes and then used retrospectively for research bring a completely new set of problems. Data from our everyday lives and digital existence are regularly collated, anonymised and shared with academics for research purposes, and sometimes with others for other purposes. These data were originally collected or logged for another purpose altogether: examples include administrative health data, police-recorded crime, social media profiles and geolocation tags. So when these data are used for research, the research process becomes (even more) difficult.

Research conducted using these data is vital for informing evidence-based public policy and practice, but to do so robustly and transparently, we must think explicitly about how these data are collected and made available for research. What are the implications of these practices and protocols for research practices and research findings? Without this scrutiny, the research is null and void. We need such scrutiny to be systematically embedded within everyday research practices, in a way that breaks down the silos and enables researchers across disciplines to build data practices that are findable, accessible, interoperable and reusable (FAIR). Without these criteria, research projects such as Imperial College London’s use of a network of 10,000 phones to speed up cancer research while we sleep risk remaining science fiction rather than current science.

Why now?

Researchers, students and the wider public are increasingly asking questions about the value of research and the ‘ethics’ of the hitherto unimaginable scale at which (our) data are collected and used. There is a growing movement among researchers (and policy makers) arguing that publicly funded research, at least, must be shared as widely as possible. It is not enough for individual researchers or individual university departments to ‘do the right thing’ or ‘do things right’; we need a collective and consistent approach. This is partly what a Jisc initiative sponsoring a group of Research Data Champions in UK universities has been supporting. As data champions, we are working together to find innovative and effective ways to develop robust data usage and management practices and protocols, to share good practice, and to maximise the positive impact of research for progressive social change. At Lancaster University, for example, psychology staff have established an informal support group (PROSPR), which aims to promote open science practices within the department and beyond.

Movements like this, which aim to maximise the impact of research data, need more than researchers alone to develop a shared understanding and common language. In universities, research data managers are a vital part of this process too. The Research Data Shared Service, from Jisc, has been piloting approaches to support the cataloguing of data produced in universities, and is exploring how these resources can best be shared to derive greater value from both data and research. Whether data are messy or not, robust research calls for FAIR data, and we all have our part to play.

@researchinfo | www.researchinformation.info


