Analysis and news
Bringing method to madness
Data is everywhere – but where does it come from, where is it being stored, and is it worth keeping? It is crucial to take stock, says Paul Stokes
The UK Government has recently published a sweeping new National Data Strategy, which revealed plans for an unprecedented audit and digitisation of public datasets. The £120m project will include the digitisation and assessment of government documents and datasets from the NHS, the police, fire and rescue services and education.
Educational institutions such as universities and colleges also need to preserve large amounts of data. Think of digital research outputs such as research articles, research data and PhD projects, as well as special collections, archives and electronic records management. On top of that, institutions have statutory obligations to keep a multitude of records, such as financial, tax, staff, student and governance data.
A first step in managing these datasets is to get a handle on how much data there is. Auditing data sounds cumbersome, but it has become an essential tool for assessing whether an organisation’s data is fit for purpose. The government is keen to get on top of the ever-growing problem of what to scan, store or shred, because a great many organisations are generating more data than they dispose of, creating a gargantuan virtual landfill.
Digital preservation is a significant challenge that needs continuing assessment as both technologies and usage change. Clever tools, such as Jisc’s Preservation, automatically reformat files so that they remain readable by new and as-yet unbuilt software. Once in the Preservation system, files are automatically ‘recognised’ and then processed according to pre-set rules into an appropriate format that is as future-proof as possible. Tools of this kind help preserve the types of documents that form the building blocks of our history. Birth, death and marriage records used to be kept on paper, giving insights into what life was like many years ago. These records are relatively easy to preserve if they are well kept, and their conversion to an accessible digital format is relatively straightforward.
42 Research Information October/November 2020
The threats of obsolescence or loss are amplified where the technical challenges are high and when it’s not clear who’s responsible for preserving the data, for instance when multiple stakeholders are involved or platforms change.
Think of old-fashioned Telex messages, which were often the first source of breaking news. These Telex messages are on the Digital Preservation Coalition’s ‘Bit List’ of Digitally Endangered Species. This list highlights the digital materials most at risk of extinction, as well as those that are relatively safe thanks to digital preservation.
Profoundly human
Over the past decades, data collection has evolved organically. Data is now preserved in a myriad of ways, as the amount of information that needs to be stored has increased and preservation systems have changed. And the maxim ‘if it doesn’t exist in three places, it doesn’t exist’ simply triples the problem. People are (and will probably continue to be) one of the biggest problems when it comes to organising and managing data. A dataset that is logically ordered to one person may be totally incomprehensible to another. A data audit allows institutions and individuals to address the following questions:
• What data do you have, and what do you generate?
• Where is the data stored? Is it in the most appropriate place? Do the people who should know about it actually know it exists? Quite often, all sorts of hidden, unregulated data comes to light.
• Who generated or is generating the data? Who has the data, and who is using it? And who could or should be using it? It is not uncommon for information that could be used widely across a whole institution to die in isolated silos.
• What did the data cost to produce? What does it cost to keep? What would it cost if it were lost? Is it worth keeping in limited storage? Understanding the value of the data generated and stored can help in managing budgets, and the data can even become a source of income, especially when it is processed, aggregated and enhanced.
• Current data regulations, especially the GDPR and rules on personally identifiable information relating to living individuals, mean that systems and services that used to be adequate and appropriate for storing data are no longer fit for purpose. The risks associated with data loss or exposure are often misunderstood at best and ignored at worst, and unfortunately they are ever increasing as data and infrastructures become more connected.
• How vulnerable is the data and how can it be protected? Formats change and become obsolete. Digital data deteriorates. An audit can help establish where your data sits on the Data Curation Maturity scale and, more importantly, it allows you to formulate a roadmap to progress up the scale.
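Part of the first two audit questions (what data do you have, and where is it stored?) can be answered mechanically from file-system metadata. The Python sketch below is a minimal illustration under stated assumptions, not a Jisc tool or a prescribed audit method: the function name `inventory` is hypothetical, and grouping files by extension is an assumption chosen for simplicity. It walks a directory tree and tallies the number of files and total bytes for each extension, largest holdings first.

```python
import os
from collections import defaultdict
from pathlib import Path


def inventory(root):
    """Tally file counts and total bytes per file extension under `root`.

    Hypothetical helper for illustration only. Extensions are lower-cased,
    and files without one are grouped under "(none)".
    Returns {extension: (file_count, total_bytes)}, largest total first.
    """
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                size = path.stat().st_size
            except OSError:
                # Skip broken links or files we are not allowed to read.
                continue
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += size
    # Sort extensions by total bytes so the biggest holdings come first.
    order = sorted(sizes, key=sizes.get, reverse=True)
    return {ext: (counts[ext], sizes[ext]) for ext in order}
```

Running `inventory` over each storage location gives only a rough first picture of volume by format; the questions of ownership, cost, value and vulnerability in the audit still need human answers.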
In the words of the 19th-century textile designer, poet and socialist activist William Morris: ‘Have nothing in your houses that you do not know to be useful or believe to be beautiful.’ Perhaps this is a time to reassess the value of our possessions, be they physical or virtual.
Paul Stokes is senior co-design manager at Jisc
@researchinfo | www.researchinformation.info