Data


column headings to common vocabularies.’ Such issues can be tackled by developing standard ontologies, for example for expressing depth in metres. Future contributors of data can then choose from pre-determined lists of terms that correspond with the terms others are using. This in itself can be tough, though. As Hodson observed: ‘Standardising ontologies is a necessary building block, but it requires agreement and this can be challenging.’
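As a rough illustration of the heading-mapping problem described here, the following Python sketch matches legacy column headings against a small controlled vocabulary. The vocabulary, the term names (`depth_m`, `temperature_c`) and the sample headings are all hypothetical and are not taken from FISH.Link or any other project mentioned in this article. A heading with no unit, such as a bare "Depth", is deliberately left unmapped so that a person can check what was meant.

```python
# Hypothetical controlled vocabulary: canonical term -> accepted heading variants.
# None of these names come from a real project; they are for illustration only.
VOCABULARY = {
    "depth_m": {"depth (m)", "depth_metres", "water depth (m)"},
    "temperature_c": {"temp (c)", "water temp", "temperature (c)"},
}


def normalise(heading):
    """Lower-case a heading and collapse runs of whitespace."""
    return " ".join(heading.lower().split())


def map_headings(headings):
    """Split headings into those that match a vocabulary term and those that do not."""
    lookup = {
        normalise(variant): term
        for term, variants in VOCABULARY.items()
        for variant in variants
    }
    mapped, unmapped = {}, []
    for heading in headings:
        term = lookup.get(normalise(heading))
        if term is None:
            # Unknown or ambiguous (for example, no unit given): flag for review.
            unmapped.append(heading)
        else:
            mapped[heading] = term
    return mapped, unmapped


mapped, unmapped = map_headings(["Depth (m)", "Water  Temp", "Depth", "Site notes"])
print(mapped)
print(unmapped)
```

The unmapped list is the part that needs agreement between people: the code can only apply a vocabulary once one has been settled.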


And it goes beyond simply choosing some shared terms. Hedges said that in some cases for FISH.Link there were existing vocabularies, but they were not in a form suitable for RDF.


‘Pooling knowledge: bringing together data sets about, for example, different parts of a lake could give more insight into the whole’


‘Implicit assumptions was another thing we identified in the project, although we didn’t solve the problem.’


He has observed similar issues in other areas of humanities research. ‘Historical events are often broken down very differently in different countries, for example the way that the First World War is described in historical documents, including when it started, depends on where the documents are from. You can have some ambiguity in conversation but once you are trying to formalise and map data into RDF it’s more of a challenge.’ This issue is more of a challenge in areas like the humanities than in the hard sciences, he added. ‘In the humanities, data is much more difficult to capture and more subtle. Also, researchers don’t tend to think of their stuff as data.’


Nonetheless, there were similar challenges in the FISH.Link and DTC Archive projects. ‘So far we’ve been dealing with legacy data created before this project began so the challenge is finding or creating appropriate vocabularies to map it,’ said Hedges. ‘When people create these things, they do it for themselves, so many things like column headings don’t follow ontologies.’ He explained that column headings might lack units, use different words for the same thing or use the same word to mean something slightly different. ‘One of the challenges was mapping


www.researchinformation.info


For the new project with the FBA, this issue is being tackled with two-day vocabulary workshops. The aim of these is to bring all the interested parties together – freshwater biologists, the people developing the data systems and other potential users of the data such as policy makers – to hammer out vocabulary terms. ‘It is a cyclical, iterative process, with proposed vocabularies fed back to researchers. That’s key for a successful vocabulary,’ explained Hedges. He hopes that the ontologies developed through this project will be useful to freshwater biologists in other parts of the world.


Another challenge comes from the use of RDF triples themselves. ‘We’ve learned that if you extract a whole bunch of RDF triples it may be hard for a typical scientist to know what to do with them,’ observed Hedges. The reason for this is that potentially hundreds of thousands or even millions of triples (tiny pieces of information linked


FEATURE


said. ‘Humanities researchers don’t necessarily come to data with such a clear plan of what to query. They tend to browse data. One of the key things is to provide them with enough information to know what to browse and to stop them getting lost in the data.’


Good management


Meanwhile, there are challenges with the underlying data itself. ‘To convert legacy data is very time consuming. However, it’s not the researchers’ fault; when they created the data it was for their own purposes,’ said Hedges. ‘The linked data approach forces people to think collaboratively about data gathering and this aids discovery across datasets.’ And this is where Hodson’s role comes in.


‘One of the challenges yet to be solved is how to make such projects scalable because the data needs to be better quality.’


Data-quality issues arise from the ways that data has been created and maintained. ‘Many researchers have excellent data skills but many do not. Many researchers will add just enough information for the data to be useful to them rather than making the extra effort of creating a perfectly-annotated table of data,’ observed Hodson. ‘This means that anyone using it is going to be confronted with a significant job of data cleaning.’ And the reasons for looking after data


‘At the simplest level, of course researchers do know that their data is important’


together) could be available to researchers and the users may not have the technical expertise to know how to make use of them. ‘It is very easy to get overwhelmed by triples. One of the things we are looking at is ways to make it easier to query them,’ he said. ‘It obviously makes it less flexible, but it also makes it easier to interact with the data without learning a complicated querying language.’
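The trade-off Hedges describes, giving up some flexibility so that users need not learn a query language, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples, the `ex:` and `sample:` prefixes and both function names are hypothetical, and a real project would hold its data in a triple store rather than a Python list.

```python
# Hypothetical triples (subject, predicate, object); the data is invented
# for illustration and does not come from any project in this article.
TRIPLES = [
    ("sample:1", "ex:takenAt", "lake:A"),
    ("sample:1", "ex:depthMetres", "12.5"),
    ("sample:2", "ex:takenAt", "lake:A"),
    ("sample:2", "ex:depthMetres", "3.0"),
    ("sample:3", "ex:takenAt", "lake:B"),
    ("sample:3", "ex:depthMetres", "20.0"),
]


def match(triples, s=None, p=None, o=None):
    """Flexible pattern match: None acts as a wildcard for that position."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def samples_deeper_than(triples, metres):
    """Canned query: one fixed question, no query language required."""
    return sorted(
        subject
        for subject, _, value in match(triples, p="ex:depthMetres")
        if float(value) > metres
    )


print(match(TRIPLES, p="ex:takenAt", o="lake:A"))
print(samples_deeper_than(TRIPLES, 10))
```

The general `match` function can answer many questions but requires the user to know the predicates. The canned `samples_deeper_than` answers only one, which is the loss of flexibility the quote refers to.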


In helping researchers to interpret these datasets, Hedges has noticed some different patterns of behaviour. Scientists tend to come with specific questions and query the data for them, but humanities researchers tend to interact with data in a very different way, he


aren’t purely altruistic. After all, as Hodson pointed out, ‘The first person you share your data with is your future self.’ His programme at


JISC focuses on building capacity in universities, developing policies and strategies in universities about data management and providing training


and advice for universities. It is also about improving data management during projects. ‘By and large we hope researchers will have a good awareness of what they need. Also, increasingly publishers are becoming more aware of the importance of looking after data. It is of more use if it is linked to research articles,’ he said. Hodson is also involved in the Dryad data archive, an international initiative that he is very enthusiastic about. ‘Again it’s a question of making data available so that researchers have the raw material that allows the prospect of semantic enrichment and the benefits that brings. Without initiatives like Dryad this would not be possible.’


DEC 2012/JAN 2013 Research Information 11



