Data sharing FEATURE
any place or at any time with millions of web pages on seemingly any topic; more results than any researcher could hope to read in a lifetime. The quality of a library’s holdings may be higher than those of Google, but they are destined to remain under-used unless a library’s holdings are as accessible as Google’s. Whilst the challenge from the web was once met by putting library catalogues on the web and providing electronic access to a library’s holdings, the increasingly recognised need is to integrate a library’s data with the web – to make it available where the user is, rather than expecting the user to come to a library’s homepage.
Technological hurdles

The problem, however, is that the idea of a web of data is accompanied by a raft of technical specifications and standards that can be daunting for those from a less technical discipline. The acronyms and neologisms – RDF triples, RDF/XML, RDFa, RDFS, OWL, microdata – are seemingly without end, while the multitude of competing ontologies and vocabularies available for providing structure to the content can add another layer of confusion and indecision.
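The most basic of these concepts is less forbidding than the acronyms suggest: an RDF triple is simply a subject–predicate–object statement. The sketch below models a few triples as plain Python tuples, without any RDF library; the example.org URIs are invented for illustration, while the predicate URIs are drawn from the real Dublin Core and FOAF vocabularies.

```python
# A minimal sketch of RDF triples as (subject, predicate, object) tuples.
# The example.org URIs are invented; dcterms and foaf are real vocabularies.
triples = [
    ("http://example.org/book/moby-dick",
     "http://purl.org/dc/terms/title",
     "Moby-Dick"),
    ("http://example.org/book/moby-dick",
     "http://purl.org/dc/terms/creator",
     "http://example.org/person/herman-melville"),
    ("http://example.org/person/herman-melville",
     "http://xmlns.com/foaf/0.1/name",
     "Herman Melville"),
]

def objects_of(subject, predicate, graph):
    """Return every object asserted for a given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]

print(objects_of("http://example.org/book/moby-dick",
                 "http://purl.org/dc/terms/title", triples))
# prints ['Moby-Dick']
```

Everything else – RDF/XML, Turtle, RDFa – is essentially a different serialisation of statements of this shape.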
How many of the underlying technical aspects the typical librarian needs to engage with is a matter of debate. Some (including the author of this article) have argued that the changing environment requires librarians to become more technically proficient, that they now need an extensive understanding of RDF and other semantic technologies, and that they may even consider acquiring programming skills as they start to facilitate access to this web of data. Others, however, even among the leading advocates for linked library data, see far less need for librarians to change drastically from what they have already been doing so well.
A less daunting perspective

OCLC is one of the organisations at the forefront of the sharing of library data, and it does not necessarily see a future of linked library data necessitating librarians with more in-depth technical knowledge. Richard Wallis and Ted Fons are involved in OCLC’s data-sharing strategy and are contributing to its forthcoming white paper on data sharing in libraries. Rather than predicting a sudden change in working practices, they expect a more gradual process, with technology and aggregation dealing with many of the complexities. For Wallis, part of the reason for the gradual process is the difference in the way the library world and the web world view information. ‘The
ways that we model and the vocabularies we use are very library specific, and are created around a record that will contain everything to do with a book in one chunk of data. This is compared with the web world where they tend to have one source for information about different entities that is linked to,’ he explained.
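The contrast Wallis describes can be sketched in a few lines of code: a library-style record bundles everything about a book into one self-contained chunk, while the web-style approach stores each entity once and links to it. The field names below are hypothetical, chosen only to make the difference visible.

```python
# Hypothetical field names, for illustration only.

# Library-style: one record carries everything to do with the book,
# including a duplicated copy of the author's details.
record = {
    "title": "Moby-Dick",
    "author_name": "Herman Melville",
    "author_birth_year": 1819,
}

# Web-style: the author is a separate entity, described once and linked to.
entities = {
    "person/melville": {"name": "Herman Melville", "birth_year": 1819},
    "book/moby-dick": {"title": "Moby-Dick", "author": "person/melville"},
}

# Following the link recovers the same information without duplication,
# and a correction to the author need only be made in one place.
book = entities["book/moby-dick"]
author = entities[book["author"]]
print(author["name"])  # prints Herman Melville
```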
‘The problem the library world is grappling with at the moment is how do we do this with library data, and we are starting to develop standards, such as the Library of Congress BibFrame, for library-focused vocabularies. But that’s still very library specific. OCLC is also working with open vocabularies that are used widely across the web, such as Schema.org, which is backed by the major search engines.

‘The general library community has significant investment in business as usual, and the MARC record is not going away very quickly; it will evolve away over time. Linked data is probably an add-on process over time, and library systems will start to evolve without librarians having to understand all the technicalities. OCLC is looking at how the
systems we provide to our members can take advantage of the technologies without having to worry about the technologies.’
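To make the Schema.org reference concrete: such data is commonly published as JSON-LD embedded in a catalogue page, which is one form the search engines backing the vocabulary can read directly. The sketch below builds a minimal Schema.org ‘Book’ description with the standard library; the example.org identifier is invented for illustration.

```python
import json

# A minimal Schema.org "Book" description serialised as JSON-LD.
# The @id URL is invented; Book, Person and name are real Schema.org terms.
book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "@id": "https://example.org/catalogue/moby-dick",
    "name": "Moby-Dick",
    "author": {"@type": "Person", "name": "Herman Melville"},
}

# Embedded in a page inside <script type="application/ld+json">...</script>,
# this markup is visible to crawlers without changing the visible catalogue.
print(json.dumps(book, indent=2))
```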
Fons reiterated the idea that librarians don’t need to get bogged down in the technologies. ‘The librarian doesn’t have to worry about RDF and the technologies, as these are handled at the network level. We need librarians to contribute their work, make sure they are sharing, and the systems can do the rest.’
‘The library community is good on making adjustments to Schema.org, but there’s still a final gap in true global commitment to aggregation of data and making that data available globally with a high level of accuracy. What we have today is really good collections of library metadata, like WorldCat, where we record what has been published and is being published (although not all special collections are there), but we don’t have global commitments for libraries to say “yes, I want my holdings record there at a degree of accuracy where we have a fully recognised place to go”. We want to really impress the world with size and scale and linking activity into and out of the existing metadata store. That’s the next step, to bring together the progress in existing metadata sharing; that’s where we’ll see benefits due to the power of aggregation and collaborative management of data.’

So if there is no need for librarians to get bogged down in the details of the technologies, what should they be doing? According to Wallis, we are at an ‘evolutionary stage’ and the contributions that librarians can make will change over time. ‘Make sure your systems are capable of sharing their information in a global manner. If it’s interesting, they can join SchemaBibEx, a group to discuss and prepare bibliographic extensions to
Schema.org schemas, but we’re not expecting everyone to join. What librarians need to do is keep up to speed, be aware of the systems, and be aware that an increasing number of users don’t come directly.’ Wallis gave the example of the National Library of France, which put up a catalogue that was accessible to search engines. The library found that over 80 per cent of the hits then came directly from the search engines; these were users who probably didn’t even know the web address of the library catalogue.
Conclusion

For Wallis and Fons, data sharing in libraries does not require every librarian to embrace every angle bracket of an RDF/XML file. The traditional record will gradually evolve into a more web-friendly format, and if the library community is willing to commit to sharing high-quality data, that data can have a significant impact on the web and the information people access.
The seeming inevitability of linked library data, irrespective of the decision of any individual librarian, should not be an excuse to opt out, however, but an argument for librarians becoming as involved as possible. If librarians don’t try to direct the technology, it will inevitably come to direct them.
David Stuart is a research fellow at the Centre for e-Research, King’s College London, as well as an honorary research fellow in the Statistical Cybermetrics Research Group, University of Wolverhampton
FURTHER INFORMATION
Library of Congress BibFrame: bibframe.org
SchemaBibEx: www.w3.org/community/schemabibex
APRIL/MAY 2014 Research Information 15