Analysis and news
How today’s machine learning is different

One crucial development has made the technology much more applicable. Early, symbolist approaches to machine learning worked ‘top-down’: first model your universe, then apply that model to the content. This two-step process required extensive domain modelling before anything useful could be done. By contrast, the current generation of machine-learning tools (including UNSILO) is probabilistic, using statistical methods to identify core concepts from the supplied content alone, without any prior taxonomy – in other words, working from the bottom up. They train themselves, without human intervention, on a subject-based corpus of content.

Using such tools need not result in an increase in the IT department headcount. They provide not only cost savings but more fundamental improvements to the entire scholarly process. For example, UNSILO provides a simple tool that checks an article abstract and enables the author (or publisher) to identify key concepts from the text that were not included in the abstract. No training is required to use the software. It has always been prohibitively expensive for publishers to curate abstracts manually, and this new tool enables publishers to genuinely add value to author content.

Other successful machine-learning initiatives include the identification of suitable peer reviewers for academic article submissions. Finding an expert reviewer by hand is a lengthy and fraught process for non-specialists, yet a machine can identify the most appropriate reviewer with far greater precision and speed, based on a search of the research.

UNSILO’s flavour of machine learning differs fundamentally from social media initiatives. Software tools such as FigShare and altmetric (and, fundamentally, Google) are popularity-based. While social media has a role to play in disseminating knowledge about research, it does not play the central part in research that content plays. Whatever the merits or demerits of a probabilistic approach, it is entirely derived from the content itself. When UNSILO finds a related article, the match is independent of the number of hits – indeed, the article may never have been read or accessed before. For researchers, the content, not the popularity, is key.
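As a rough illustration of what "derived from the content itself" can mean, the sketch below ranks related articles using nothing but term statistics computed from the supplied texts: no taxonomy and no hit counts. It uses plain TF-IDF weighting and cosine similarity, which are generic textbook techniques chosen here for illustration; the article does not say which methods UNSILO uses, and the function names and sample texts are invented.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Weight each term by how distinctive it is within the supplied corpus."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many texts does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_related(index, docs):
    """Return the position of the text most similar to docs[index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[index], vec), i)
        for i, vec in enumerate(vectors)
        if i != index
    ]
    return max(scores)[1]

# Invented sample corpus, for illustration only.
docs = [
    "gene expression in tumour cells",
    "tumour cells and gene regulation",
    "library budgets and journal pricing",
]
print(most_related(0, docs))  # prints 1
```

Nothing in the score depends on how often an article has been read, so a paper that has never been accessed can still surface as the closest match.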
18 Research Information June/July 2017

Machine learning and taxonomies

Most publishers have, or work with, a taxonomy or taxonomies: either a public taxonomy, such as MeSH for medical content, or a proprietary, in-house classification. If the probabilistic model used by some of the current machine-learning software tools works without any requirement for a prior taxonomy or ontology, how are these two worlds – taxonomic and machine-based concept extraction – to be reconciled?

Probabilistic machine-learning tools such as UNSILO do not require a taxonomy, but can match their automatically extracted concepts to a given taxonomy. They can even identify new terms that do not yet exist in a standard taxonomy. Using such tools, the cost of building and maintaining an in-house taxonomy is considerably reduced.
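The reconciliation described above can be pictured with a minimal sketch: extracted concepts are looked up against a taxonomy's preferred labels and synonyms, and anything left over is flagged as a candidate new term. The dictionary structure, the function names and the sample entries are all assumptions made for this example; they are not actual MeSH records, and real matching would need to be considerably more tolerant than the exact normalised lookup used here.

```python
def normalize(term):
    """Lowercase, replace hyphens with spaces and collapse whitespace."""
    return " ".join(term.lower().replace("-", " ").split())

def match_to_taxonomy(concepts, taxonomy):
    """Split extracted concepts into taxonomy matches and candidate new terms.

    `taxonomy` is a hypothetical structure mapping each preferred label
    to a list of synonyms.
    """
    lookup = {}
    for label, synonyms in taxonomy.items():
        for name in [label, *synonyms]:
            lookup[normalize(name)] = label

    matched, candidates = {}, []
    for concept in concepts:
        label = lookup.get(normalize(concept))
        if label is None:
            candidates.append(concept)  # not in the taxonomy yet
        else:
            matched[concept] = label
    return matched, candidates

# Invented sample data, for illustration only.
taxonomy = {
    "Neoplasms": ["tumours", "cancer"],
    "Gene Expression": ["gene-expression"],
}
concepts = ["Cancer", "gene expression", "CRISPR base editing"]

matched, candidates = match_to_taxonomy(concepts, taxonomy)
print(matched)
print(candidates)
```

In this sketch the unmatched list is where the saving would come from: rather than scanning the literature for emerging vocabulary, a taxonomy editor reviews a short list of proposed additions.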
Objections to machine learning

As happens with any technical innovation, publishers and researchers initially respond to machine learning with concern. The questions they ask include:

‘This is a black box – I don’t understand how it works. How can we control it?’ Publishers have an understandable fear of losing control, but it is perfectly possible to measure machine-learning output and to compare alternative tools. UNSILO has carried out studies with researchers that indicate high levels of engagement with the tools provided.

‘What’s wrong with my current workflow?’ The relentless drive towards lowering costs means that any publisher who does not innovate will be sidelined by more efficient rivals who can introduce more powerful features.

‘I can’t justify the investment.’ Publishers often instinctively believe that new technology may not have an immediate payback; they assume that machine-learning tools only deliver relatedness features. On the contrary, at UNSILO we have identified many publishing and research workflows where an immediate and dramatic cost benefit can be shown.

‘Everyone uses Google Scholar.’ Google Scholar has indeed captured a high proportion of academic search activity, but without providing particularly useful tools to improve the process. UNSILO’s UX studies, which echo several other public surveys, suggest that Google Scholar is used for initial research, to find a named author or paper, simply because it is the largest collection of academic content. Yet the academic user journey is considerably more elaborate than that. From their starting article, researchers like to browse and to explore “what if?” avenues. A good machine-learning tool provides several avenues of discovery; UNSILO provides links to tens if not hundreds of related articles and concepts for every article.
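On the ‘black box’ objection, one common way to measure output and compare tools without understanding their internals is to score each tool against a small set of concepts chosen by a human expert. The sketch below computes precision and recall for two tools; the tool outputs and the expert list are invented for illustration, and the article does not describe how UNSILO's own evaluations were carried out.

```python
def precision_recall(predicted, gold):
    """Compare a tool's extracted concepts with an expert-curated set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Invented sample data: concepts an expert assigned to one article,
# and the output of two hypothetical tools for the same article.
expert = {"gene expression", "tumour cells", "apoptosis", "p53"}
tool_a = ["gene expression", "tumour cells", "funding"]
tool_b = ["gene expression", "tumour cells", "apoptosis", "methods"]

for name, output in [("tool A", tool_a), ("tool B", tool_b)]:
    precision, recall = precision_recall(output, expert)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of a tool's suggestions that the expert agrees with; recall is the share of the expert's concepts that the tool found. Averaged over a sample of articles, the two figures give a publisher a like-for-like basis for comparison.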
Implementation and adoption

The most successful machine-learning innovations will be the ones that do not require extensive in-house technical teams to manage the technology. Instead, machine-learning tools will be delivered in a user-friendly way. The current generation of machine-learning tools demonstrates an intuitive interface that facilitates rapid adoption.
Machine learning is not a universal panacea that will solve all the problems that researchers and publishers face. The recent book by Cathy O’Neil, Weapons of Math Destruction (2016), reminds us all that algorithms are designed by people, whose bias may be questionable. But this does not detract from the potential of machine learning. The best machine-learning tools will not claim to replace all human input; instead, they make more effective use of human skills.

Typically, a probabilistic tool delivers correct results only to a certain percentage of reliability. Depending on the context, for those problems where human input is essential, it will provide automated tools that enable humans to focus on the challenging parts. By uniting the best features of humans and machines, machine learning looks likely to transform the publishing sector in the coming years.
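One way to picture this division of labour is a confidence threshold: results the tool is sure about are accepted automatically, and the rest are queued for a person, least certain first. This is a minimal sketch of that general pattern, not a description of any particular product; the threshold value, the function name and the sample predictions are assumptions made for the example.

```python
def triage(predictions, threshold=0.9):
    """Split (item, label, confidence) results into accepted and review queues."""
    accepted, for_review = [], []
    for item, label, confidence in predictions:
        if confidence >= threshold:
            accepted.append((item, label))
        else:
            for_review.append((item, label, confidence))
    # Least confident first, so human effort goes to the hardest cases.
    for_review.sort(key=lambda result: result[2])
    return accepted, for_review

# Invented sample predictions, for illustration only.
predictions = [
    ("article-1", "oncology", 0.97),
    ("article-2", "genetics", 0.55),
    ("article-3", "ethics", 0.80),
]

accepted, for_review = triage(predictions)
print(accepted)
print(for_review)
```

Where the threshold sits is an editorial decision rather than a technical one: raising it sends more work to people, and lowering it accepts more of the tool's errors.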
Michael Upshall is head of business development at UNSILO, a Danish machine-learning startup. He co-founded reference publisher Helicon Publishing and has worked with several other academic publishers (Pearson, CABI, IET, Cambridge University Press). He has been with UNSILO since March 2016.
@researchinfo | www.researchinformation.info