RI August/September 2021

Analysis and news

Trekking into the semantic frontier

Jonathan Bresman explains why the Starship Enterprise and Jurassic Park needed semantic search

A discovery service isn’t really discovering if it doesn’t comprehend what it is looking for.

If it functions primarily by brute keyword searches, then it isn’t capable of discerning meaning, context and nuance, and can’t truly understand the user’s intent. If it just presents the user with page after page of results in which the keyword simply makes an appearance, then it really isn’t any better than dumping a haystack on the user and leaving it to them to find the needle. Now, is searching for a needle in a haystack better than having to search for it out on the prairie? Sure, but it’s still not good enough.

What is needed, instead, is the semantic enrichment of the content that is indexed in the discovery service; ‘smart’ subject indexing is one example. Essentially, this is metadata that provides every knowledge item with its frame of reference, definition and significance, as well as information about its connections and relationships with other knowledge items. Once a body of content is infused with such semantic metadata, an ideal discovery service designed for semantic search can understand the meanings of words, comprehend the sentences in which those words are strung together, and draw on a sufficiently expansive knowledge graph to recognise whether a sentence is referencing a concept and whether that concept is, in turn, part of a larger mental model. This, however, is easier said than done.
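To make the contrast concrete, here is a minimal Python sketch; it does not describe any real discovery service. The `KnowledgeItem` class, its fields, the two search functions and the three sample records are all invented for illustration, and the two concepts (prednisone and acne) are borrowed from the example later in this article. It assumes that some enrichment process has already tagged each item with concepts, and simply compares a literal keyword match with a lookup on those tags.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeItem:
    """A hypothetical indexed record that carries semantic metadata."""
    title: str
    text: str
    # Concept tags, assumed to have been added by an enrichment step
    # such as subject indexing. The tag names are illustrative only.
    concepts: set = field(default_factory=set)


# Invented sample records, for illustration only.
ITEMS = [
    KnowledgeItem(
        title="Managing steroid courses",
        text="Patients on prednisone reported skin breakouts.",
        concepts={"prednisone", "acne"},
    ),
    KnowledgeItem(
        title="Teen skincare",
        text="Acne is common in adolescence.",
        concepts={"acne"},
    ),
    KnowledgeItem(
        title="Pharmacy stock list",
        text="Stock codes: aspirin, prednisone, ibuprofen.",
        concepts={"inventory"},
    ),
]


def keyword_search(items, keyword):
    """Return titles of items whose text merely contains the keyword."""
    needle = keyword.lower()
    return [item.title for item in items if needle in item.text.lower()]


def concept_search(items, *concepts):
    """Return titles of items tagged with every one of the given concepts."""
    wanted = set(concepts)
    return [item.title for item in items if wanted <= item.concepts]


if __name__ == "__main__":
    print(keyword_search(ITEMS, "prednisone"))
    print(keyword_search(ITEMS, "acne"))
    print(concept_search(ITEMS, "prednisone", "acne"))
```

In this toy data the keyword search for ‘acne’ misses the first record, because the word never appears in its text, and the keyword search for ‘prednisone’ returns a stock list that merely mentions the drug. The concept lookup returns only the record tagged with both ideas.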

There is even an episode of Star Trek: The Next Generation built around the ship’s computer struggling with this kind of understanding. Essentially, the Enterprise encounters an alien race called the Tamarians, and while the ship’s universal translator can interpret individual Tamarian words, their sentences don’t make any sense. The problem is that the Tamarians speak entirely in cultural references, and since the Enterprise computer is not familiar with Tamarian history and literature, it can’t convey the Tamarians’ intended meaning. (Basically, it is like someone unfamiliar with Shakespeare and the Bible not understanding that Romeo and Juliet is shorthand for doomed love, that Cain and Abel is shorthand for murderous brotherly envy, and so on.)

Semantic understanding is more than just comprehending context, however. It also means understanding overall systems, their subsets, and which combination of a system’s components might overlap with the user’s needs. For example, in Jurassic Park, Lex, the young heroine, desperately needs to find out how to lock a door to keep out a hungry dinosaur. She wastes precious time having to look through different clusters of files. If the Jurassic Park computer system had been capable of semantic search, Lex could simply have searched for how to secure the specific door she needed.

Another way to think of this is that
semantic search ideally allows a discovery service to make inferences and read between the lines. Furthermore, the service should understand the subjects it indexes well enough to recognise where they overlap; metaphorically speaking, it should be able to find the intersection in a Venn diagram. The system should then present these ‘overlap’ areas to subject matter experts, who can recognise the significance of the overlap and tag it with appropriate metadata, thus training the discovery service still further.

For example, a discovery service capable of semantic search understands that ‘prednisone’ is an anti-inflammatory steroid. It also comprehends that ‘acne’ is a skin condition. But beyond that, it is able to identify a set of articles in which ‘prednisone’ and ‘acne’ overlap. Subject matter experts would see these and recognise that the reason for the overlap is that acne is a side effect of prednisone. A subject matter expert would then add this knowledge to the semantic enrichment, and the discovery service would now ‘understand’ both explicit and implicit relationships involving prednisone. The explicit relationship is what it already knew: that prednisone is an anti-inflammatory steroid. But now it also ‘knows’ that one of prednisone’s potential side effects is acne, an implicit connection.

A system that instead relies on brute keyword searches would simply have provided list after list of results for prednisone and list after list for acne. It would not have presented the articles where they overlap. The user would have had to go manually through the raw output of all these countless articles and track by hand all the ones in which both were

@researchinfo | www.researchinformation.info
