Laboratory Informatics Guide 2020

clinical trials. Big data heralds a new era of precision medicine, in which treatments become increasingly effective because they are tailored to individual patients. To get there, researchers and doctors must make use of all the information available to them, including the valuable structured information that biomedical text contains. Microsoft's Project Hanover aims to advance the state of the art in machine reading to accelerate Curation-as-a-Service (CaaS) workflows in precision medicine.

The company has developed a general framework that incorporates forms of indirect supervision to compensate for the lack of labelled examples. By combining deep learning with probabilistic logic, the researchers are expanding the scope of machine reading from single sentences to cross-sentence and document-level text. They have also proposed novel neural architectures, such as graph Long Short-Term Memory (LSTM) networks, that incorporate reasoning with linguistic constraints.

Peter Lee, corporate vice president of Microsoft Healthcare in Redmond, Washington, commented on the importance of this technology in a Microsoft Research blog post: 'For something that really matters like cancer treatment where there are thousands of new research papers being published every day, we actually have a shot at having the machine read them all and help a board of cancer specialists answer questions about the latest research.'

Hoifung Poon, director of precision health natural language processing with Microsoft's research organisation, also stated: 'In biomedicine, you can't do that because your latest finding may only appear in this single paper and if you skip it, it could be life or death for this patient. In this case, you have to tackle some of the hard linguistic challenges head-on.'

Making sense of the data

Cancer is a thousand diseases driven by disparate genetic mutations. Advances in sequencing technology make these mutations easily accessible for individual patients, yet deciphering the underlying code requires researchers to stay on top of a vast library of biomedical literature, which comprises tens of millions of papers and grows by thousands per day. By combining deep learning and probabilistic logic, Microsoft has developed machine reading technology that automatically extracts knowledge from publications, enabling decision-makers to curate much faster and to make better-informed decisions based on larger datasets.
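Much of this work rests on indirect supervision rather than hand-labelled examples. One common form is the labelling function: a small rule that either votes on the label of an unlabelled example or abstains. The sketch below is a toy illustration only, not Project Hanover's code. The task (does a sentence report that a drug acts on a gene?), the knowledge-base entries, the keyword rules, the function names and the plain averaging of votes are all hypothetical assumptions chosen for clarity.

```python
from typing import Callable, List, Optional

# Hypothetical label values for a toy task: does a sentence report that a drug acts on a gene?
POSITIVE: int = 1
NEGATIVE: int = 0
ABSTAIN: Optional[int] = None

# Hypothetical toy knowledge base of (gene, drug) pairs, used for distant supervision.
KNOWN_PAIRS = {("EGFR", "gefitinib"), ("BRAF", "vemurafenib")}

LabellingFunction = Callable[[str, str, str], Optional[int]]


def lf_known_pair(sentence: str, gene: str, drug: str) -> Optional[int]:
    """Vote POSITIVE if the (gene, drug) pair is already in the toy knowledge base."""
    return POSITIVE if (gene, drug) in KNOWN_PAIRS else ABSTAIN


def lf_response_keyword(sentence: str, gene: str, drug: str) -> Optional[int]:
    """Vote POSITIVE if the sentence uses typical drug-response vocabulary."""
    text = sentence.lower()
    return POSITIVE if "sensitivity" in text or "resistance" in text else ABSTAIN


def lf_no_association(sentence: str, gene: str, drug: str) -> Optional[int]:
    """Vote NEGATIVE if the sentence explicitly reports no association."""
    return NEGATIVE if "no association" in sentence.lower() else ABSTAIN


LABELLING_FUNCTIONS: List[LabellingFunction] = [
    lf_known_pair,
    lf_response_keyword,
    lf_no_association,
]


def noisy_label(sentence: str, gene: str, drug: str) -> Optional[float]:
    """Average the non-abstaining votes into a soft label; None if every function abstains."""
    votes = [lf(sentence, gene, drug) for lf in LABELLING_FUNCTIONS]
    cast = [v for v in votes if v is not None]
    if not cast:
        return None
    return sum(cast) / len(cast)


# Unlabelled sentences become (noisy) training examples without manual annotation.
print(noisy_label("EGFR mutations confer sensitivity to gefitinib.", "EGFR", "gefitinib"))  # 1.0
print(noisy_label("We found no association between KRAS and erlotinib.", "KRAS", "erlotinib"))  # 0.0
print(noisy_label("BRAF was sequenced.", "BRAF", "imatinib"))  # None
```

Averaging votes is the simplest possible aggregation. A framework that combines probabilistic logic with deep learning would instead treat each rule as a weighted piece of evidence whose reliability can itself be estimated.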

Building on past work in Literome, an automatic curation system that extracts genomic knowledge from PubMed articles, Microsoft has developed Project Hanover to create literature machine readers for a variety of domains, from fundamental biology to translational medicine such as precision oncology, without the use of labelled examples. The latest system has read all publicly available biomedical literature. The Project Hanover team reported that, in a matter of minutes, it found several times as many facts as a whole year of manual curation at an NCI-designated cancer centre; these facts can then be validated by expert curators in an assisted curation interface using Microsoft Azure.

Mockus and her colleagues are using Microsoft's machine reading technology to curate CKB, which stores structured information about genomic mutations that drive cancer, drugs that target cancer genes and the response of patients to those drugs. One application of this knowledge base allows oncologists to discover what, if any, matches exist between a patient's known cancer-related genomic mutations and drugs that target them as they explore and weigh options for treatment, including enrolment in clinical trials for drugs in development.

The core of Microsoft's Project Hanover is the capability to comb through the thousands of documents published each day in the biomedical literature and to flag and rank all those that are potentially relevant to cancer researchers, highlighting, for example, information on genes, mutations, drugs and patient response. The collaboration with JAX allows Poon and his team to validate the effectiveness of Microsoft's machine reading technology while increasing the efficiency of Mockus and her team as they curate CKB.

'Leveraging the machine reader, we can say here is what we are interested in and it will help to triage and actually rank papers for us that have high clinical significance,' Mockus said. 'And then a human goes in to really tease apart the data.' Over time, feedback from the curators will be used to help train the machine reading technology, making the models more precise and, in turn, making the curators more efficient and allowing the scope of CKB to expand.

Developing an FDA-approved drug now takes on average more than a decade and can cost more than $2 billion. Randomised controlled trials are the best way to develop these drugs, but they are relatively expensive and time-consuming, and they cover only a tiny fraction of potential patients. Even after a drug has been approved, it is important to conduct post-market surveillance to monitor adverse effects and efficacy in the general population.

Deep learning has emerged as a versatile tool for a wide range of natural language processing (NLP) tasks, owing to its capacity for representation learning. However, it relies on labelled examples, which are difficult to produce at scale. Indirect supervision has emerged as a promising direction to address this bottleneck, either by introducing labelling functions to automatically generate noisy examples from unlabelled text or by imposing constraints over interdependent label decisions. This can shorten the time needed to find new drugs and to match patients to clinical trials, increasing the efficacy of these projects.

Probabilistic logic offers a unifying language to represent indirect supervision, but end-to-end modelling with probabilistic logic is often infeasible due to intractable inference and learning. In a recent paper, Microsoft proposes deep probabilistic logic (DPL), which is used in Project Hanover, as a general framework for indirect supervision. By composing probabilistic logic with deep learning, DPL aims to accelerate doctors' ability to assimilate the relevant information needed to make effective judgments on the treatment of cancer.

Jeannette M Wing, Microsoft's corporate vice president in charge of the company's basic research labs, said: 'The collaboration between biologists and computer scientists is key to making this work.'

'If the computers of the future are not going to be made just in silicon but might be made in living matter, it behoves us to make sure we understand what it means to program on those computers,' Wing concluded.
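The deep probabilistic logic framework described above, as we understand the published work, treats each label as a latent variable, encodes prior knowledge as weighted logical formulas and alternates (via variational EM) between training the neural network on the current label posterior and refining uncertain formula weights. The sketch below is a heavily simplified toy under stated assumptions, not Microsoft's implementation: logistic regression stands in for the deep network, every formula touches a single example so the posterior can be computed exactly, the two formulas, their weights and the feature vectors are hypothetical, and the formula weights are held fixed rather than refined.

```python
import math
from typing import Callable, List, Sequence

# A "formula" scores how well label y (0 or 1) satisfies a piece of prior knowledge for input x.
Formula = Callable[[Sequence[float], int], float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(theta: Sequence[float], x: Sequence[float]) -> float:
    """p(y=1 | x) from logistic regression, standing in for the deep network."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def e_step(theta, data, formulas, weights) -> List[float]:
    """Label posterior: q(y) is proportional to p(y|x) * exp(sum_k w_k * f_k(x, y)); returns q(y=1)."""
    q = []
    for x in data:
        p1 = predict(theta, x)
        s1 = p1 * math.exp(sum(w * f(x, 1) for f, w in zip(formulas, weights)))
        s0 = (1.0 - p1) * math.exp(sum(w * f(x, 0) for f, w in zip(formulas, weights)))
        q.append(s1 / (s0 + s1))
    return q


def m_step(theta, data, q, lr=0.5, epochs=50) -> List[float]:
    """Fit the classifier to the soft labels q by SGD on cross-entropy."""
    theta = list(theta)
    for _ in range(epochs):
        for x, target in zip(data, q):
            err = predict(theta, x) - target  # gradient of cross-entropy w.r.t. the logit
            theta = [t - lr * err * xi for t, xi in zip(theta, x)]
    return theta


def train(data, formulas, weights, rounds=5) -> List[float]:
    """Alternate E and M steps; formula weights stay fixed in this simplified sketch."""
    theta = [0.0] * len(data[0])
    for _ in range(rounds):
        q = e_step(theta, data, formulas, weights)
        theta = m_step(theta, data, q)
    return theta


# Hypothetical toy features per candidate: [bias, has_response_keyword, pair_in_knowledge_base]
DATA = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
FORMULAS: List[Formula] = [
    lambda x, y: 1.0 if x[2] == 1.0 and y == 1 else 0.0,  # known pair -> positive
    lambda x, y: 1.0 if x[1] == 0.0 and y == 0 else 0.0,  # no keyword -> negative
]
WEIGHTS = [2.0, 1.0]

theta = train(DATA, FORMULAS, WEIGHTS)
print([round(predict(theta, x), 2) for x in DATA])
```

In each round the classifier's prediction and the formulas jointly produce a soft label, and the classifier is then refit to those soft labels. In this toy setting the candidate supported by both signals drifts towards positive and the one with neither signal towards negative. Formulas that couple many labels at once, which the full framework supports, in general call for approximate inference instead of the exact per-example normalisation used here.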

