
ANALYSIS AND NEWS

Our colleagues from Elsevier provide invaluable biomedical domain expertise in this regard. We can then identify relevant information that could easily stay hidden in the literature, before expressing it in terms that every member of the team understands. This in turn allows deep reading to be applied to the literature and other relevant input.

In addition to applying computational power to analysing and understanding literature, how does the human element of a project team support the aims (e.g. domain knowledge, understanding of language)?

Domain knowledge is a crucial component of any project team. Domain experts can wade through the tangled thicket of information and explain that when one person says X, and another person says Y, they both actually mean Z. Similarly, when one person uses an unknown term, experts can explain what they are really referring to. With the help of Elsevier experts we can easily disambiguate these situations.

What are the main challenges that the project is trying to overcome (e.g. how research abstracts are written and indexed, how published articles may be incomplete)?

The first challenge for our team – one of 12 funded by DARPA – is making sense of a vast amount of data from diverse sources. This means doing much more than just scanning the published literature. If we limited ourselves to that level of input, the fact that most published articles are incomplete would leave large information gaps. In particular, there are two problems that can interfere with any conclusions we might draw, and that are common to many other projects. First, the ‘methods’ sections of many articles don’t show all the steps the authors took to reach their conclusions. With only part of the story, we can’t tell if a claim or result is valid or not. Quite simply, we can’t take results sections or author interpretations of their findings at face value. Second, statements featured in papers may be contradictory. Oversimplification or incorrect assertions might not be picked up during the editing and publication process. We saw one particularly egregious example of this: a recently published article’s introduction section claimed a particular protein could transcriptionally activate certain genes, yet cited a study stating that the same protein represses gene transcription.

www.researchinformation.info @researchinfo

Why did CMU choose to work with Elsevier on the Big Mechanism project?

CMU recognised that Elsevier could help inform our own decision making, thanks to its technologies that mine the full text of both literature-based and experimental evidence, as well as relevant clinical data – all of which could be a potential source of useful information. Elsevier also provided specialists with a thorough knowledge of biological terminology and language, as well as access to both a massive library of papers and the capacity to handle new information.

So far, what progress has been made towards the project’s ultimate goals?

We began the project approximately eight months ago. DARPA initially requested the development of a ‘use case’: starting with text input and analysis that suggests new ways to examine as-yet unresolved issues; leading in turn to a bench scientist doing real experiments with the output; and ultimately ending with that scientist returning the results of these experiments to the system to inform future investigations. We resolved a lot of organisational issues and began to assemble the software pipeline, integrating Elsevier’s NLP engine with other programs that we began to develop specifically for this project.

Our priority for the next 18 months is mining all documents that mention anything related to proteins of the KRAS gene, mutations in which underlie a significant proportion of colon, lung, head and neck cancers. We will extract all relevant information into a central database; identify any inconsistencies and gaps; then ask the team’s biologist to perform specific experiments that could help resolve the inconsistencies and close the gaps. In the long term, the hope is that once the process for developing better models of KRAS-driven cancers has been sharpened, it can be applied to not just different types of cancers, but also other complex diseases. This in turn should yield a better understanding of specific disease processes and accelerate the journey of effective treatments from bench to bedside.
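The step of identifying inconsistencies in a central store of extracted claims can be illustrated with a minimal Python sketch. It is not the project's pipeline: the claim format, the names `ProteinA`, `GeneX`, `GeneY` and `paper-1` to `paper-3`, and the choice to treat "activates" and "represses" as mutually exclusive are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical extracted claims: (subject, relation, object, source).
# None of these names comes from the project; they are placeholders.
CLAIMS = [
    ("ProteinA", "activates", "GeneX", "paper-1"),
    ("ProteinA", "represses", "GeneX", "paper-2"),
    ("ProteinA", "activates", "GeneY", "paper-3"),
]

# Relation pairs treated as mutually exclusive in this sketch (an assumption).
OPPOSITES = [frozenset({"activates", "represses"})]


def find_conflicts(claims):
    """Return (subject, object) pairs that carry opposing relations,
    together with the sources backing each side."""
    by_pair = defaultdict(lambda: defaultdict(set))
    for subj, rel, obj, src in claims:
        by_pair[(subj, obj)][rel].add(src)

    conflicts = []
    for pair, rels in by_pair.items():
        for opp in OPPOSITES:
            # Both opposing relations were asserted for the same pair.
            if opp.issubset(rels):
                evidence = {r: sorted(rels[r]) for r in sorted(opp)}
                conflicts.append((pair, evidence))
    return conflicts


for pair, evidence in find_conflicts(CLAIMS):
    print(pair, evidence)
```

A flagged pair is only a candidate for review: whether two statements truly contradict each other depends on context such as cell type or experimental conditions, which is the kind of judgement a domain expert or a targeted experiment would supply.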

What are the problems that researchers currently face specifically related to KRAS cancers (e.g. terminology used, various proteins referenced in the literature, synonyms)?

One significant problem is that different laboratories often use different naming standards when referring to the same proteins or processes. For example, in our current project, the gene of interest could be called KRAS, KRAS2 or RASK2. Its protein in turn might be referred to as GTPase Kras; K-Ras 2; Ki-Ras; c-K-ras; or c-Ki-ras. Furthermore, KRAS interacts with about 150 other proteins in the human genome, each of which has multiple synonyms of its own. This could easily result in hundreds, if not thousands, of different ways of describing the same basic information.
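One common way to handle this kind of naming variation is a synonym table that maps every known spelling to a single canonical label. The Python sketch below uses only the synonyms listed above; the function names, the choice of "KRAS" as the canonical label, and the normalisation rule (ignore case, spaces and hyphens) are assumptions for illustration, not a description of the tools used in the project.

```python
import re

# Synonyms taken from the text above; "KRAS" as the canonical label
# is a choice made for this sketch.
GENE_SYNONYMS = {"KRAS": ["KRAS", "KRAS2", "RASK2"]}
PROTEIN_SYNONYMS = {
    "KRAS": ["GTPase Kras", "K-Ras 2", "Ki-Ras", "c-K-ras", "c-Ki-ras"],
}


def _key(name):
    """Lower-case the name and drop spaces and hyphens so that
    spelling variants such as 'K-Ras 2' and 'KRAS2' compare equal."""
    return re.sub(r"[\s\-]+", "", name).lower()


def build_lookup(*tables):
    """Flatten one or more synonym tables into a key -> canonical map."""
    lookup = {}
    for table in tables:
        for canonical, names in table.items():
            for name in names:
                lookup[_key(name)] = canonical
    return lookup


LOOKUP = build_lookup(GENE_SYNONYMS, PROTEIN_SYNONYMS)


def normalise(name):
    """Return the canonical label, or None if the name is unknown."""
    return LOOKUP.get(_key(name))


for mention in ["K-Ras 2", "c-Ki-ras", "RASK2", "BRAF"]:
    print(mention, "->", normalise(mention))
```

Mapping gene and protein names to one label is a simplification; a real system would likely keep genes and their protein products distinct and would need a table for each of the interacting proteins as well.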

What implications will the eventual findings of the research have for personalised therapies?

The text mining and natural language processing tools we are using in this project can ultimately help build mechanistic models for treating cancer in practice, making it easier for physicians to practise precision medicine by assessing a patient’s individual cancer profile and suggesting the most effective treatment. We expect that models such as these will eventually assist decision making in molecular tumour boards – meetings of teams that collaborate on treatments for patients whose tumours have been analysed using genomic diagnostic tests.

AUGUST/SEPTEMBER 2015 Research Information 11
