LABORATORY INFORMATICS
Data diagnostics
Genestack CEO Dr Misha Kapushesky explains how metadata can unlock data for analysis
The diverse nature of data held in public and private repositories, particularly of genotype/phenotype associations and clinical information on outcomes, is creating the opportunity to mine data to reveal new drug candidates. However, this type of translational medicine is being hindered by the effort required to find the relevant data and process it for analysis. Across the industry, research and
development directors tell me that this is one of the major bottlenecks for drug discovery and a challenge they are investing significant resources to solve. The opportunity is immense. Drugs
licensed for one purpose may also be beneficial for treatment of other, more niche indications. This offers the opportunity to repurpose drugs that are already in clinical use to treat rare diseases, making new treatments available at a fraction of the cost of new drug development. But to detect these types of off-label benefits you need easy access to public and proprietary data that is loaded, indexed, searchable and ready for analysis.
22 Scientific Computing World October/November 2017
With this in place I see great potential
for scientists to develop an algorithm, run some analysis, truth-check the results and then re-run the pipeline using different data sets. This would allow the selection of promising candidates without the need for further clinical trials. These are the exciting next steps that can be unlocked with better data management supported by strong metadata systems, capable of automating the process of data indexing.
The challenge is multifaceted
Firstly, getting a grasp of the data is problematic. Scientists want to find out where all the data is, both in the organisation and in public repositories, then figure out what is relevant to them. It needs to be possible for biologists
at large pharma to make simple queries such as: ‘Find all patients who are over 30, female, with a particular type of breast cancer, non-smokers, with a mutation in a particular gene, for whom I also have transcription data’. At present this is difficult, as not all genomics data that is produced may be accessible: some may reside in private silos, some in public repositories. There may be untraceable data sets, which exist in various locations and span different therapeutic areas. Additionally, the size and diversity
of data sets are increasing, and it’s not just genomics. Projects like Genomics England’s 100,000 Genomes Project and precision medicine in the US are generating large-scale collections of clinical data, and this will soon be joined by real-world data streamed from sensors and wearables. Researchers want to have access to all of this in a format that enables them to conduct meta-analyses and view their