LABORATORY INFORMATICS
prevalence of new technologies and commercial offerings. Ontological, taxonomical and semantic tagging are all set to become mainstream, as technology and application integration become easier and vendors deploy their tools in the cloud. A corporate data lake must be defined
and viewed as the place to go to find, search, interrogate and aggregate data – making it easier for data scientists to investigate and build data sets for their work. Find and search are two separate concepts here: finding is when you know what you are looking for; searching is when you don’t, and want to explore the data. A data lake must be integrated into all
systems that are part of the data lifecycle, crudely: creation, capture, analysis and reporting, so that all aspects of the R&D data landscape can be consumed and leveraged, re-indexed and continually enriched. A data lake should not be viewed as a regulatory or intellectual property store – it needs to be a living ecosystem of data and indices that adapts to the needs of the science and business.
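The find/search distinction above can be made concrete with a small sketch. This is a hypothetical, minimal example – the record fields (`id`, `assay`, `target`) and values are invented for illustration, not drawn from any real system:

```python
# Minimal sketch contrasting "find" (a known-item lookup) with
# "search" (open-ended, exploratory filtering on metadata facets).
# All data and field names here are hypothetical.

records = [
    {"id": "EXP-001", "assay": "ELISA", "target": "IL-6", "result": 0.82},
    {"id": "EXP-002", "assay": "ELISA", "target": "TNF-a", "result": 0.41},
    {"id": "EXP-003", "assay": "qPCR", "target": "IL-6", "result": 1.7},
]

def find(record_id):
    """Find: you know exactly what you want -- a direct lookup by key."""
    return next(r for r in records if r["id"] == record_id)

def search(**filters):
    """Search: you explore -- match any combination of metadata facets."""
    return [r for r in records
            if all(r.get(k) == v for k, v in filters.items())]

print(find("EXP-002"))          # one known record
print(search(target="IL-6"))    # every record mentioning IL-6
```

In a real data lake the lookup would hit a catalogue service and the faceted search an index, but the user intent – known item versus exploration – divides exactly this way.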
Pharma is looking to become much more data-driven.
But first, data must be discoverable for scientists, data scientists and the applications they use. These data jockeys need access to vast quantities of highly curated data to do their jobs – and data lakes are likely the best answer. AI and related techniques – deep learning, augmented intelligence and machine learning – all need the same kind of input as
data scientists: lots of well-annotated data. Adding tags and metadata to a data set sits at the heart of what a true data lake should be – and the impact could be far-reaching. The data volumes are huge, and this raises two issues: where should the data be stored, and how can it be made searchable? This is where the cloud helps. While searching is often discussed in a macro, Google-type sense, the questions scientists want to answer are not always keyword- or phrase-based. Scientific questions are far more intricate and need more than typical text indices: they require fact-based and relationship-based searching too. This requirement means data must
be treated as a living organism – and structured in a way that can handle tricky questions. Each ‘index’ type needs to be aware of the others, so you can jump between concepts, while also remaining easy to update when new data types are introduced. This is not easy, but rapid progress is being made through the deployment of cloud storage, semantic enrichment, alternative data structures, data provisioning, data ingestion, analysis tools and AI. All of these have a part to play, and their level of use depends on the questions being asked of the data. The cloud is the best way to leverage these technologies in a cost-effective and consumable manner – vendors just need to make sure their applications are prepared.
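The idea of index types that are ‘aware of each other’ can be sketched as a keyword index paired with a relationship index, where a query jumps from a text hit to related entities. Everything here – documents, entities and relationships – is hypothetical illustration, not any vendor’s actual data model:

```python
# Hypothetical sketch: a text index and a relationship index that are
# "aware of each other", so a query can jump concepts -- e.g. from a
# compound to documents about the target it acts on.

from collections import defaultdict

documents = {
    "doc1": "compound-X inhibits kinase ABC1 in vitro",
    "doc2": "ABC1 is a regulator of the growth pathway P53",
}

relations = [                      # fact/relationship triples
    ("compound-X", "inhibits", "ABC1"),
    ("ABC1", "regulates", "P53"),
]

# Text index: word -> set of documents containing it
text_index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.split():
        text_index[word].add(doc_id)

# Relationship index: entity -> directly related entities
graph = defaultdict(set)
for subj, _pred, obj in relations:
    graph[subj].add(obj)
    graph[obj].add(subj)

def concept_jump(term):
    """Return direct text hits plus docs about directly related entities."""
    hits = set(text_index.get(term, set()))
    for neighbour in graph.get(term, set()):
        hits |= text_index.get(neighbour, set())
    return hits

print(sorted(concept_jump("compound-X")))  # doc1 direct, doc2 via ABC1
```

A plain text index would return only doc1 for ‘compound-X’; because the relationship index knows compound-X acts on ABC1, the query also surfaces doc2 – the concept jump the article describes. Production systems do this with triple stores or graph databases, but the principle is the same.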
This article was written by Paul Denny-Gouldson, VP, Strategic Solutions at IDBS
White Papers now available online
Using Chromeleon 7 Chromatography Data System for Enhanced Data Integrity By Thermo Fisher
Data governance is an integral part of a regulated company’s quality system. A chromatography data system can simplify system administration and help ensure regulatory compliance (including 21 CFR Part 11) and adherence to data integrity guidelines.
Empower Software Audit Trails and Logs: A guide to the different locations of audit trails in Empower and what information they provide to reviewers By Waters
Audit trails are considered key to the security of a system, since they track changes to data and metadata. An incomplete or absent audit trail can therefore impact data integrity or even product quality. According to the FDA, the absence of an audit trail is “highly significant when there are data discrepancies”.
*Registration required
www.scientific-computing.com/white-papers
Backup vs Archive: What’s the difference and why you need both By Waters
Today, laboratory-based organizations face a wide variety of unaddressed data management challenges, and yet ultimately the scientific data is the currency with which they trade. Proper data management may not pay shareholders, but it fundamentally defines the integrity of the organization and its purpose for existing. Being the cheapest, the fastest or the most definitive is desirable, but it is all meaningless if the data is untrustworthy.
Looking Beyond Analytical Data Standardization By ACD/Labs
Externalization of R&D activities and the deluge of instrumental analytical data generated on a daily basis have resulted in increasing interest in analytical data standardization. Any standardization effort, however – whether to a single format or for data exchange between formats – should be weighed against the requirements of the different users of that data, and against hardware innovations.
SCIENTIFIC COMPUTING WORLD