LABORATORY INFORMATICS

Data lakes and cloud computing

Data needs to be continually enriched and augmented with learnings, writes Paul Denny-Gouldson


16 Scientific Computing World December 2017/January 2018

For research and development organisations, the rise of instrument and process automation is leading to a phenomenal increase in the amount, variety and complexity of scientific data that is gathered. All this data needs to be made available so it can be integrated into projects and new scientific approaches, both now and in the future. The requirement for data to be useable has been growing over the past decade and is reaching a critical point. Instrument data is driving new science and, as organisations move to large image-based and high-density data structures to support their work, e.g. phenotypic screening, the data types used are advancing from the simple text formats of old. To ensure these new data types are (re)useable in R&D and are consumable by existing and emerging technologies such as artificial intelligence (AI) and machine learning, the data has to be accessible, clean and adequately tagged with metadata.

These high-value 'data lakes' can become silted up and quickly turn into swamps if data is not properly tagged with all relevant contextual information: projects, tested molecules, results, downstream use, conclusions, derived data, related data and so on. Designing and keeping data lakes in good health requires constant work and effort, but cloud computing strategies such as new storage services (S3) and adaptive indexing technologies (NoSQL, triple stores) will help.

Whilst some people think of data lakes, or even data itself, as a static picture after it has been captured, in reality data needs to be continually enriched and augmented with learnings. Often, informatics organisations consider the data as the record – and in some cases it is – but it does not have to be cast in stone and 'stored'. Intellectual property (IP) records can be captured and stored in dedicated systems, while the working data is held in other data structures and 'put to work'.

Enrichment is a hot topic in the pharma informatics domain. We have seen the emergence of many tools that all essentially do the same thing: make data more consumable or discoverable by scientists and computers. Semantic enrichment, or natural language processing, has been around for many years and has shown good benefits, particularly in the healthcare domain, where it is used to extract and normalise data from clinical trials. In pharma R&D, the enrichment approach is gaining traction with the





