LABORATORY INFORMATICS


Controlling your data


SOPHIA KTORI TAKES A LOOK AT TECHNOLOGY WHICH HELPS LABS CAREFULLY MANAGE AND STORE DATA SECURELY


Given that one of the overarching aims of any R&D – or data-driven – organisation is to maximise the amount of useful information and knowledge that can be derived from vast quantities of disparate data, how do today’s software-fuelled labs make sense of and contextualise that data, not just for today, but so it remains relevant and insightful years down the line? Longevity comes up in many discussions. ‘The key is to think about the data lifecycle, so you can then build in sufficient control and data management,’ suggests Nick Lynch, Pistoia Alliance investment lead and one of the founders of the organisation. ‘How is that data generated, where does it go, what do you want to understand from it, and how might it be used in the future?’


Scientific data is not disposable; it has a life beyond its initial creation, and this is a key consideration for labs that are setting up or upgrading their data management systems. ‘Whether that data is generated at the R&D stage or at later-stage clinical trials, having complete oversight and control of every aspect of that data is imperative, so it will still be accessible, relevant and usable in the context of future experiments and analysis, and especially re-analysis using AI/machine learning methods, perhaps 20 years down the line.’


FAIR principles

‘The concept of controlled data should also fit in with the foundational principles of FAIR data: findability, accessibility, interoperability, and reusability,’ Lynch continues. ‘I would add “quality” to those principles.’ The commonality of high-throughput and high-content workflows, and the breadth of data now generated, means that labs can’t just look at their data in terms of its endpoints, but must have systems in place to accurately manage all of the metadata, whether that means accurately documenting which cell lines have been used and from which supplier they came, who carried out the experiments, the source of reagents and consumables, and the maintenance of any equipment. Context and control go hand-in-hand, he suggests. ‘Only then can you truly compare your data with that from future or past experiments.’

Set in place systems that can effectively husband and provide access to all data and metadata, and you will have the quality of data needed to exploit AI and ML tools and algorithms that can further identify patterns and generate new insights from data streams originating from different sources. ‘I think there are two aspects to consider when bringing in AI/ML,’ Lynch notes. ‘These are: “what can I do to ensure my data is of high enough quality to feed into these algorithms?” And “what can AI/ML – perhaps more accurately described as augmented intelligence – then do with that data to help build models that I can be confident are relevant?” If your data isn’t up to scratch, complete and accessible, there is no point.’
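
As a rough sketch of what such a contextual record might look like in practice – purely illustrative, written in Python, with invented field and identifier names rather than any particular LIMS or ELN schema – an experiment entry can carry its context alongside the results:

from dataclasses import dataclass, field
from datetime import date
from typing import Dict


@dataclass
class ExperimentRecord:
    """Results plus the contextual metadata the article lists: cell line and
    supplier, operator, reagent lots and the instrument used."""
    experiment_id: str
    performed_on: date
    operator: str                 # who carried out the experiment
    cell_line: str                # which cell line was used
    cell_line_supplier: str       # and from which supplier it came
    instrument_id: str            # links to maintenance/calibration records
    reagent_lots: Dict[str, str] = field(default_factory=dict)  # reagent -> lot number
    results: Dict[str, float] = field(default_factory=dict)     # endpoint measurements


record = ExperimentRecord(
    experiment_id="EXP-2019-0042",
    performed_on=date(2019, 8, 1),
    operator="J. Smith",
    cell_line="HEK293",
    cell_line_supplier="ExampleCellBank",
    instrument_id="PLATE-READER-03",
    reagent_lots={"assay buffer": "LOT-7781"},
    results={"ic50_nM": 12.4},
)

Captured this way at the point of creation, the context travels with the result rather than having to be reconstructed years later.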


Human responsibilities

The role of the scientist, and their experience and skills in designing experiments, is also critical and shouldn’t be underestimated, but scientists must also understand the necessity to collect all of the contextual data and metadata around an experiment. ‘It’s very much a human responsibility to make sure that data and metadata are correct at the point of creation,’ Lynch notes. ‘You don’t want to have to try and fix data terminology or language somewhere down the line to make it fit the required format. Not only would you then run a risk of reducing the usability of that data, but human error may come into play. It makes good sense to have set in place enterprise-wide data standardisation – this also marries with the findable and interoperable principles of the FAIR data guidelines.’
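
One way to make that responsibility concrete – again only an illustrative sketch, with an invented controlled vocabulary and field names rather than any specific enterprise standard – is to check each record against required fields and agreed terms at the moment it is captured, instead of patching terminology downstream:

# Invented controlled vocabulary and required fields; in practice these would
# come from an enterprise-wide standard or an ontology service.
CONTROLLED_CELL_LINES = {"HEK293", "CHO-K1", "HeLa"}
REQUIRED_FIELDS = ("operator", "cell_line", "cell_line_supplier", "instrument_id")


def validate_at_creation(record):
    """Return a list of problems with a metadata record; an empty list means it passes."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            problems.append("missing required field: " + name)
    cell_line = record.get("cell_line")
    if cell_line and cell_line not in CONTROLLED_CELL_LINES:
        problems.append("cell line '%s' is not in the controlled vocabulary" % cell_line)
    return problems


print(validate_at_creation({"operator": "J. Smith", "cell_line": "HEK-293"}))
# ['missing required field: cell_line_supplier',
#  'missing required field: instrument_id',
#  "cell line 'HEK-293' is not in the controlled vocabulary"]

Rejecting the mismatched term at source ("HEK-293" versus the agreed "HEK293") is exactly the kind of downstream fix-up Lynch warns against.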


The Pistoia Alliance is developing a FAIR toolkit to help companies adhere to FAIR principles, and encourage the use of best practice and learning throughout an organisation. ‘The term FAIR is actually quite wide-reaching, so you need to bring it down to practical levels for day-to-day operation. It’s not necessarily about reinventing anything, but more about making sure that everyone is aware of and implements best practice.’ The Pistoia Alliance is working with many of the other FAIR initiatives, including the IMI FAIRplus project, to get the best outcomes.


Unified Data Model

The initiative’s Unified Data Model (UDM) supports experiment language standardisation, so that it becomes possible to share that data both within an organisation and with third parties. Standardisation of vocabularies and ontologies reduces the likelihood that data is misinterpreted, and increases confidence in that data, Lynch indicates. ‘You are also improving your efficiency day-to-day and also for the longer term,



