to additional data from other sources. However, this also creates additional challenges: data needs to be standardised, cleaned, transformed and made comparable in order for the model to generate insights.

Dotmatics Browser is a web-based search and reporting solution that provides federated searching across Dotmatics and third-party research databases. Dotmatics principal consultant Dan Ormsby explains that the development of Browser has helped Dotmatics by providing a platform for the company to create AI solutions that leverage data that has been organised and neatly stored for use in Browser. ‘Our best-selling product is called Browser and it provides querying and reporting on top of data you’ve already got. But in order to deliver querying and reporting, you have to organise your data really neatly.

‘If you want to use artificial intelligence on the data that you’ve got stored, the first thing you have to do is exactly what Browser has already accomplished. You have to arrange the data neatly, and do any massaging or aggregation that’s needed,’ Ormsby added. ‘If you’ve got an ID of a compound that you’ve made, then you need to be able to link that ID to who made it, and to what bottles contain batches of that material, where aliquots of those batches have been tested in assays, whether they’re active or inactive against a biological target, for example.’
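The linkage Ormsby describes can be pictured as a chain of joins from compound to batch to assay result. A minimal sketch in Python, in which every ID, field name and record is invented for illustration rather than taken from any Dotmatics schema:

```python
# Hypothetical, minimal illustration of linking a compound ID to its maker,
# its batches (in bottles), and the assay outcomes of those batches.
compounds = {"CPD-001": {"made_by": "A. Chemist"}}
batches = [
    {"batch_id": "B-01", "compound_id": "CPD-001", "bottle": "BTL-7"},
]
assay_results = [
    {"batch_id": "B-01", "target": "KinaseX", "active": True},
]

def activity_for_compound(cpd_id):
    """Walk compound -> batches -> assay results for one compound ID."""
    batch_ids = {b["batch_id"] for b in batches if b["compound_id"] == cpd_id}
    return [
        (r["target"], r["active"])
        for r in assay_results
        if r["batch_id"] in batch_ids
    ]

print(activity_for_compound("CPD-001"))  # [('KinaseX', True)]
```

In a production system these would be joins across database tables rather than in-memory dictionaries, but the point stands: without these links already in place, no model can connect a structure to its assay outcomes.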

Feature engineering

Generating data and transforming it so that it can be used for AI is just one step in the process. As Ormsby explains, feature engineering is used to apply domain knowledge to that data in order to make it amenable to computation: ‘Feature engineering is one really crucial thing I’ve been working on. I’m starting with early research in small-molecule drug discovery, such as lead optimisation, only because Dotmatics has lots and lots of customers who have data in which that type of workflow is done.’

‘Every customer dataset is now a training set, because in every set of customer data we have the arrow of time: we know what they did at the beginning of the project,’ states Ormsby. ‘We know what they did the week after that. So one of the first things my model does when you turn it on is take the early project data and see if it was predictive of the next window of time. Then it will take a larger section of early data and predict the next window of time. It will walk through the data in windows of time, or temporal splits, to see if a model was retrospectively available.

‘So every customer dataset then gets a kind of modelability metric, to say how modelable the ambition of the project was, versus the compounds that were made.’

This provides organisations with detailed

information on their project and its goals that they can use to see whether the work done early in a project was predictive of the work done later. This can help an organisation determine whether project goals have been met, and how achievable they were, based on the actual progress achieved.

‘If you find that it was predictive, then you know that the model is finding an actual signal in there; there is something modelable, and it gives you more confidence to believe in it,’ said Ormsby.
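The walk-forward ‘temporal split’ idea described above can be sketched as follows. This is a toy illustration, not Dotmatics’ implementation: the ‘model’ here is a trivial mean-of-history predictor and the series is invented, but the expanding-window structure — train on early data, test on the next window of time — is the point.

```python
def walk_forward_score(values, window=3):
    """Mean absolute error of a mean-of-history predictor over temporal splits.

    Walks through the series in windows of time: each split trains on all
    earlier data and is scored on the next `window` points. A lower score
    suggests the early data was more predictive ("more modelable").
    """
    errors = []
    for split in range(window, len(values), window):
        history = values[:split]               # the "early" project data
        future = values[split:split + window]  # the next window of time
        prediction = sum(history) / len(history)
        errors.extend(abs(v - prediction) for v in future)
    return sum(errors) / len(errors)

# A flat series is perfectly predictable from its past; a jumpy one is not.
flat = [5.0] * 12
noisy = [5.0, 1.0, 9.0, 2.0, 8.0, 0.0, 10.0, 3.0, 7.0, 1.0, 9.0, 2.0]
print(walk_forward_score(flat))   # 0.0
print(walk_forward_score(noisy))  # > 0, i.e. less "modelable"
```

A real system would substitute a proper ML model for the mean predictor and score something like assay activity against project goals, but the retrospective question is the same: was a model available from the data the team had at the time?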

Applications for early AI adoption

The explosion of AI and ML is creating a wide range of new avenues for scientists and researchers to find insight in scientific data, but these are still fairly new ways of solving problems and the number of use cases continues to expand. ‘While that is the high level of how it works, what scientists are doing with this software is a very open-ended question,’ noted Biovia’s Hayward. ‘You can do different statistical analyses, and apply it to different machine learning problems. We’ve used it in the lab for different things like predictive maintenance of equipment. Another example that we often give is image analysis.’

Predictive maintenance can draw on information about usage patterns, the types of samples and experiments being run on an instrument, and its support and maintenance history, and analyse those data points. The resulting data can provide detailed information about when maintenance and service actions need to take place for each instrument, or group of instruments, based on their specific parameters. Hayward also pointed out the potential

for image analysis, and how this has been applied to images of crystal structures: ‘When you’re analysing crystal structures, you’re trying to identify structures, what components you’re seeing and the attributes of the crystals that you’re looking at.’ He noted that this type of analysis can be very time-consuming, so effectively automating the steps needed to interpret large sets of images can provide huge benefits to laboratory efficiency. ‘But then the question is: what parameters do I need to look at here? How does a computer see this image in a digital way? How do I describe it in digital terms?’ said Hayward.

“When we talk about machine learning and AI, it’s really Pipeline Pilot that is core to that”

Once you have a good idea of what features you are looking for, and how to describe those in digital terms, you can begin to leverage AI to find answers to some complex questions. ‘That’s where you can start training the system using a set of training images. The model can build out its own set of algorithms and come to the conclusion of how to sort out these images, and then you can apply it in that way going forward,’ noted Hayward. ‘By leveraging AI for these types of tasks, users can save huge amounts of time once the model has been trained.’
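The training loop Hayward outlines can be sketched with a toy nearest-centroid classifier: show it labelled example images, let it summarise each class, then sort new images by whichever summary they sit closest to. The ‘images’ here are tiny invented brightness grids; a real image-analysis pipeline would extract far richer features than raw pixels.

```python
# Toy sketch of training an image classifier from labelled examples.
# All "images" are invented 4-pixel brightness lists, flattened.

def centroid(images):
    """Average pixel values across a set of equally sized images."""
    n = len(images)
    return [sum(img[i] for img in images) / n for i in range(len(images[0]))]

def train(labelled):
    """labelled: {class_name: [flat_image, ...]} -> {class_name: centroid}."""
    return {cls: centroid(imgs) for cls, imgs in labelled.items()}

def classify(model, image):
    """Assign the class whose centroid is nearest (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(image, c))
    return min(model, key=lambda cls: dist(model[cls]))

training = {
    "dark":  [[0, 0, 1, 0], [1, 0, 0, 0]],
    "light": [[9, 8, 9, 9], [8, 9, 9, 8]],
}
model = train(training)
print(classify(model, [9, 9, 8, 9]))  # light
```

The structure matches the quote: the labelled training set defines the classes, the model derives its own decision rule from them, and new images are then sorted ‘in that way going forward’ without further human effort.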

One of the main challenges facing scientists and research organisations that want to make use of AI is ensuring that their data is appropriately stored and organised, so that it can be used for AI and ML research without huge amounts of manual work before a project can start. If organisations take the time now to ensure that data is ready to be used in AI, they can save huge amounts of time further down the road, when AI and ML become a prerequisite to laboratory success. ‘It’s a big challenge,’ notes Hayward.

‘That is what we’ve been seeing with our customers – they are often trying to standardise data so that they can compare it across different labs and make sure that it’s consistent. You want to make sure that the data that you’re putting into the system can be opened up later and compared across the different locations. So regardless of where an experiment is taking place, you can look at it, see these two data sets, compare them and know that that’s an accurate representation,’ Hayward added. ‘For anybody that’s implementing a new platform, that’s a big concern, because they don’t want to get locked into just one thing, and then find their old data is inaccessible or incomparable.’

Ormsby further stressed this point, noting that organisations will need to adopt AI in order to stay competitive. ‘You’re either doing machine learning, or you’re going to be out-competed in the next few years,’ stated Ormsby. ‘You just have to be doing this now, to have a chance of staying in business, because people who really embrace this will win.’
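The cross-lab standardisation Hayward describes might, in miniature, look like the sketch below: records arriving with different field names and units are normalised into a single schema so that measurements from different sites can be compared directly. All field names, aliases and conversion factors here are hypothetical, not drawn from any Biovia product.

```python
# Hypothetical normalisation step: map aliased field names onto one schema
# and convert concentrations to a common unit (micromolar) for comparison.
FIELD_ALIASES = {"conc": "concentration", "concentration_um": "concentration"}
UNIT_TO_MICROMOLAR = {"uM": 1.0, "nM": 0.001, "mM": 1000.0}

def standardise(record):
    """Return a copy of `record` with canonical field names and uM units."""
    out = {}
    for key, value in record.items():
        out[FIELD_ALIASES.get(key, key)] = value
    if "unit" in out:
        out["concentration"] *= UNIT_TO_MICROMOLAR[out.pop("unit")]
    return out

lab_a = {"conc": 500, "unit": "nM", "site": "Lab A"}
lab_b = {"concentration": 0.5, "unit": "uM", "site": "Lab B"}
# After standardisation, the two concentrations are directly comparable.
print(standardise(lab_a)["concentration"], standardise(lab_b)["concentration"])
```

The same record from two labs now yields the same value, which is the property Hayward is after: data that ‘can be opened up later and compared across the different locations’.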

Summer 21 Scientific Computing World 25
