An AI revolution


The confluence of data, compute power and advances in the design of algorithms for AI (artificial intelligence) and ML (machine learning) is driving new approaches in the laboratory, giving scientists access to additional tools that can open new avenues for research or accelerate existing workflows. Growing interest in AI and ML is prompting software companies to examine how they can develop their own software frameworks, or integrate functionality into existing laboratory software, to support laboratory scientists’ use of AI. Some domain areas already seeing the benefits of early AI adoption include predictive maintenance of instruments; predicting the efficacy and potential of small molecules for drug discovery; and image analysis for use cases such as crystallography and medical imaging.

Stephen Hayward, product marketing manager at Biovia, Dassault Systèmes, highlights the steps the company has taken to integrate AI functionality into its software: ‘We have a product called Biovia Pipeline Pilot, which is all about data science, data preparation, connecting data sources together and performing various functions on it. When we talk about machine learning and AI, it’s Pipeline Pilot that is core to that.’

As the adoption of AI and ML techniques becomes more widespread, they are beginning to transform how scientific research is conducted. However, organisations need to ensure their teams stay focused on their scientific goals rather than trying to develop expertise in advanced computational methods. While some staff should have a good understanding of AI and of the software frameworks used to build these intelligent systems, it is unreasonable to expect specialised domain scientists to develop skills in computer science or in building AI frameworks. This is driving software companies such as Biovia and Dotmatics to develop applications that support AI and ML while hiding some of the complexity, so that domain scientists can make use of the technology with limited support from AI experts inside their own organisation.

24 Scientific Computing World Summer 2021

Pipeline Pilot, for example, allows data scientists to train models with only a few clicks, compare the performance of model types and save trained models for future use. Expert users can also embed custom scripts in Python, Perl or R to maximise their use across the organisation.

‘Pipeline Pilot is, as the name implies, a data pipelining tool,’ says Hayward. ‘So it is able to open up many different formats of data and perform different functions on it. Sometimes it can be used for cleaning data or transforming data into different formats, or performing statistical operations on it.’

‘One of the key features of it is that it’s
sort of a visual tool, so you build literal pipelines of data and can see each step that will be performed along the process you put together,’ Hayward continued. ‘That in turn becomes what’s known as a Pipeline Pilot Protocol. So where you’re starting with a certain data format in one location, you’re performing a variety of functions on it, and then you’re outputting it somewhere else, whether that’s a new data file, or whether it’s pushing it into a different system at the end of the protocol.’

Since every model is tied to a protocol,
organisations have insight into where the data comes from, how it is cleaned and which models generate the results. With demand for custom data science solutions increasing, software developers need ways to streamline protocol creation. Pipeline Pilot wraps complex functions in simple drag-and-drop components that can be strung into a workflow. These protocols can be shared between users and groups for reuse, ensuring that solutions are developed faster and standardised.

The software enables scientists to start using AI and ML via built-in models, which can run scientific calculations and analyses on data from multiple sources, including image data, spectral data, DNA/RNA/protein sequences, chemistry, text, streaming (IoT/IoE), financial records and location data. Pipeline Pilot enables users to automate model building using more than 30 supervised and unsupervised machine learning algorithms, including random forest, XGBoost, neural networks, linear regression, support vector machines, principal component analysis (PCA) and genetic function approximation (GFA).

It is well understood in AI that the more data that can be used to train a model, the more accurate it will become. This means it is imperative that organisations can access large data sets within their own organisation, or get access
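The protocol idea Hayward describes — start with data in one format, run it through a chain of cleaning and transformation steps, then output the result somewhere else — can be sketched in a few lines of plain Python. This is an illustration of the pattern only; all names here are invented for the sketch, and this is not Pipeline Pilot’s actual API.

```python
# A "protocol" as a chain of steps, each consuming the previous step's
# output -- a minimal sketch of the data-pipelining pattern, not real
# Pipeline Pilot code.

def parse_rows(text):
    """Input step: read a simple comma-separated block into dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split(",")
    return [dict(zip(header, ln.split(","))) for ln in lines[1:]]

def clean(rows):
    """Cleaning step: drop rows with missing values, normalise names."""
    return [
        {**r, "name": r["name"].strip().lower()}
        for r in rows
        if all(v.strip() for v in r.values())
    ]

def to_report(rows):
    """Output step: summarise the cleaned data."""
    return {"count": len(rows), "names": sorted(r["name"] for r in rows)}

def run_protocol(data, steps):
    """Apply each step to the previous step's output, in order."""
    for step in steps:
        data = step(data)
    return data

raw = """name,value
 Alice ,1
Bob,
 carol ,3
"""
report = run_protocol(raw, [parse_rows, clean, to_report])
print(report)
```

Because each step is an ordinary function, the chain itself documents where the data came from and what was done to it — the same traceability the article attributes to protocols.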

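The ‘train models with a few clicks and compare model types’ workflow mentioned above can likewise be illustrated with a toy example: fit two candidate models to the same data, score each, and keep the best. This pure-Python sketch uses invented names and deliberately simple models; a real tool would swap in random forests, SVMs and the other algorithms listed.

```python
# Toy model-comparison loop: fit several candidate models, score each on
# the training data, and select the lowest-error one. Illustrative only.
import statistics

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]   # roughly y = 2x

def fit_mean(xs, ys):
    """Baseline 'model': always predict the mean of y."""
    m = statistics.fmean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Ordinary least-squares line through the data."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return statistics.fmean((model(x) - y) ** 2 for x, y in zip(xs, ys))

models = {"mean baseline": fit_mean(xs, ys), "linear": fit_linear(xs, ys)}
scores = {name: mse(m, xs, ys) for name, m in models.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 3))
```

In practice the comparison would be done on held-out data rather than the training set, and over many more model families — which is exactly the bookkeeping such tools automate.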

