LIMS & Lab Automation
Deciphering the code of chemical and biological processes: The role of AI in laboratory research
Oliver King-Smith, smartR AI
This article explores the transformative role of artificial intelligence (AI) in deciphering the complex molecular and genetic codes that govern biological and chemical processes. From molecular interactions to genetic regulation, AI is enabling researchers to uncover new insights, identify patterns, and optimise laboratory workflows. By applying machine learning (ML) algorithms and leveraging large language models (LLMs), AI is offering a powerful tool to accelerate research in both the biological and chemical sciences. Expert contributors in this field discuss the current advancements and the potential of AI to revolutionise our understanding of chemical and biological systems.
Introduction
Artificial Intelligence (AI) has emerged as one of the most transformative technologies of our time, revolutionising fields from autonomous vehicles to healthcare. At its core, AI refers to computer systems that can perform tasks that typically require human intelligence – including pattern recognition, learning from experience, and making complex decisions. These systems use various approaches, from rule-based algorithms to sophisticated machine learning models that can identify patterns in vast amounts of data.
The evolution of AI has been particularly remarkable in recent years, driven by advances in computing power, the availability of massive datasets, and breakthroughs in machine learning architectures. Deep learning, a subset of machine learning inspired by the human brain’s neural networks, has enabled computers to process and analyse information in increasingly sophisticated ways. Large language models (LLMs) like GPT-4 have demonstrated unprecedented capabilities in understanding and generating human-like text, while specialised AI models have achieved superhuman performance in specific tasks such as image recognition and game playing.
In the scientific realm, AI is proving to be an invaluable tool for researchers, offering new ways to analyse complex data, automate routine tasks, and uncover patterns that might be invisible to human observers. Laboratory research, in particular, stands at the frontier of AI application, where the technology is helping scientists decipher the intricate codes of biological and chemical processes. From interpreting genetic sequences to predicting molecular interactions, AI is accelerating the pace of scientific discovery and opening new avenues for investigation.
This article explores how AI is specifically transforming laboratory research in both the biological and chemical sciences. By applying machine learning algorithms and leveraging large language models, researchers are gaining new insights into molecular interactions, genetic regulation, and chemical reactions. These advances are not just incremental improvements to existing methods – they represent a fundamental shift in how scientific research is conducted, promising to accelerate discoveries that could have far-reaching implications for human health, environmental protection, and technological advancement.
In recent years, AI has made significant strides in laboratory research, providing tools that enhance the way scientists interpret complex data. One area where AI is particularly impactful is genomics. Techniques like basecalling – decoding the electrical signals from DNA sequencing into actual DNA sequences – have become industry standards, and AI is central to improving their accuracy. Neural networks, for example, have been used to interpret sequencing data, and new AI-powered techniques combined with Duplex Basecalling have improved accuracy by an order of magnitude across various applications.
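To make the idea of neural basecalling concrete, the sketch below shows a toy model that maps windows of raw sequencing signal to per-step base probabilities and trains with a CTC loss, so the signal length does not need to match the sequence length. The architecture, sizes, and the use of PyTorch are illustrative assumptions, not a description of any production basecaller.

```python
# A minimal sketch of neural basecalling: a small network maps raw signal to
# base probabilities and is trained with CTC loss. Sizes are illustrative.
import torch
import torch.nn as nn

class TinyBasecaller(nn.Module):
    def __init__(self, n_bases=4, hidden=64):
        super().__init__()
        self.conv = nn.Conv1d(1, hidden, kernel_size=9, stride=3, padding=4)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_bases + 1)   # +1 for the CTC "blank" symbol

    def forward(self, signal):                   # signal: (batch, 1, samples)
        x = torch.relu(self.conv(signal))        # (batch, hidden, T)
        x, _ = self.rnn(x.transpose(1, 2))       # (batch, T, 2*hidden)
        return self.head(x).log_softmax(dim=-1)  # per-step log-probabilities

model = TinyBasecaller()
ctc = nn.CTCLoss(blank=4)

signal = torch.randn(2, 1, 3000)                 # fake raw current traces
log_probs = model(signal).transpose(0, 1)        # CTC expects (T, batch, classes)
targets = torch.randint(0, 4, (2, 400))          # fake reference sequences, A/C/G/T as 0..3
input_lens = torch.full((2,), log_probs.size(0), dtype=torch.long)
target_lens = torch.full((2,), 400, dtype=torch.long)

loss = ctc(log_probs, targets, input_lens, target_lens)
loss.backward()
```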
AI’s Role in Genomics: From Basecalling to Cell Type Annotation
AI’s potential in genomics extends beyond basecalling. One of the most exciting recent applications of AI is in the annotation of cell types using large language models (LLMs). The software library GPTCelltype uses GPT-4, a powerful LLM, to automatically annotate cell types from single-cell RNA sequencing data. By utilising marker gene information, GPT-4 generates annotations that align closely with manual annotations, which can significantly reduce the time and expertise required for this task. Evaluated across hundreds of tissue types and cell types, the model has shown impressive accuracy and efficiency.
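The core idea is simple: give the model the top marker genes for each cluster and ask it to name the cell type. The sketch below illustrates this in Python with the OpenAI client; it is not the GPTCelltype interface itself (GPTCelltype is an R package), and the prompt wording, example marker genes, and model choice are assumptions for illustration only.

```python
# A hedged sketch of LLM-based cell type annotation from marker genes.
# Assumes an OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

cluster_markers = {
    "cluster_0": ["CD3D", "CD3E", "IL7R", "CCR7"],   # typical T-cell markers
    "cluster_1": ["MS4A1", "CD79A", "CD79B"],        # typical B-cell markers
    "cluster_2": ["LYZ", "CD14", "FCN1"],            # typical monocyte markers
}

def annotate(markers: dict[str, list[str]], tissue: str = "human PBMC") -> str:
    lines = [f"{name}: {', '.join(genes)}" for name, genes in markers.items()]
    prompt = (
        f"Identify the most likely cell type for each {tissue} cluster, "
        "given its top marker genes. Answer with one cell type per cluster.\n"
        + "\n".join(lines)
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(annotate(cluster_markers))
```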
Genomic Language Models: Unlocking hidden patterns in DNA
The success of LLMs in genomics has inspired the development of ‘gLMs’ (Genomic Language Models), which are specifically designed to handle the structure of DNA. Given that DNA can be viewed as a long list of letters, much like text, gLMs are trained to predict the next base in a sequence, similar to how LLMs predict the next word in a sentence. These auto-regressive models study billions of base pairs across multiple species to identify hidden patterns that may not be immediately obvious to researchers. Applications of gLMs are already emerging in areas such as fitness prediction, sequence design, and transfer learning. As James Prendergast, Professor of Bioinformatics at the Roslin Institute, explains: “There are an ever-increasing number of papers using LLMs in research. For example, our lab is using them to understand which bits of the genome are functional and shape important livestock traits.”
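The sketch below shows next-base prediction in its simplest form: a tiny recurrent model over the four-letter DNA alphabet trained with a next-token cross-entropy loss. The architecture, sizes, and toy training data are illustrative assumptions and bear no relation to the scale or design of real gLMs, but the training objective is the same.

```python
# A toy autoregressive "genomic language model": predict the next base.
# All sizes and data here are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(seq: str) -> torch.Tensor:
    """Map a DNA string to a tensor of integer tokens."""
    return torch.tensor([VOCAB[b] for b in seq], dtype=torch.long)

class TinyGLM(nn.Module):
    def __init__(self, vocab_size=4, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):           # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)              # logits for the next base at each position

model = TinyGLM()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

sequence = encode("ACGTACGTTAGCCGATCGATCGGCTA").unsqueeze(0)  # (1, L)
inputs, targets = sequence[:, :-1], sequence[:, 1:]           # predict base t+1 from 0..t

for step in range(200):
    logits = model(inputs)                                    # (1, L-1, 4)
    loss = loss_fn(logits.reshape(-1, 4), targets.reshape(-1))
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```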
One of the challenges for gLMs is, ironically, limited data. DNA is not normally thought of as ‘limited’, but the latest LLMs are often trained on tens of trillions of tokens, with each token carrying hundreds of times more information than a single base pair. Building a dataset equivalent to those used to train state-of-the-art LLMs would therefore require more than a quadrillion (1,000,000,000,000,000) base pairs.
Furthermore, gLMs must contend with the limited ‘context window’ of the data they process – current LLMs can handle up to around 128,000 tokens, but for some long-range interactions in genomic sequences this window is too small. Base pairs may interact across distances of megabases, which will not fit into the context windows of state-of-the-art LLMs.
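The back-of-the-envelope arithmetic behind both points is shown below; the specific token count and information ratio are illustrative assumptions consistent with the figures quoted above.

```python
# Scale comparisons referenced above (illustrative numbers only).
llm_training_tokens = 15e12      # "tens of trillions" of tokens
info_ratio = 100                 # each token carries ~100x the information of one base pair
equivalent_base_pairs = llm_training_tokens * info_ratio
print(f"{equivalent_base_pairs:.1e} base pairs")   # ~1.5e+15, i.e. over a quadrillion

# Context window vs. long-range interactions, assuming one token per base.
context_window_tokens = 128_000
megabase = 1_000_000
print(megabase / context_window_tokens)            # ~7.8x larger than the window
```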
AI-driven virtual labs: Accelerating scientific discovery
AI is also making its mark in other areas of biological research, such as protein engineering. A recent study created an AI-driven virtual lab to design nanobodies that could bind to the SARS-CoV-2 virus. This ‘actor/critic’ AI framework involves several agents working together to simulate and predict nanobody efficacy. The Immunology Agent focuses on understanding the immune response and designing the nanobodies, the Computational Biology Agent models interactions between the nanobodies and the virus, and the Machine Learning Agent develops algorithms to predict their effectiveness. This collaborative virtual lab accelerated the discovery of nearly 100 nanobody structures in a fraction of the time it would have taken a human team. The rapid success of such AI-driven models highlights the growing role of machine learning in optimising scientific workflows.
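The sketch below shows one way such a multi-agent loop could be organised: an actor proposes candidates, a modelling agent scores their interactions, and a critic ranks them by predicted efficacy. The agent names follow the article, but the orchestration, scoring placeholders, and candidate representation are assumptions for illustration, not the published implementation.

```python
# A minimal, illustrative "actor/critic" virtual-lab loop with placeholder scoring.
from dataclasses import dataclass
import random

@dataclass
class Candidate:
    sequence: str     # toy stand-in for a proposed nanobody sequence
    score: float = 0.0

class ImmunologyAgent:
    """Actor: proposes candidate designs."""
    def propose(self, n: int) -> list[Candidate]:
        alphabet = "ACDEFGHIKLMNPQRSTVWY"
        return [Candidate("".join(random.choices(alphabet, k=110))) for _ in range(n)]

class ComputationalBiologyAgent:
    """Models candidate/virus interactions (placeholder for a docking score)."""
    def model_interaction(self, c: Candidate) -> float:
        return random.uniform(0.0, 1.0)

class MachineLearningAgent:
    """Critic: combines signals into a predicted efficacy."""
    def predict_efficacy(self, c: Candidate, interaction: float) -> float:
        return 0.7 * interaction + 0.3 * random.uniform(0.0, 1.0)

def virtual_lab_round(n_candidates: int = 50, keep: int = 5) -> list[Candidate]:
    actor, modeller, critic = ImmunologyAgent(), ComputationalBiologyAgent(), MachineLearningAgent()
    candidates = actor.propose(n_candidates)
    for c in candidates:
        c.score = critic.predict_efficacy(c, modeller.model_interaction(c))
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:keep]

if __name__ == "__main__":
    for c in virtual_lab_round():
        print(f"{c.score:.3f}  {c.sequence[:20]}...")
```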
AI’s expanding role in chemistry
While AI’s application in genomics and biology is making headlines, it is also gaining traction in the field of chemistry. Computational chemistry has traditionally been a time-consuming and expensive endeavour, with significant reliance on human intuition. Despite advances in the last 50 years, computational chemistry has not seen the same dramatic speed improvements as biochemistry or genomics. However, AI holds promise in reshaping this field by providing deeper insights into chemical processes.
Chemistry presents unique challenges for AI applications. Unlike DNA sequences or proteins, small organic molecules cannot easily be described as linear objects.
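The sketch below illustrates why: a SMILES string gives a compact linear encoding, but the object a model must reason about is the graph of atoms and bonds that the string expands to, with branches and rings rather than a single chain. RDKit is used here as an illustrative choice; it is not mentioned in the article.

```python
# A small molecule is naturally a graph of atoms and bonds, not a sequence.
from rdkit import Chem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin, written as a SMILES string

# The SMILES line above is linear, but the underlying object is a graph:
for bond in mol.GetBonds():
    a, b = bond.GetBeginAtom(), bond.GetEndAtom()
    print(f"{a.GetSymbol()}{a.GetIdx()} -{bond.GetBondTypeAsDouble()}- {b.GetSymbol()}{b.GetIdx()}")

print("atoms:", mol.GetNumAtoms(), "bonds:", mol.GetNumBonds(),
      "rings:", mol.GetRingInfo().NumRings())
```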