LABORATORY INFORMATICS GUIDE 2022
companies to automate the processing of large datasets, using supervised and 100 per cent reproducible machine learning algorithms to monitor the immune systems of cancer patients who are enrolled in worldwide clinical trials.
Those are examples of really important stages in bringing a new therapy to market. We're using our technology in those kinds of highly regulated spaces, and that's why we've got a massive focus within the company on compliance and validation.
What makes the Aigenpulse platform different to other AI and ML tools on the market?
In the market already there are these more generalised machine learning platforms, where the users themselves have to pre-process their data, get it in the right shape, organise it, label it, and make sure they've got clean data. In the life sciences sector, that's a lot of work. Those operations require a lot of specialist skill, take a lot of time, and can cause a lot of frustration. The way we architected the product, with the experiment suites, really streamlines each of those processes for each experimental area. The reason we have developed
these modules is because they're actually quite different. For example, in proteomics, the overall structure of how you process data might be similar, but the tools and the way in which you do that are very specific to proteomics compared to, say, cytometry or transcriptomics. Having known this from the start, a lot of the underlying structures within the platform and our technology stack enable this flexibility. This allows us to create a proteomics suite that can help scientists to process, integrate, structure and visualise data for proteomics. But then we can create – on the same technology stack – a genomics suite or a cytometry suite, and implement and integrate these tools within the platform. There are several key benefits to this. One is that a lot of users and organisations need more than one lab technique. They will do proteomics, transcriptomics and cytometry in either the same group or adjacent groups, but they'll be using the same samples, and no other platform enables them to integrate all of that data. Ours allows an organisation to integrate and
analyse its proteomics, transcriptomics and cytometry data for a batch of patients. That works really well from
the scientists' point of view, as each specialist can focus on their own workflows and data. Traditionally, when you are trying to pull all of this data together, you either look on your shared drives, or you go to the scientist who generated that data and ask them, which is obviously a slow process. This increases the chance of data quality and data control issues. If these are very sensitive samples, it can be quite a risk. So, given all of that, having it in one
space is a lot more efficient – it’s a lot more secure. We’ve seen massive benefits to storing and managing data in this way. The other practical point is that we can start small within the company. We can deploy once and we can add to that over time, as the company grows, or their use-cases broaden. Usually it’s not ‘I want to do AI or
ML on my data', it's more 'I need to get answers from my data, and what are the ways in which we can do that?' And so that's how we frame it: it's really about the output rather than the method. One of the major issues if you are using AI is around data completeness, and having enough labelling or metadata, so that any feature can be associated with something that's biologically relevant. That takes a lot of work. It's one thing to generate your experimental data; it's another to clean it, aggregate it and make sure it's complete.
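A minimal sketch of what such a completeness check might look like, assuming sample metadata arrives as plain dictionaries. The field names here are illustrative, not Aigenpulse's actual schema:

```python
def completeness_report(samples, required_fields):
    """For each required metadata field, report the fraction of samples
    where that field is present and non-empty."""
    total = len(samples)
    report = {}
    for field in required_fields:
        filled = sum(1 for s in samples if s.get(field) not in (None, ""))
        report[field] = filled / total
    return report

# Toy metadata standing in for a patient-sample table.
samples = [
    {"sample_id": "S1", "genotype": "AA", "collection_date": "2021-01-05"},
    {"sample_id": "S2", "genotype": "",   "collection_date": "2021-01-06"},
    {"sample_id": "S3", "genotype": "AG", "collection_date": None},
]
report = completeness_report(samples, ["genotype", "collection_date"])
print({k: round(v, 2) for k, v in report.items()})
# → {'genotype': 0.67, 'collection_date': 0.67}
```

A report like this makes it obvious which fields need chasing before any model is trained on the data.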
But then there's that next problem, which is: I can take this data and run it through any number of AI algorithms, but when it pulls out certain features, what does that mean? That is where you need labelling and metadata. Within our platform, we have a lot of ways to help users provide that metadata. If the data has been generated using samples from patients, for instance, that means having as much information as possible about each patient: their medical history, their genotype data, and the experimental information as well. This means you can discount any
features that are generated that can be linked back to certain types of technical variation (for example, samples that were run on a different day), so that you're not reporting false positives.
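One simple way to screen for that kind of technical variation is to measure how much of each feature's variance is explained by batch membership, such as acquisition day. The sketch below is illustrative only, assuming a between-group/total variance ratio; the feature names and the 0.5 threshold are assumptions, not the platform's actual method:

```python
from statistics import mean, pvariance

def batch_effect_score(values, batches):
    """Fraction of a feature's variance explained by batch membership
    (between-group variance divided by total variance)."""
    total_var = pvariance(values)
    if total_var == 0:
        return 0.0
    groups = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)
    grand = mean(values)
    n = len(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / n
    return between / total_var

def flag_batch_features(feature_table, batches, threshold=0.5):
    """Return names of features whose batch-variance ratio exceeds threshold."""
    return [name for name, vals in feature_table.items()
            if batch_effect_score(vals, batches) > threshold]

# Toy example: 'abundance_a' tracks the acquisition day; 'abundance_b' does not.
features = {
    "abundance_a": [1.0, 1.1, 0.9, 5.0, 5.2, 4.9],
    "abundance_b": [2.0, 5.0, 3.0, 2.1, 4.8, 3.1],
}
days = ["day1", "day1", "day1", "day2", "day2", "day2"]
print(flag_batch_features(features, days))  # → ['abundance_a']
```

Features flagged this way are candidates for batch correction or exclusion before any biological interpretation.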
And then finally, making sure that you
have enough heterogeneity, or a broad enough data set, so the models aren't too narrow and prone to overfitting. You see a lot of this in genomics, where there has been a focus on western populations; now, thankfully, there are massive initiatives across Asia, Africa and the Middle East to equalise that, which is very important.
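A basic guard against that kind of narrowness is simply to check how the cohort is distributed before training. This sketch is an assumption about how one might do it, with illustrative population labels and an arbitrary 60 per cent dominance threshold:

```python
from collections import Counter

def cohort_balance(populations, max_share=0.6):
    """Compute each population's share of the cohort and list any group
    whose share exceeds max_share, i.e. a dominance that risks narrow models."""
    counts = Counter(populations)
    total = len(populations)
    shares = {pop: n / total for pop, n in counts.items()}
    dominant = [pop for pop, s in shares.items() if s > max_share]
    return shares, dominant

# Toy cohort: 8 European-ancestry samples, 1 East Asian, 1 African.
labels = ["EUR"] * 8 + ["EAS"] * 1 + ["AFR"] * 1
shares, dominant = cohort_balance(labels)
print(dominant)  # → ['EUR']
```

A flagged cohort would prompt rebalancing, stratified sampling, or at least a caveat on how far the model generalises.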
How does this platform work with existing LIMS or ELN software in an organisation?
If you have your data assets already there, you can integrate them straight away. We can help you provide that metadata to the platform as well, if it's incomplete or you need to pull it from different sources.
I think the major thing is that end users do retain a lot of this information on their other systems, like LIMS, ELNs, or within the lab notebooks and Excel sheets they've got distributed across an organisation. Most customers do have LIMS or ELN systems, and we're not displacing them by any means – we're actually integrating with them, and can pull that metadata automatically from those systems and integrate it with the proteomics data set or the cytometry data set. Traditionally, what LIMS and ELN systems aren't good at is storing, analysing, visualising and retaining these larger datasets, and so we have designed an adjacent solution for that within the overall ecosystem.
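As an illustration of that kind of integration, here is a minimal sketch of joining LIMS-style metadata records onto an experimental result table by sample ID. In practice the records would come from the LIMS vendor's API or an export; the field names and structures here are hypothetical, not any specific LIMS schema:

```python
def merge_lims_metadata(experiment_rows, lims_records, key="sample_id"):
    """Left-join LIMS metadata onto experimental rows by sample ID.
    Rows with no LIMS match are kept but flagged, so gaps stay visible."""
    by_id = {rec[key]: rec for rec in lims_records}
    merged = []
    for row in experiment_rows:
        meta = by_id.get(row[key], {})
        # Experimental values win on any field-name collision.
        combined = {**meta, **row, "lims_matched": row[key] in by_id}
        merged.append(combined)
    return merged

# Toy data standing in for a LIMS export and a proteomics result table.
lims = [{"sample_id": "S1", "patient": "P01", "collected": "2021-03-02"}]
proteomics = [{"sample_id": "S1", "protein": "ALB", "abundance": 12.4},
              {"sample_id": "S2", "protein": "ALB", "abundance": 9.8}]
for row in merge_lims_metadata(proteomics, lims):
    print(row["sample_id"], row["lims_matched"])
# → S1 True
# → S2 False
```

Flagging unmatched rows, rather than dropping them, makes incomplete metadata a visible problem to fix rather than a silent loss of samples.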
Dr Satnam Surae is chief product officer at Aigenpulse. He has been active in the life sciences for more than 10 years. While originally focussing on biochemistry, he discovered early on his passion for applying information technologies to biological
www.scientific-computing.com