LABORATORY INFORMATICS
Participants take part in the EMBL-EBI training course
diverse data from chemistry, biology and pharmacology, say, at the same time. And so here at OpenEye we are particularly interested in machine learning, because we have all these tiers of data available, and we want to be able to see the correlations between them. That will make it easier to answer critical questions, such as how a virtual screening ties in with the end result of your downstream workflows. And from our perspective, while this is a challenge, if you have an integrated platform where all of the rigorously validated data is available, in the right format, then developing and applying those kinds of machine learning tools just becomes much easier.’
But at the same time, there is a real drive to consider technical complexity, and make the use of tools much simpler, so that additional complexity in the chemical space can be addressed, he suggested. ‘The drug discovery space, for example, is moving beyond the traditional classes of small molecules, into compounds that have completely novel modes of activity. These compounds present a different level of complexity, and gaining access into new areas of chemical space and, thus,
biological space, will obviously be very significant.’
Open source chemistry resources
The European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) maintains a range of freely available cheminformatics resources that allow users to share data, run complex queries in the chemical space, and analyse the results.
‘Probably the best known of these resources is ChEMBL, an open data resource of binding, functional and bioactivity data,’ explained Andrew Leach, head of chemical biology and head of industry partnerships at EMBL-EBI. ‘But our suite of resources also includes SureChEMBL, a searchable database that contains information extracted from patent documents, together with ChEBI, a dictionary of small chemical molecular entities, and UniChem, which gives users the ability to cross-reference chemical structures across different databases.
‘Containing data on more than 2 million
compounds and 1.4 million assays, ChEMBL is a key resource for activity data’, Leach commented. ‘ChEMBL is widely considered to be the world’s leading expertly curated resource of its type that is completely open, and can be used without restriction by anyone in the scientific community. Its utility ranges from basic searches for compounds that have specific properties and activity against particular targets, to the development of new AI algorithms and machine learning tools.’
“People are going to want to visualise and analyse not just more data, but more diverse data from chemistry, biology and pharmacology”
Launched more than 10 years ago, the ChEMBL database is now on its 29th release, and is derived from data in more than 80,000 published documents.
ChEBI – chemical entities of biological
interest – is the longest-standing chemistry-related database at EMBL-EBI, and is a compendium of small molecules
Autumn 2021 Scientific Computing World 15