International Labmate 38.3


Within a series of compounds with a common ‘scaffold’ or ‘core’ structure, it may be possible to visualise these relationships, as illustrated in the SAR plot in Figure 2. Alternatively, exploring the effects of the same substitutions at similar positions on different scaffolds can identify replacements that have significant impacts on compounds’ properties, so-called ‘activity cliffs’ [9].
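The data behind an SAR plot of this kind can be sketched in a few lines of Python. The example below is only an illustration: the substituent labels and activity values are invented, and the function name `sar_grid` is hypothetical rather than part of any particular software package. It groups compounds by their R1, R2 combination and reports the number of compounds and their average activity for each cell, the two quantities a plot like Figure 2 encodes as circle size and colour.

```python
from collections import defaultdict

def sar_grid(compounds):
    """Group compounds by (R1, R2) and return {(r1, r2): (count, mean_activity)}."""
    groups = defaultdict(list)
    for r1, r2, activity in compounds:
        groups[(r1, r2)].append(activity)
    return {key: (len(vals), sum(vals) / len(vals)) for key, vals in groups.items()}

# Invented example data: (R1 substituent, R2 substituent, measured activity)
compounds = [
    ("H", "Me", 6.0),
    ("H", "Me", 7.0),
    ("Cl", "Me", 8.0),
    ("Cl", "OMe", 5.5),
]

grid = sar_grid(compounds)
for (r1, r2), (count, mean_activity) in sorted(grid.items()):
    print(r1, r2, count, mean_activity)
```

Each entry of `grid` corresponds to one cell of the plot; empty cells are simply absent from the dictionary.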

More diverse data sets can be analysed using quantitative structure-activity relationship (QSAR) methods that use statistical algorithms to model these effects. A general process for generating QSAR models is illustrated in Figure 3. Typically, a data set is split into two or more subsets. The first of these is used to train models of the SAR by fitting mathematical functions that relate descriptors of simple compound characteristics to the measured values. Common descriptors include whole-molecule properties such as molecular weight, volume, charge or lipophilicity, 2-dimensional descriptors such as counts of specific functional groups, and 3-dimensional shape descriptors. It is essential that statistically trained models are carefully validated to ensure that they are robust and generalise to compounds that are not in the training set. Therefore, a second set of compound data, the validation set, is typically used to compare models based on their ability to make predictions for compounds that were not used to train them. Finally, if many models are compared against the same validation set, it is possible that one may achieve a good result by chance; it is therefore good practice to retain an external, independent test set to confirm the predictive power of the final model.
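The train, compare and confirm steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used by any specific software: the data are synthetic, there is a single made-up descriptor, the 60/20/20 split is an arbitrary choice, and two deliberately simple candidate models (a mean predictor and a least-squares line) stand in for the more sophisticated methods normally used in QSAR modelling.

```python
import math
import random

def split_three_ways(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle rows and split them into training, validation and test subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_valid = round(fractions[1] * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

def fit_mean(train):
    """Baseline model: always predict the mean measured value."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_linear(train):
    """Least-squares line relating one descriptor to the measured value."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def rmse(model, rows):
    """Root-mean-square error of a model's predictions on (descriptor, value) rows."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in rows) / len(rows))

# Synthetic data set: one descriptor x and a noisy 'measured' value y (assumed)
rng = random.Random(42)
data = []
for _ in range(100):
    x = rng.uniform(0.0, 5.0)
    data.append((x, 1.5 * x + 2.0 + rng.gauss(0.0, 0.3)))

train, valid, test = split_three_ways(data)

# Train every candidate on the training set only
candidates = {"mean": fit_mean(train), "linear": fit_linear(train)}

# Compare the candidates on the validation set and keep the best one
validation_rmse = {name: rmse(model, valid) for name, model in candidates.items()}
best_name = min(validation_rmse, key=validation_rmse.get)

# Confirm the chosen model once on the untouched test set
test_rmse = rmse(candidates[best_name], test)
print(best_name, validation_rmse[best_name], test_rmse)
```

The test set is consulted only once, after the model has been chosen, so its error is not biased by the selection step.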

Predictive models can be used interactively to get instant feedback on how properties are likely to change as optimisation strategies are explored. An example of such an interactive designer is shown in Figure 4. However, when a property value is predicted for a new compound, the first questions are often “Why?” and “How can I improve this property?” To help with this, the SAR encoded in a predictive model can be visualised directly on the structure of a compound, as illustrated in Figure 4, to focus redesign efforts on those regions of a compound that are likely to have the biggest impact on improving its properties [10].

Figure 4. The interactive designer in StarDrop with Glowing Molecule™ visualisation. When the compound structure is changed in the editor above, all of the predicted properties on the right update instantly. This provides feedback on the impact of redesign strategies on the overall balance of properties. The colour scale highlighting the compound structure identifies the regions that act most strongly to increase (red) or decrease (blue) the predicted property, helping to explain the SAR and guide the design of compounds with improved properties.

Conclusion

Generating drug discovery compound data is time-consuming and expensive, and it is important to get the most value from this effort. This article has presented two approaches to using these data to ensure that they influence compound selection decisions and provide information to guide the design of new compounds.

These approaches use advanced statistical methods to analyse the data. However, it is no longer necessary to be an expert computational scientist to use them effectively. Well-designed software can make these methods available in an intuitive and user-friendly way to build, validate and apply predictive models [11] or to achieve true multi-parameter optimisation [10].

Figure 2. Example SAR plot. The substitutions at positions R1 and R2 on the scaffold (inset) are displayed on the vertical and horizontal axes respectively. The size of the circle in each cell shows the number of compounds with the corresponding R1, R2 combination and the colour indicates their average activity from high in yellow to low in red.

References

1. Paul S, Mytelka D, Dunwiddie D, Persinger C, Munos B, Lindborg S, Schacht A. How to improve R&D productivity: the pharmaceutical industry's grand challenge. Nat. Rev. Drug Discov. 2010;9:203-14.

2. Dimasi J, Hansen R, Grabowski H. The price of innovation: new estimates of drug development costs. J. Health Econ. 2003;22:151-85.

3. Kennedy T. Managing the drug discovery/development interface. Drug Discov. Today. 1997;2(10):436-444.

4. Tarbit MH, Berman J. High-throughput approaches for evaluating absorption, distribution, metabolism and excretion properties of lead compounds. Curr. Opin. Chem. Biol. 1998;2(3):411-416.

5. Kola I, Landis J. Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discov. 2004;3(8):711-716.

6. Chadwick AT, Segall MD. Overcoming psychological barriers to good discovery decisions. Drug Discov. Today. 2010;15(13/14):561-569.

7. Segall M, Beresford A, Gola J, Hawksley D, Tarbit M. Focus on Success: Using in silico optimisation to achieve an optimal balance of properties. Expert Opin. Drug Metab. Toxicol. 2006;2:325-337.

8. Segall MD. Multi-Parameter Optimization: Identifying high quality compounds with a balance of properties. Curr. Pharm. Des. 2012;18(9):1292-1310.

9. Hu X, Hu Y, Vogt M, Stumpfe D, Bajorath J. MMP-Cliffs: Systematic Identification of Activity Cliffs on the Basis of Matched Molecular Pairs. J. Chem. Inf. Model. 2012;52(5):1138-1145.

10. Segall M, Champness E, Obrezanova O, Leeding C. Beyond profiling: Using ADMET models to guide decisions. Chem. & Biodiv. 2009;6(11):2144-2151.

11. Obrezanova O, Gola JMR, Champness E, Segall MD. Automatic QSAR modeling of ADME properties: blood-brain barrier penetration and aqueous solubility. J Comp Aid Mol Design. 2008;33:431-440.

Figure 3. Illustration of the process for the automatic generation and validation of QSAR models. The initial data set is split into three separate subsets: training, validation and test. The training set is used to build multiple models using different modelling methods, e.g. partial least squares (PLS), radial basis functions (RBF), Gaussian processes (GPs) and random forests (RF). These models are compared using the validation set to identify the model with the best predictive performance. Finally, the best model is tested against an independent test set to confirm that the model is robust and may be used with confidence for the chemistry of interest. The model may then be easily deployed to make predictions for new compounds of potential interest.

The resulting models can be applied to new compound structures to make predictions of their properties and help to select good compounds. However, when applying a model it is important to ensure that these new compounds lie within the model’s ‘domain of applicability’, i.e. the chemical space that is well represented by the set of compounds used to train the model. QSAR models are not good at extrapolating to previously unseen chemistry, and predictions for compounds that lie outside the domain of applicability should be treated with caution.
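One simple way to picture a domain-of-applicability check is as a distance test in descriptor space. The sketch below is an assumption made for illustration and is not the method used by any particular QSAR package: it treats a new compound as in-domain if its nearest training compound is no further away than a cut-off derived from the spacing of the training set itself. The descriptor values are invented and assumed to be already scaled to comparable ranges, and the function names and the factor of 1.5 are hypothetical choices.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def nearest_distance(point, others):
    """Distance from a point to its closest neighbour in a list of points."""
    return min(euclidean(point, other) for other in others)

def applicability_threshold(training, factor=1.5):
    """Hypothetical cut-off: factor times the largest nearest-neighbour
    distance found between compounds within the training set."""
    distances = []
    for i, point in enumerate(training):
        rest = training[:i] + training[i + 1:]
        distances.append(nearest_distance(point, rest))
    return factor * max(distances)

def in_domain(point, training, threshold):
    """True if the point lies close enough to at least one training compound."""
    return nearest_distance(point, training) <= threshold

# Invented, pre-scaled descriptor vectors for five training compounds
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
threshold = applicability_threshold(training)

similar_compound = (0.6, 0.4)   # close to the training set
novel_compound = (5.0, 5.0)     # far from anything the model has seen

print(in_domain(similar_compound, training, threshold))
print(in_domain(novel_compound, training, threshold))
```

A prediction for a compound flagged as out-of-domain by a check of this kind would still be produced by the model, but it should be treated with the caution described above.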
