22 August / September 2019
The descriptors used here were chosen from a larger list of >200 compound properties that were initially user-curated based on their known relevance to RPLC [7]. Neural network-based descriptor selection was then used to narrow the list down to the set of 16 molecular descriptors that yielded the best overall accuracy. An alternative approach to feature selection is to generate thousands of molecular descriptors at random and select among them using statistical algorithms [14, 15]. This may yield alternative descriptors giving similar (or better) performance to the user-curated approach, but these may not necessarily have any relationship to mechanisms in RPLC. Both methods were performed here, and both gave similar results in initial experiments. Indeed, pursuit of alternative descriptors to generate a second, parallel model may boost confidence in predictions on biphenyl phases, similar to previous work in our laboratory on prediction of passive sampler uptake rates, where both approaches were used. However, this was beyond the scope of this work, as the generalisability of the previous C18-optimised model was considered a priority to enable simultaneous predictions across multiple types of RPLC methods using the same descriptor set.

The limitation of a user-curated approach was that some moderate collinearity existed between some variables (Figure 5). Collinearity can add unnecessary imprecision or inaccuracy to models if left undetected and can lead to overfitting (which was not observed here, given the good consistency between the training, verification and blind test set data). In particular, collinearity can affect mechanistic interpretation of the model. The highest positive Pearson correlations of >0.8 were observed, unsurprisingly, between MlogP and AlogP and between AlogP and logD. Therefore, mechanistic interpretations between these specific descriptors in terms of relative weighting should be taken with caution. Both were among the most positively correlated descriptors with tR, along with logD and nC, over all others. However, Pearson correlations for all descriptors with tR were <0.8, showing that no one descriptor was likely useful to model retention accurately. As above, removal of the collinear descriptors worsened the predictions (likely as a result of learning from slight differences in calculation of logP, for example), so these were retained despite being collinear. All other correlations were below a threshold of 0.8 and were considered acceptable for use here. No excessive negative correlation was observed between any of the other descriptors, which might be expected from a user-curated approach.

Figure 5. Collinearity analysis for all 16 molecular descriptors and tR.

Principal component analysis of the shortlisted descriptor data for all compounds in Figure 6(a) revealed clear clustering for most molecules, defining a general applicability domain. A few outliers existed in principal component 2, which may highlight poor molecular description and a limited applicability domain for these molecules in particular. A closer examination revealed that most of these were macromolecules such as gibberellic acid, avermectins, doramectin, azadirachtin and spinosad, which contained larger numbers of rings than the rest of the compounds (Figure 6(b)). However, the predicted tR for these compounds was mostly within the 39 s threshold, except for three compounds: isonoruron (tR absolute error = 59 s), sulfosulfuron (43 s) and spinetoram (48 s). Therefore, the selected descriptors for these types of molecules were likely insufficient for accurate predictions, but this was considered a very minor limitation given that this represented <1% of the number of compounds in the dataset. Examination of the PCA data, however, was able to identify this to give added assurance to the user if needed.

Conclusion
Prediction of tR for 653 pesticides on a biphenyl reversed-phase stationary phase under gradient elution conditions was possible using machine learning for the first time. In particular, an ensemble of four two-layer MLPs achieved the best results within an acceptance threshold set at ±39 s of the true value. Although the data was curated on an LC-MS/MS system in targeted mode, prediction of tR becomes especially useful for unknown identification workflows using full-scan high resolution mass spectrometry. This approach represents an efficient way to rapidly shortlist suspect compounds before investing in expensive reference materials or synthesis.
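The ±39 s acceptance threshold applied to the ensemble predictions can be illustrated with a minimal sketch. It assumes that the outputs of the four MLPs are combined by simple averaging, which the text does not state, and the function names and numerical values are hypothetical, chosen only for illustration; they are not data from this study.

```python
from statistics import mean

# Acceptance window quoted in the text: +/- 39 s around the measured tR.
ACCEPTANCE_THRESHOLD_S = 39.0


def ensemble_prediction(member_predictions):
    """Combine member predictions (in seconds) by simple averaging.

    Averaging is an assumption made for this sketch only.
    """
    return mean(member_predictions)


def is_accepted(predicted_s, measured_s, threshold_s=ACCEPTANCE_THRESHOLD_S):
    """Return True if the absolute tR error lies within the acceptance window."""
    return abs(predicted_s - measured_s) <= threshold_s


# Hypothetical outputs of four MLPs for one compound (illustrative values only).
members = [412.0, 420.0, 405.0, 431.0]
predicted = ensemble_prediction(members)
measured = 400.0  # hypothetical measured tR in seconds

print(predicted, is_accepted(predicted, measured))
```

Under this check, a compound with an absolute error of 59 s, such as the isonoruron case reported above, would fall outside the window and be flagged for closer inspection.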
References
1. J. Beens, R. Tijssen, J. Blomberg, Prediction of comprehensive two-dimensional gas chromatographic separations. A theoretical and practical exercise, Journal of Chromatography A 822(2) (1998) 233-251.
2. A. Burel, M. Vaccaro, Y. Cartigny, S. Tisse, G. Coquerel, P. Cardinael, Retention modeling and retention time prediction in gas chromatography and flow-modulation comprehensive two-dimensional gas chromatography: The contribution of pressure on solute partition, Journal of Chromatography A 1485 (2017) 101-119.
3. M. Harju, T. Hamers, J.H. Kamstra, E. Sonneveld, J.P. Boon, M. Tysklind, P.L. Andersson, Quantitative structure-activity relationship modeling on in vitro endocrine effects and metabolic stability involving 26 selected brominated flame retardants, Environmental Toxicology and Chemistry 26(4) (2007) 816-826.
4. N. Strehmel, J. Hummel, A. Erban, K. Strassburg, J. Kopka, Retention index thresholds for compound matching in GC-MS metabolite profiling, Journal of Chromatography B: Analytical Technologies in the Biomedical and Life Sciences 871(2) (2008) 182-190.
5. K. Petritis, L.J. Kangas, P.L. Ferguson, G.A. Anderson, L. Paša-Tolić, M.S. Lipton, K.J. Auberry, E.F. Strittmatter, Y. Shen, R.