…error of 31 s from the measured tR value, which represented 4.3% of the analyte retention range and was again similar to previous performance on C18. Over 86% of all compounds were predicted to within 60 s of the measured value (n=567) and the 75th percentile of all errors was 43 s, which was set as the threshold for matching. The worst performance was for triazoxide, mesotrione and cyclosulfamuron, with errors of -3.91 min, 2.97 min and -2.5 min, respectively. Each of these has a high sulphur or nitrogen content, which is not covered explicitly in the descriptor dataset. All three of these compounds were present in the training set. The worst predictions in the verification and blind test sets were for acibenzolar-S-methyl and allidochlor, respectively, and errors for both were within 2 min. However, on the whole, this model generalised very well to this new stationary phase type.


Across all three datasets, there was a small negative bias to the predictions overall (-1.2 s). Closer inspection of Figure 3(b), however, revealed an underlying trend in the errors obtained: early eluting compounds were very slightly over-predicted in comparison to later eluting compounds (the average errors for the first and second halves of all eluting compounds were +17.1 and -17.8 s, respectively). The stability of artificial neural networks can be improved by ‘ensembling’ models; four replicate MLPs were therefore retrained and combined. Overall, the bias of the ensemble model reduced significantly, to 0.22 s. The trend in bias reduced marginally, to 15.2 and 13.3 s for the first and second halves of eluting species, respectively, but was still evident. This ensemble model marginally reduced the average errors in comparison to the single model alone and the correlations generally improved (Figure 3(c) and (d)). Errors were 28 ±27 s, 28 ±23 s and 29 ±25 s for the training, verification and blind test sets, respectively. Performance remained poorest for triazoxide, as before (error = -4.02 min), but improved slightly for both mesotrione (2.05 min) and cyclosulfamuron (-2.234 min). Overall, there were fewer outliers than with the single MLP model, so it was decided to proceed with the ensemble as the preferred approach. The 75th percentile of all absolute errors was 39 s and this was set as the match threshold. In the blind test set for the ensemble model, a few compounds with structural commonality lay outside of this range and are worth noting. Specifically, these included several substituted nitroaniline species, such as butralin, isopropalin, pendimethalin and nitralin. Closer examination of these structures highlighted that no descriptor was included to represent nitro groups specifically, although it was hoped that this would be reflected indirectly in the logD/logP data. This could also be partially explained by the existence of only two similar compounds (i.e., oryzalin and flumetralin) in the training set. However, overall, the performance of the ensemble model was considered excellent and represents the first successful prediction of gradient retention times for such a large number of compounds on a biphenyl stationary phase to this accuracy level.
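To make the ensembling step concrete, the outline below is a minimal sketch, assuming a descriptor matrix X and measured retention times y (in seconds) are available as NumPy arrays; the scikit-learn MLPRegressor and the hyperparameters shown are illustrative stand-ins, not the architecture used in this work.

```python
# Sketch of the ensembling approach described above (assumed inputs:
# X = molecular-descriptor matrix, y = measured gradient retention times in s).
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_ensemble(X_train, y_train, n_members=4, seed=0):
    """Train replicate MLPs that differ only in their random initialisation."""
    members = []
    for i in range(n_members):
        mlp = MLPRegressor(hidden_layer_sizes=(30,), max_iter=5000,
                           random_state=seed + i)  # illustrative hyperparameters
        members.append(mlp.fit(X_train, y_train))
    return members

def ensemble_predict(members, X):
    """Average the member predictions to stabilise the output."""
    return np.mean([m.predict(X) for m in members], axis=0)

# Evaluation mirroring the statistics quoted above (assumed test arrays):
# errors = ensemble_predict(train_ensemble(X_train, y_train), X_test) - y_test
# bias = errors.mean()                                  # cf. 0.22 s for the ensemble
# match_threshold = np.percentile(np.abs(errors), 75)   # cf. the 39 s threshold
```

Averaging replicate networks that differ only in their initialisation is what produces the reduction in overall bias and in the number of outliers described above.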


3.3 Collinearity, sensitivity analysis and applicability domain

Very low single descriptor correlations with tR were observed, and this further strengthened the need for the multi-input model approach (Figure 5). Indeed, initial investigations of simpler, multiple linear regression (MLR) models showed that, although a correlation existed between measured and predicted tR (R2 = 0.7896), it yielded inferior results to neural networks in general, with a mean tR inaccuracy for all compounds of 2.35 ±0.83 min. The non-linear neural network approach was far superior, as shown above.
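For comparison, the MLR baseline referred to above can be outlined as follows. This is a sketch only, under the same assumed X and y inputs as before; the function name and train/test handling are illustrative rather than those of the study.

```python
# Sketch of an MLR baseline on the descriptor data, for comparison with the MLP.
import numpy as np
from sklearn.linear_model import LinearRegression

def mlr_baseline(X_train, y_train, X_test, y_test):
    """Fit a multiple linear regression on the descriptors and report the
    coefficient of determination and mean absolute retention-time error."""
    mlr = LinearRegression().fit(X_train, y_train)
    pred = mlr.predict(X_test)
    r2 = mlr.score(X_test, y_test)                  # cf. R2 = 0.7896 reported above
    mae_min = np.abs(pred - y_test).mean() / 60.0   # mean |error| in min, cf. 2.35 min
    return r2, mae_min

# r2, mae_min = mlr_baseline(X_train, y_train, X_test, y_test)
```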


Unlike in MLR, where coefficients can be interpreted to help understand the contribution of each input to the output, interrogation of input dependency for neural networks is more complex. The dependency of both the best single model and the ensemble on each molecular descriptor was therefore evaluated: each molecular descriptor was systematically removed and the change in performance relative to the complete dataset was calculated to produce an error ratio. Values less than 1.0 indicated that the model was sub-optimal and that the descriptor data was worsening predictions. As can be observed in Figure 4, by far the largest contribution to the prediction for both models was logD, in line with similar models on C18 media. The next most important descriptors were slightly different between the single and ensemble models, and also differed from previous models on C18. In particular, some descriptors were likely prioritised given the aromatic character of the stationary phase (e.g., nR06, Hy, nBnz). The high contribution of Hy in particular is likely to also reflect the observed effect of increased retention of polar, early eluting compounds, as it is related to hydrophilicity [13]. This molecular descriptor includes variables such as the number of hydrophilic groups (-OH, -SH, -NH), nC and nSK (the number of atoms excluding hydrogen), and was the second highest contributor to predictions using the ensemble model. As can also be seen, descriptors that were retained at the model design stage but which had near-zero values were deprioritised in both models (i.e., nTB, nR04, nR07-09). For the single model, the error ratio value for nTB was <1.0, meaning there was a slight improvement in the model when it was removed. With the ensemble model, however, and even though very few compounds possessed triple bonds, it still contributed to the prediction overall. This is likely where the stability of the ensemble approach was observed, leading to better generalisability.
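A minimal sketch of this descriptor-removal sensitivity analysis is given below. The exact protocol (whether models were fully retrained after each removal, and on which data split the errors were computed) is an assumption here; errors are taken on the fitting data purely for brevity.

```python
# Sketch of a leave-one-descriptor-out sensitivity analysis producing error ratios
# of the kind plotted in Figure 4 (assumed inputs X, y and descriptor_names).
import numpy as np
from sklearn.neural_network import MLPRegressor

def error_ratios(X, y, descriptor_names, seed=0):
    """Error ratio per descriptor: values >1 indicate the model depends on that
    descriptor; values <1 indicate predictions improve when it is removed."""
    def fit_mae(X_sub):
        model = MLPRegressor(hidden_layer_sizes=(30,), max_iter=5000,
                             random_state=seed).fit(X_sub, y)
        # Errors computed on the fitting data here purely for brevity.
        return np.abs(model.predict(X_sub) - y).mean()

    baseline = fit_mae(X)                    # error with the complete descriptor set
    return {name: fit_mae(np.delete(X, j, axis=1)) / baseline
            for j, name in enumerate(descriptor_names)}
```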


Figure 4. Sensitivity analysis of the single MLP model (blue) and ensemble model (orange). Error ratios >1 represent high model dependency on that descriptor.