Drug Discovery World Winter 2009/10

Cheminformatics
Figure 3
ChemSpider provides links to Wikipedia articles, links out to the original data sources and commercial suppliers, and links out to patents and articles on PubMed. Flexible search capabilities are available, together with visualisation tools such as a real-time 3D optimisation engine and display module
Continued from page 34
7 Wishart, DS et al (2006). DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 34 (Database issue), D668-672.
8 Wishart, DS et al (2008). DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res 36 (Database issue), D901-906.
9 Ma’ayan, A et al (2007). Network analysis of FDA approved drugs and their targets. Mt Sinai J Med 74 (1), 27-32.
10 Irwin, JJ and Shoichet, BK (2005). ZINC – a free database of commercially available compounds for virtual screening. J Chem Inf Model 45 (1), 177-182.
11 Irwin, JJ et al (2005). Virtual screening against metalloenzymes for inhibitors and substrates. Biochemistry 44 (37), 12316-12328.
12 Hohman, M et al (2009). Novel web-based tools combining chemistry informatics, biology and social networks for drug discovery. Drug Disc Today 14, 261-270.
13 Tetko, IV et al (2008). Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection. J Chem Inf Model 48 (9), 1733-1746.
Continued on page 37

data types that can be archived and then selectively shared among colleagues or openly shared in standardised formats, at each research group’s discretion. A focus of CDD is facilitating the growth of global collaborative research networks for neglected diseases such as malaria, African sleeping sickness, Chagas disease and tuberculosis. Currently, 50 datasets are available to the public upon registration, and these can readily be searched by substructure or similarity.

The importance of chemical data curation in QSAR modelling

Molecular modellers and cheminformaticians alike typically analyse data, in general experimental data, generated by other researchers. Consequently, when it comes to the quality of these data, modellers are always at the mercy of the providers. Practically any cheminformatics modelling study entails the calculation of chemical descriptors that are expected to accurately reflect the intricate details of the underlying chemical structures. Obviously, any error in the structure translates either into an inability to calculate the descriptors for erroneous chemical records or into erroneous descriptors. Naturally, the models built with these data are either restricted to only a fraction of the formally available data or, worse, they are simply inaccurate. As both data and models of the data, as well as the body of scholarly publications in cheminformatics, continue to grow, it becomes increasingly important to address the issue of data quality, which inherently affects the quality of models.

How significant is the problem of accurate structure representation as it concerns the adequacy and accuracy of cheminformatics models? A few recent reports indicate that this problem should be given serious attention. For instance, benchmarking studies by a large group of collaborators from six laboratories [13,14] have clearly demonstrated that the type of chemical descriptors has much greater influence on the prediction performance of QSAR models than the nature of the model optimisation techniques. Furthermore, in another recent seminal publication [15], the authors clearly pointed out the importance of chemical data curation in the context of QSAR modelling (eg incorrect structures generated from either correct or incorrect SMILES). Their main conclusions were that small structural errors within a dataset could lead to significant losses in the predictive abilities of QSAR models. At the same time they further demonstrated that manual curation of the structural data leads to a substantial increase in the model predictivity [15].

In their report highlighting the importance of gathering accurate information to build the WOMBAT and WOMBAT-database, Oprea et al [16] discussed the error rate in medicinal chemistry publications. They found an average of approximately two errors per publication in the almost 6,800 papers indexed in the WOMBAT database. With a median of 25 compounds per series in a publication, this implied an overall error rate of 8% (two errors per 25 compounds), with errors including [17]: incorrectly drawn or written structures, unspecified position of attachment of substituents, structures with the incorrect backbone, incorrect generic or chemical names, and duplicates.

The basic steps to curate a dataset of compounds have been either considered trivial or ignored by the experts in the field. For instance, several years ago a group of experts in QSAR modelling developed what is now known as OECD QSAR modelling and validation principles [18,19] that the