statistical tools in genetics


an application of linkage disequilibrium (LD) mapping techniques, has a solid history in the study of human disease. In this connection, as one approach to the formalised study of genetic disease architectures, probabilistic graphical models (already well established in bioinformatic gene expression and linkage analyses) are appearing in support of AM methods up to genome scale, although they have limitations at that scale. AM has tended to emphasise high-frequency alleles, but newer statistical models are addressing this, and some interesting plant studies exploit them. Brassicæ once again raise their leafy little heads[10, 11] here, and not without reason. Though statistically powerful and capable of very high mapping resolution, AM depends upon a well-established understanding of single nucleotide polymorphisms (SNPs) within the organism being studied. It can, therefore, be applied most effectively to those subjects whose genomes are already known and, conversely, is least useful in those not yet recorded in sufficient detail. That limitation is, of course, fluid and progressively under revision as new genomes are explored, mapped and published. An increasing number of trees (both orchard and forest) are being sequenced and high-volume sequencing methods are advancing; with suitable phenotyping to match, association mapping will follow. One high-value cash crop tree, the cocoa tree (Theobroma cacao), has already[12] received close AM attention: almost 250 samples, from 17 countries in Latin and South America or the Caribbean, have yielded close to 150,000 expressed sequence tags and a high-density genetic map.

Looking to the future, the ways in which researchers analytically interact with data seem set to change in wide-ranging ways. Current corporate solution providers are, as in every other area of computing, being joined by open systems and imaginative uses of distributed access. The Discovery Environment provided by iPlant Collaborative (a non-profit virtual organisation, funded by the US National Science Foundation, centred at the University of Arizona, and now in its third year), to take just one example, provides a web portal through which botanists and other plant scientists can both provide and access analytic tools. Those tools sit on high-performance computing platforms, will handle terabyte-scale data sets and can be used by anyone through a semi-friendly graphical user interface. Data can be stored, analyses run and results shared. Looking at the ways in which distributed and/or cloud-based computing structures are spreading and establishing themselves, it seems likely that this model, or something like it, is the pattern to expect. How exactly it will interact and cohabit with present corporate providers is anyone’s guess, but that is not a question peculiar to genetics. Some companies have already moved experimentally down the ‘free up to a point’ route opened up by small shareware and similar vendors. Wolfram Research, for example, has for some years provided several web-based access points through which anyone with a web browser can use Mathematica facilities on a small scale, thereby setting out its stall for those who need more and are willing to pay for it. Statistical software publishers, like office suite publishers, get support from customers who do not want to rely on the excellent but unsupported free alternatives. Perhaps genetic analysis will support the same kind of mixed market. Whatever the mechanisms, they will for the foreseeable future be following an ever-upward spiral of size, speed and complexity.

Top and centre: distribution of over 7,000 Arabidopsis accessions. Bottom left: comparison of expected and observed occurrences of 8,133 independent premature stops in 4,263 protein-coding genes (from Weigel[10]; data from Cao et al[11], maps by George Wang)

Further information

BGI: bgiamericas.com

Broad Institute: www.broadinstitute.org

The Cancer Genome Atlas (TCGA): cancergenome.nih.gov

Complete Genomics: www.completegenomics.com

IDBS: www.idbs.com

Illumina: www.illumina.com

iPlant Collaborative: www.iplantcollaborative.org

Life Technologies: www.lifetechnologies.com

SGI: www.sgi.com

Systat Software: www.systat.com

Thermo Fisher Scientific: www.fisher.co.uk

VSN International: www.vsni.co.uk

Wolfram Research: www.wolfram.co.uk

14 SCIENTIFIC COMPUTING WORLD


References and Sources
For a full list of references and sources, visit www.scientific-computing.com/features/referencesjun12.php


www.scientific-computing.com

