GENOMICS


…and clinical trials, and LabVantage’s Sapphire Laboratory Information Management System (LIMS), with its BioBanking functionality, is used for sample tracking and workflow management.

The laboratory conducts genotyping using genome-wide association studies (GWAS), which involve interrogating the entire genome for associations between specific SNPs (single-nucleotide polymorphisms) or other markers and the disease of interest. Fine-mapping sequencing is also used to improve the resolution around the SNP associations found in GWAS analyses. ‘GWAS data sets are very large and informatics solutions are necessary to manage these,’ explains Jody Sylvia, senior bioinformatics project manager of the respiratory, genetics and epidemiology division. ‘Where it used to be single-SNP genotyping on a sub-set of people, now a very large set of people are genotyped for a whole range of markers and then the researchers drill down into the data to find the particular SNPs that are associated with whatever disease or sub-set of phenotypes they are looking for.’
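As a rough illustration of the per-marker testing a GWAS involves (a hedged sketch, not the Channing group’s actual pipeline), the snippet below runs a chi-square allelic association test for each SNP across cases and controls. The genotype coding, toy data and function name are invented for the example; production tools add quality control, covariate adjustment and multiple-testing correction.

```python
# Minimal sketch of a per-SNP case/control association scan, assuming genotypes
# are coded as minor-allele counts (0/1/2). Illustrative only.
import numpy as np
from scipy.stats import chi2_contingency

def allelic_association(genotypes, is_case):
    """genotypes: (n_samples, n_snps) minor-allele counts; is_case: boolean mask."""
    pvalues = []
    n_case = int(is_case.sum())
    n_ctrl = int((~is_case).sum())
    for snp in genotypes.T:
        # Each subject carries two alleles, so allele totals are 2 * group size.
        case_minor = int(snp[is_case].sum())
        ctrl_minor = int(snp[~is_case].sum())
        table = [[case_minor, 2 * n_case - case_minor],
                 [ctrl_minor, 2 * n_ctrl - ctrl_minor]]
        _, p, _, _ = chi2_contingency(table)
        pvalues.append(p)
    return np.array(pvalues)

# Toy data: six subjects (three cases, three controls), three SNPs.
geno = np.array([[0, 1, 2],
                 [1, 1, 2],
                 [0, 0, 1],
                 [2, 1, 0],
                 [1, 0, 0],
                 [0, 0, 1]])
case = np.array([True, True, True, False, False, False])
print(allelic_association(geno, case))  # one association p-value per SNP
```

Run as-is, this prints one p-value per SNP; in a real study the same scan covers hundreds of thousands of markers, which is where the data-management burden Sylvia describes comes from.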


LabVantage’s Sapphire LIMS is used to manage the biorepository, along with storage of subject data, some of the phenotype data and the case-control status, while a tier of other LIMS software is used to manage project-level work. Genetic markers and the sequencing platform used are linked to any other data within the project-level LIMS, so that the investigator has a good understanding of the project. ‘Managing the work in this way means investigators have all relevant information at their fingertips,’ says Sylvia. ‘It also makes collaboration much easier, because each project can be accessed by different collaborators. This is important for how we work; we support a team of investigators, and most of them are working on projects in which they want to look at existing data and associate it with a subset of data a fellow researcher has completed.’


The high-throughput laboratory uses sequencing not only for fine mapping regions of genes, but also for discovery. ‘We have samples from a lot of rare diseases,’ Sylvia says. ‘We’re looking for a specific genetic variant that might not be in the overall population, but for the subset of the disease group this variant means a lot – it changes the severity of the disease.’


Needle in a haystack
With genomics technologies, it’s not only the sheer size of the data that’s a challenge, but also the data mining. ‘It’s like finding a needle in a haystack,’ remarks Dr Jens Hoefkens, head of Genedata Expressionist at software provider Genedata. ‘There is this huge number of measurements and what you’re looking for is a handful of biomarkers. There’s now more and more hay, while the number of needles has remained more or less the same.’ Data mining is where informatics solutions, like Genedata Expressionist, play an important role.


The Genomics Centre at King’s College London is using Qlucore’s Omics Explorer, along with other informatics packages, to interpret its microarray gene expression data. ‘Informatics software is the gateway to the result,’ states Dr Matthew Arno, manager of the Genomics Centre. ‘Anybody can apply samples to microarrays and generate the raw data – that’s extremely straightforward now; robots can do it. But to be able to interpret and analyse the huge amount of raw data and generate, at the very least, a list of significantly differentially expressed genes is a challenge. The informatics software we use makes that possible.’


Arno explains that there’s a lot of redundancy at the level of the probes on the array, as there are multiple probes for each gene. Multiple measurements are therefore summarised to calculate a gene-level expression value. In addition, factors such as the annotation of the gene come into play.
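A minimal sketch of the probe-to-gene summarisation Arno describes, under the simple assumption that redundant probes for a gene are collapsed by taking the median of their log2 intensities (real pipelines typically use methods such as RMA); the probe names, gene names and intensities below are invented.

```python
# Illustrative probe-to-gene summarisation: collapse redundant probes for the
# same gene into one expression value per gene (median of log2 intensities).
import math
from collections import defaultdict
from statistics import median

probe_to_gene = {            # hypothetical probe-to-gene annotation
    "probe_01": "GENE_A", "probe_02": "GENE_A", "probe_03": "GENE_A",
    "probe_04": "GENE_B", "probe_05": "GENE_B",
}
probe_intensity = {          # hypothetical raw intensities from one array
    "probe_01": 812.0, "probe_02": 790.5, "probe_03": 1050.2,
    "probe_04": 96.3,  "probe_05": 120.8,
}

by_gene = defaultdict(list)
for probe, value in probe_intensity.items():
    by_gene[probe_to_gene[probe]].append(math.log2(value))

gene_expression = {gene: median(vals) for gene, vals in by_gene.items()}
print(gene_expression)   # one log2 expression value per gene
```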


Predicting genetics


A University of Toledo doctoral candidate in biomedical sciences has used supercomputers at the Ohio Supercomputer Center (OSC) to develop and test a novel algorithm for the prediction of exon and intron genomic sequences. Samuel Shepard based the predictions on mid-range sequence patterns of 20 to 50 nucleotides in length, which are said to display a non-random clustering of bases referred to as mid-range inhomogeneity (MRI). Shepard and his team hypothesised that the MRI patterns differ between exons and introns and would therefore serve as a reliable predictor.

Shepard developed a technique known as binary-abstracted Markov modelling (BAMM), which involves creating rules that reduce mountains of nucleotide information into a much smaller binary code, based upon the DNA ‘word’ length and the nucleotide bases found within those words. To test rules for long word lengths, Shepard utilised the Ohio Supercomputer Center and its 9,500-node IBM Cluster 1350. Shepard comments: ‘During the project, our algorithm read 12 million nucleotides of exons and introns each, and three million each were used to test the predictions.’
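The article gives only an outline of BAMM, but the abstraction step can be sketched roughly as follows: the sequence is cut into fixed-length DNA ‘words’ and each word is reduced to a single bit by a rule on the bases it contains, after which Markov statistics are gathered over the bit string. The GC-richness rule, word length and toy sequence below are hypothetical stand-ins, not Shepard’s published rules.

```python
# Rough sketch of the binary-abstraction idea behind BAMM: reduce a nucleotide
# sequence to a much smaller bit string by applying a rule to fixed-length DNA
# "words". The rule used here (word is GC-rich or not) is a hypothetical
# stand-in, not the published BAMM rule set.
def binary_abstract(sequence, word_length=20):
    bits = []
    for i in range(0, len(sequence) - word_length + 1, word_length):
        word = sequence[i:i + word_length]
        gc = sum(base in "GC" for base in word)
        bits.append(1 if gc > word_length / 2 else 0)  # 1 = GC-rich word
    return bits

def transition_counts(bits):
    """First-order Markov transition counts over the abstracted bit string."""
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in zip(bits, bits[1:]):
        counts[(a, b)] += 1
    return counts

# Toy sequence standing in for exon-like data; real training used millions of bases.
exon_like = "GCGCGGCCGCATGCGCGGCC" * 5 + "ATATTAGCGC" * 2
print(transition_counts(binary_abstract(exon_like)))
```

Separate models trained on exon and intron bit strings could then be compared to classify an unseen sequence, which is the prediction task the sidebar describes.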


[Image: The lab at EdgeBio, which provides next-generation sequencing services]


Arno also notes that being able to visualise the data is important, as it allows scientists to identify trends in the data. ‘The software [Qlucore’s Omics Explorer] doesn’t just produce an anonymous table; researchers can see what’s happening in the data throughout the course of a gene expression study, occurring in real-time.’ Visualisation is important for analysts working with genomics data, and the functionality is available in most software packages used for genomics, including Genedata Expressionist and CLC Bio’s software.
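As a hedged illustration of the kind of trend-spotting visualisation described here (not a reproduction of Qlucore’s Omics Explorer), the snippet below projects a small, fabricated gene-expression matrix onto its first two principal components so that sample groupings become visible at a glance.

```python
# Minimal PCA scatter of samples from a (samples x genes) expression matrix,
# to illustrate visual trend-spotting. The matrix is fabricated for the example.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
expression = np.vstack([
    rng.normal(0.0, 1.0, size=(5, 50)),   # hypothetical control samples
    rng.normal(1.5, 1.0, size=(5, 50)),   # hypothetical treated samples
])

centred = expression - expression.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
pcs = centred @ vt[:2].T                  # project onto first two components

plt.scatter(pcs[:5, 0], pcs[:5, 1], label="control")
plt.scatter(pcs[5:, 0], pcs[5:, 1], label="treated")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.legend()
plt.show()
```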


Standardisation
A problem that Sylvia, of the Channing Laboratory, identifies is tracking which algorithm or analysis tool was used in a study. ‘Everybody uses different tools,’ she says. Not having standard file types complicates working with and exchanging data, although there are groups, such as the Genomic Standards Consortium (GSC), that are trying to implement greater levels of standardisation in genomics. The problem of non-standardisation is probably an inevitable one. William Mounts, of Pfizer, comments: ‘Data management and data analysis are not the first things the vendor has to address – the first thing is the technology platform itself.’




