by Michael Schnall-Levin
Long-Range Sequencing is Required to Unlock the Full Genome
Next-generation sequencing (NGS) provides unprecedented access to one of biology’s greatest mysteries—the genome. Information obtained from sequencing allows researchers to identify changes in genes, associate them with diseases and phenotypes and uncover potential new therapeutic targets.
Sequencing technologies have become faster and less expensive, but limitations remain. Short-read sequencing generates data inexpensively and with low single-base-pair error rates, but produces only a partial picture of the underlying DNA. Thus, many diseases that are likely the result of a genetic change are not yet assigned to a mutation.
While much of the genome has remained inaccessible due to the confines of current tools, newer technologies are using long-range sequencing to fill in the information gaps inherent in short-read sequencing.
A shortcoming of most long-range sequencers compared with short-read sequencers is a greater per-base cost and a higher error rate. In contrast, the GemCode platform (10X Genomics, Pleasanton, Calif.) builds on short-read sequencing technology, maintaining low cost and minimal error rates while providing long-range information on the scale of 100 kilobases and higher (Figure 1). Long-range sequencing retains the information present in the original DNA sample, providing the ability to phase variants, improve detection of structural changes in human genomes, increase accuracy, achieve single-molecule sensitivity, and quantitate and assemble genomes de novo.
Phasing
Long-range sequencing allows phasing: characterization of the chromosomal origin of variants located within a diploid genome (diploid organisms, like humans, carry two copies of each chromosome). By identifying haplotype information, phased sequencing can be used to study complex traits that are often influenced by interactions among multiple genes and alleles. Whole-genome and whole-exome sequencing produce a single consensus sequence without differentiating between variants on homologous chromosomes. Phased sequencing can identify which alleles are on either maternal or paternal chromosomes, information that can be critical to understanding the genetics underlying a disease and for studying expression and gene regulation.
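The idea of phasing can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the GemCode algorithm: each long DNA molecule is assumed to carry an identifier (called a barcode here), every site is assumed to be heterozygous with alleles coded 0 (reference) and 1 (alternate), and molecules are assumed to arrive in an order in which each one overlaps sites already seen. The names `phase`, `hap_a` and the example data are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical toy input: barcode -> {site position: observed allele (0 = ref, 1 = alt)}
Molecule = Dict[int, int]


def phase(molecules: Dict[str, Molecule]) -> Tuple[Dict[int, int], Dict[str, str]]:
    """Greedy phasing sketch.

    Returns the alleles carried by haplotype A at each site, and an
    "A" or "B" label for every barcode.
    """
    hap_a: Dict[int, int] = {}
    labels: Dict[str, str] = {}
    for barcode, alleles in molecules.items():
        # Compare this molecule with haplotype A at the sites they share.
        shared = [site for site in alleles if site in hap_a]
        agree = sum(alleles[site] == hap_a[site] for site in shared)
        disagree = len(shared) - agree
        # The first molecule shares no sites, so it seeds haplotype A.
        on_a = agree >= disagree
        labels[barcode] = "A" if on_a else "B"
        for site, allele in alleles.items():
            # At a heterozygous site, haplotype B carries the other allele.
            hap_a.setdefault(site, allele if on_a else 1 - allele)
    return hap_a, labels


example = {
    "bc1": {100: 1, 5000: 0, 40000: 1},
    "bc2": {5000: 1, 40000: 0, 90000: 1},
    "bc3": {40000: 1, 90000: 0},
    "bc4": {100: 0, 90000: 1},
}
hap_a, labels = phase(example)
print(hap_a)
print(labels)
```

In this toy, no single molecule spans all four sites, yet the overlaps between molecules are enough to place every allele on one of the two haplotypes.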
Phasing enables other downstream applications as well. Because reads can be separated by haplotype, variant calling can be performed in a haploid rather than a diploid context. Thus, mutations (which are consistent with haplotype structure) can be separated from errors introduced by the sequencing technology (which are orthogonal to haplotype structure), yielding variant calls with much higher accuracy than those called by short-read sequencing. This is especially important for identifying mutations present in only a fraction of the sample, as in cancer sequencing and noninvasive prenatal testing (NIPT).
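The distinction between haplotype-consistent mutations and haplotype-independent errors can be sketched in a few lines. This is an illustration only: reads are assumed to be already labeled with a haplotype ("A" or "B"), and the function name and the thresholds `min_alt_reads` and `min_purity` are hypothetical choices, not values from any real variant caller.

```python
from collections import Counter
from typing import Iterable, Tuple

# Each read is (haplotype label, whether the read supports the alternate allele).
Read = Tuple[str, bool]


def haplotype_consistent(reads: Iterable[Read],
                         min_alt_reads: int = 3,
                         min_purity: float = 0.9) -> bool:
    """Return True if the alt-supporting reads sit almost entirely on one haplotype."""
    alt_by_hap = Counter(hap for hap, supports_alt in reads if supports_alt)
    n_alt = sum(alt_by_hap.values())
    if n_alt < min_alt_reads:
        return False  # too little evidence to call anything
    # Purity: share of alt reads that fall on the dominant haplotype.
    return max(alt_by_hap.values()) / n_alt >= min_purity


# A mutation seen in only a fraction of haplotype A reads, never on B.
low_fraction_mutation = [("A", True)] * 3 + [("A", False)] * 17 + [("B", False)] * 20
# Alt reads scattered over both haplotypes, as expected for sequencing errors.
scattered_errors = ([("A", True)] * 2 + [("A", False)] * 8
                    + [("B", True)] * 2 + [("B", False)] * 8)

print(haplotype_consistent(low_fraction_mutation))
print(haplotype_consistent(scattered_errors))
```

The first case has alt reads in only 3 of 40 reads overall, but all of them sit on one haplotype, so it passes. The second has more alt reads, but they are split evenly between haplotypes and are rejected.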
Structural variation
Short-read sequencing is sufficient for identifying point mutations in the genome, as well as small insertions and deletions, at a reasonable accuracy. However, the larger variants that occur across the underlying genome (known as structural variations) are more difficult and, in many cases, impossible to detect reliably with short-read sequencing. Examples include:
1) Large-scale deletions, in which tens of kilobases or more of DNA are removed and totally lost.
2) Inversions, in which a section of a chromosome is neither lost nor amplified, but its orientation is flipped. These can be very long, on a multi-megabase level, approaching the length of a chromosome.
3) Interchromosomal translocations, in which different chromosomes that would not normally be connected become connected.
Human genetics studies and cancer research require an understanding of these variants. In fact, structural variation is the fundamental driver of the oncogenic process in many types of cancer.
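The three variant types listed above can be made concrete with a toy model that treats a chromosome as a short string. The sequences, coordinates and function names are hypothetical; real events span kilobases to megabases, and the translocation is modeled here as a simple reciprocal exchange of chromosome tails, which is only one possible form.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def delete(seq: str, start: int, end: int) -> str:
    """Deletion: the segment seq[start:end] is removed and lost."""
    return seq[:start] + seq[end:]


def invert(seq: str, start: int, end: int) -> str:
    """Inversion: seq[start:end] is replaced by its reverse complement."""
    flipped = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + flipped + seq[end:]


def translocate(chrom_a: str, break_a: int,
                chrom_b: str, break_b: int) -> Tuple[str, str]:
    """Reciprocal translocation: the two chromosomes swap tails at the breakpoints."""
    return (chrom_a[:break_a] + chrom_b[break_b:],
            chrom_b[:break_b] + chrom_a[break_a:])


chr1 = "TTTTACGGAAAA"
chr2 = "GGGGCCCC"

print(delete(chr1, 4, 8))
print(invert(chr1, 4, 8))
print(translocate(chr1, 6, chr2, 4))
```

In this toy the inverted chromosome has the same length as the original, so nothing is lost or amplified. Only the sequence at the two junctions changes.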
One of the challenges in identifying structural variation with short-read sequencing is that the variants are typically mediated by sequences that are repeated throughout the genome. Long-range sequencing goes outside the immediate vicinity of that structural variant into a wider window where there is likely a unique sequence.
Figure 1 – Critical long-range genomic information can be unlocked with the Chromium system, a microfluidics-based benchtop instrument powered by GemCode technology.
AMERICAN LABORATORY 44 MARCH 2016
Repeat-rich areas
Another interesting application for long-range sequencing is calling variants in the nearly 10% of the genome that is repeat-rich. These regions are very similar to one another and thus difficult to interrogate with short-read sequencers. Sometimes known as “genomic dark matter,” repeat-rich areas originated from a duplication process in which a gene or a region around a gene is copied, yielding two or more copies in the genome. As a result, either copy may acquire a mutation and become a pseudogene: still very similar to the original at the sequence level, but no longer a functional gene. Alternatively, two copies may be maintained as different genes, but acquire different functions through one or more mutations.
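A toy example shows how near-identical copies blur the origin of a sequence. The two sequences below are invented, differ at a single base, and are far shorter than real genes. Exact substring matching stands in for read alignment, which is a deliberate simplification.

```python
from typing import Dict, List

# Hypothetical gene and a pseudogene copy that differs by one base (index 33).
gene = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
pseudogene = gene[:33] + "T" + gene[34:]
copies: Dict[str, str] = {"gene": gene, "pseudogene": pseudogene}


def candidate_origins(fragment: str, references: Dict[str, str]) -> List[str]:
    """Names of all reference copies that contain the fragment exactly."""
    return [name for name, seq in references.items() if fragment in seq]


short_fragment = gene[5:15]  # does not reach the distinguishing base
long_fragment = gene[5:36]   # spans the distinguishing base

print(candidate_origins(short_fragment, copies))
print(candidate_origins(long_fragment, copies))
```

The short fragment is compatible with both copies, while the longer one reaches the base that tells them apart and matches only the gene.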
The challenge in aligning 100–200 base reads from a short-read sequencer is that it will not be clear from which of these copies the read was generated. For example, in a gene with one pseudogene, whether reads were generated from the gene or from the pseudogene will not be known. Reads suggesting a mutation could have radically different implications, depending on which of the two copies they came from. A variant from a pseudogene likely affects nonfunctional DNA as the pseudogene no longer