August 2015

DNA RESEQUENCING continued

Figure 3 – Parameters that affect theoretical limit of detection of an NGS experiment. (Figure labels: Capture; 8× Coverage; Reference Sequence; Amplicon; 5× Coverage; Mutation of Interest.)

Figure 2 – One data set with five bioinformatic workflows. GATK, Genome Analysis Toolkit; SOAPsnp, part of Short Oligonucleotide Analysis Package; SNVer (full name); SAMTools, Sequence Alignment/Map Tools; GNUMAP, Genomic Next-generation Universal MAPper.

pathway includes: 1) quality control (e.g., FastQC, a software application from the Babraham Institute in Cambridge, U.K.); 2) de-duplication (removing PCR-duplicated molecules) and trimming, if needed; 3) alignment; and 4) variant calling.

On-machine software for the MiSeq and Ion PGM sequencers provides a straightforward means of generating variant call data. Depending on the goals or flexibility of the assay, an external, off-the-shelf software package or custom-developed pipeline is also an option. Analysis software options may differ in parameters such as alignment stringency (i.e., how closely the sequence must match the reference) or the threshold of coverage required to make a variant call (e.g., position must be sequenced at coverage of 4× or greater). For these parameters, it is important to evaluate how changes to the pipeline affect the final data output. A predictable, orthogonally validated reference standard makes this evaluation easy to repeat after software updates.
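As a minimal sketch of that kind of evaluation, the Python below applies different minimum-coverage thresholds to a handful of candidate sites and scores the resulting calls against a known set of variants. The site names, read counts, and the `call_variants` and `sensitivity` helpers are all hypothetical and stand in for a real pipeline and a real reference standard.

```python
# Hypothetical illustration: how a minimum-coverage threshold changes which
# variants a pipeline reports, scored against a known reference standard.

# (position, coverage at position, reads supporting the variant)
# Made-up candidate sites, not data from the article.
CANDIDATES = [
    ("chr1:100", 12, 6),
    ("chr1:250", 3, 2),
    ("chr2:75", 5, 4),
    ("chr3:40", 8, 1),
]

# Variants assumed to be present in the (hypothetical) reference standard.
KNOWN_VARIANTS = {"chr1:100", "chr1:250", "chr2:75"}


def call_variants(candidates, min_coverage, min_variant_reads=2):
    """Report positions meeting both the coverage and supporting-read thresholds."""
    return {
        pos
        for pos, coverage, variant_reads in candidates
        if coverage >= min_coverage and variant_reads >= min_variant_reads
    }


def sensitivity(calls, known):
    """Fraction of known variants that were called."""
    return len(calls & known) / len(known)


for threshold in (2, 4, 10):
    calls = call_variants(CANDIDATES, min_coverage=threshold)
    print(threshold, sorted(calls), round(sensitivity(calls, KNOWN_VARIANTS), 2))
```

Rerunning the same comparison after each software or parameter change shows directly whether known variants are gained or lost.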

Calculating theoretical limit of detection

To enable disease discovery, it is imperative that NGS assays are validated for the detection of the lowest possible allelic frequencies. This is particularly true concerning the analysis of cell-free DNA (cfDNA) for oncology-relevant mutations. In these patient samples, copies of the disease-relevant mutant DNA may be very low relative to the healthy/wild-type DNA. This is also the case with the early detection of organ rejection following transplantation, noninvasive prenatal analysis, and with disease monitoring during treatment. Reference standards are perhaps even more useful for addressing the challenge faced by those adopting NGS for such clinically relevant analyses. Specifically, as alluded to earlier, the theoretical limit of detection for an NGS experiment is influenced by such parameters as molecular uniqueness (de-duplication), input amount and coverage (see Figure 3 and equations below).5

Molecular uniqueness

In a traditional amplicon panel, short amplicons are generated with the same forward and reverse primer sequences, targeting a specific region of interest (gray “reads” represented in Figure 3). The identical ends of these sequences make it difficult to disambiguate unique molecules from those “duplicated” during PCR amplification. Ideally, only unique molecules would be used to calculate allelic frequencies of mutations in order to avoid errors and bias from amplified DNA. Capture-based libraries (navy blue “reads” in Figure 3) overcome this issue because DNA is first randomly sheared, generating random start/stop sites that can be disambiguated or de-duplicated. However, the input amount required for capture-based libraries is generally higher.
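The effect of de-duplication on an allelic frequency estimate can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a production de-duplication method: reads are made-up `(start, end, allele)` tuples, and any reads sharing both start and end coordinates are assumed to be PCR copies of one original molecule.

```python
from collections import Counter

# Each read: (start, end, allele). Made-up values for illustration only.
# With randomly sheared (capture) fragments, reads that share both start and
# end coordinates are treated here as PCR copies of a single molecule.
READS = [
    (100, 250, "mut"),
    (100, 250, "mut"),  # same coordinates as above: counted as a duplicate
    (100, 250, "mut"),  # duplicate
    (90, 240, "wt"),
    (110, 260, "wt"),
    (95, 245, "wt"),
    (120, 270, "wt"),
]


def allele_frequency(reads, allele="mut"):
    """Fraction of reads carrying the given allele."""
    counts = Counter(a for _, _, a in reads)
    return counts[allele] / len(reads)


def deduplicate(reads):
    """Keep the first read seen for each (start, end) coordinate pair."""
    seen = set()
    unique = []
    for start, end, allele in reads:
        if (start, end) not in seen:
            seen.add((start, end))
            unique.append((start, end, allele))
    return unique


print(f"raw reads:        {allele_frequency(READS):.2f}")
print(f"unique molecules: {allele_frequency(deduplicate(READS)):.2f}")
```

In this toy data the amplified mutant molecule inflates the raw estimate; counting unique molecules removes that bias. Amplicon reads, which all share the same primer-defined ends, would collapse to a single entry under this rule, which is the difficulty the text describes.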

Importantly, because coverage distribution is not uniform across a sample, it may be necessary to sequence much deeper to achieve the desired coverage at each position of interest. The calculations shown in the equations below demonstrate the theoretical limit. When experimental noise is factored in, the actual limit of detection may be much higher. Determination of an assay’s limit of detection using a known reference standard is good practice and is required by CLIA to process patient samples.
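A rough way to see how much deeper is the sketch below. It assumes, purely for illustration, that coverage at every position scales linearly with total sequencing output; the site names, depths, and the `depth_scale_factor` helper are hypothetical.

```python
def depth_scale_factor(position_coverage, target_coverage):
    """Factor by which total sequencing must grow so that the least-covered
    position reaches target_coverage, assuming linear scaling of coverage."""
    lowest = min(position_coverage.values())
    if lowest <= 0:
        raise ValueError("a position with zero coverage cannot be rescued by scaling")
    return max(1.0, target_coverage / lowest)


# Made-up per-position depths for three sites of interest.
coverage = {"site_A": 1200, "site_B": 400, "site_C": 2000}

factor = depth_scale_factor(coverage, target_coverage=1000)
mean_now = sum(coverage.values()) / len(coverage)

print(f"scale sequencing by {factor:.1f}x")
print(f"mean coverage needed: {mean_now * factor:.0f}x (currently {mean_now:.0f}x)")
```

Here the mean coverage already exceeds the target, yet the worst-covered site still forces a 2.5-fold increase in sequencing.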

Input amount

The amount of input material in an NGS experiment greatly influences the lowest possible allelic frequencies that may be detected. Essentially, it is not possible to cover the position of interest more times than there are copies of DNA in a biological sample. It is not unusual for an informatics pipeline to remove PCR duplicates (as described above) and require a minimum of four reads to show the mutation of interest in order to make a variant call. Therefore, if the number of copies of DNA present in a sample is estimated using 3.3 pg for a single, haploid copy, then for DNA samples of 10 ng, the theoretical limit of allelic frequency detection is near 0.1%:
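Using only the figures stated above (3.3 pg per haploid copy, 10 ng of input, and a minimum of four mutant reads), the calculation can be written out as follows. The symbols are introduced here for illustration and are not taken from the article.

```latex
\[
N_{\text{copies}}
  = \frac{10\ \text{ng}}{3.3\ \text{pg/copy}}
  = \frac{10{,}000\ \text{pg}}{3.3\ \text{pg/copy}}
  \approx 3{,}030\ \text{copies}
\]
\[
\text{LOD}_{\text{input}}
  = \frac{\text{minimum mutant reads}}{N_{\text{copies}}}
  = \frac{4}{3{,}030}
  \approx 0.0013 = 0.13\%
\]
```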

As the input amount increases, the theoretical detection limit decreases. This is a theoretical calculation; when experimental noise is taken into account, the actual limit of detection may be much higher.
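The relationship between input and detection limit can be tabulated with a short Python sketch. The two constants come from the text (3.3 pg per haploid copy and a four-read minimum); the function names and the list of input amounts are illustrative choices.

```python
PG_PER_HAPLOID_COPY = 3.3  # mass of one haploid genome copy, from the text
MIN_MUTANT_READS = 4       # reads required to make a variant call, from the text


def haploid_copies(input_ng):
    """Estimated number of haploid genome copies in input_ng of DNA."""
    return input_ng * 1000.0 / PG_PER_HAPLOID_COPY


def theoretical_lod(input_ng, min_reads=MIN_MUTANT_READS):
    """Lowest detectable allelic frequency if every molecule is read once."""
    return min_reads / haploid_copies(input_ng)


for ng in (1, 10, 50, 100):
    print(f"{ng:>4} ng -> {haploid_copies(ng):>8.0f} copies -> LOD {theoretical_lod(ng):.3%}")
```

Each tenfold increase in input lowers the theoretical limit tenfold. As the text notes, experimental noise is not modeled here.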

Coverage

Sequencing coverage also plays a role in an assay’s limit of detection. If one is not limited by input material, then the denominator of the equation below is not restricted by molecules but, rather, becomes reads exclusively:
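In symbols, a reconstruction from the description above (the notation is an assumption, chosen for illustration):

```latex
\[
\text{LOD}_{\text{coverage}}
  = \frac{\text{minimum mutant reads}}{\text{total reads covering the position}}
\]
\[
\text{required coverage}
  = \frac{\text{minimum mutant reads}}{\text{target allelic frequency}}
\]
```

The second line is the first rearranged to give the coverage needed for a chosen allelic frequency.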

Again, assuming four reads that contain the mutation of interest are required to detect allelic frequencies of 0.01% or lower, this position must be

AMERICAN LABORATORY • 18 • AUGUST 2015
