Figure 3. Untargeted Analysis of 4 Types of Rice Blast Fungus. The left bubble plot shows all compound features, with F values as bubble sizes. The right bubble plot shows only the compound features detected as either unique or common, with variances as bubble sizes. Orange arrows indicate some of the compound features detected as unique to Guy11.
Methods
A new workflow and associated tools have been developed to extend the Investigator framework with specialised detection and identification constraints, based on chromatographic and mass spectral information, for distinguishing targeted compounds. They provide search and visualisation methods that operate on a feature database to find compounds that are common or unique across many samples, for either multi-sample classes or single-sample classes.
Starting with the Investigator framework, each chromatogram is analysed using a template comprising: (1) a set of peaks that can be reliably recognised across chromatograms, which are used for chromatographic alignment, and (2) a comprehensive set of peak-regions, which are used as features for semi-quantitative sample comparisons. The reliable peaks are determined by bidirectional pairwise matching of all possible pairs of chromatograms [9]. The peak-region features are delineated by peak detection in the composite chromatogram created by aligning and summing all chromatograms [5]. To analyse each chromatogram, the template is aligned using the reliable peaks, and each peak-region is then treated as a compound feature. The problem of recognising the same compound feature in every chromatogram is thereby resolved automatically and implicitly, because each measure is taken within the same peak-region after it has been aligned to that chromatogram.
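The align-then-measure idea above can be sketched in Python. The text does not state which geometric transform the Investigator framework fits to the reliable peaks, so this sketch assumes an independent least-squares linear mapping for each retention-time dimension and rectangular peak-regions; both are simplifications, and `fit_axis`, `align_template`, and `region_tic` are hypothetical names, not part of GC Image.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]                 # (first-dimension RT, second-dimension RT)
Region = Tuple[float, float, float, float]  # hypothetical box: (rt1_min, rt1_max, rt2_min, rt2_max)


def fit_axis(src: Sequence[float], dst: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of dst ~= slope * src + offset for one retention-time axis."""
    n = len(src)
    if n < 2 or n != len(dst):
        raise ValueError("need at least two matched reliable peaks")
    mean_s, mean_d = sum(src) / n, sum(dst) / n
    var_s = sum((s - mean_s) ** 2 for s in src)
    if var_s == 0:
        raise ValueError("reliable peaks must not share a single retention time")
    cov = sum((s - mean_s) * (d - mean_d) for s, d in zip(src, dst))
    slope = cov / var_s
    return slope, mean_d - slope * mean_s


def align_template(template_peaks: List[Point],
                   sample_peaks: List[Point]) -> Callable[[Point], Point]:
    """Map template coordinates into one sample chromatogram via matched reliable peaks."""
    a1, b1 = fit_axis([p[0] for p in template_peaks], [p[0] for p in sample_peaks])
    a2, b2 = fit_axis([p[1] for p in template_peaks], [p[1] for p in sample_peaks])
    return lambda p: (a1 * p[0] + b1, a2 * p[1] + b2)


def region_tic(region: Region, transform: Callable[[Point], Point],
               pixels: List[Tuple[float, float, float]]) -> float:
    """Sum intensities of sample pixels (rt1, rt2, intensity) inside the aligned peak-region."""
    rt1_min, rt1_max, rt2_min, rt2_max = region
    x1, y1 = transform((rt1_min, rt2_min))
    x2, y2 = transform((rt1_max, rt2_max))
    lo1, hi1 = min(x1, x2), max(x1, x2)  # min/max guard against a negative fitted slope
    lo2, hi2 = min(y1, y2), max(y1, y2)
    return sum((i for r1, r2, i in pixels if lo1 <= r1 <= hi1 and lo2 <= r2 <= hi2), 0.0)
```

Because the same template region is measured in every aligned chromatogram, the value returned by `region_tic` for a given region is directly comparable across samples; no per-sample peak matching is needed at this step.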
Afterwards, both chromatographic and spectral information of the compound features are extracted from the chromatograms and deposited in a feature database. The chromatographic attributes extracted include retention times, signal-to-noise ratio (SNR), and the total intensity count (TIC) in each peak-region. The TIC value provides a relative measure of a compound feature in a chromatogram. To normalise across chromatograms, each TIC measure is divided by the sum of the TIC values for all peak-regions in the same chromatogram, giving a measure of percent-response. The spectral information extracted includes the spectrum of each compound feature and its base peak, which can be used to confirm feature correspondences based on spectral similarity. A Compound-Sample-Class hierarchical association index (HAI) is then built and pruned by applying specified criteria, yielding analytically useful information on compound differences between samples, as shown in Figure 2.
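The percent-response normalisation described above amounts to P(region) = 100 × TIC(region) / Σ TIC(all peak-regions in the same chromatogram). A minimal Python sketch follows; the function name and the dictionary layout (peak-region identifier to TIC) are assumptions for illustration only.

```python
from typing import Dict


def percent_response(tic_by_region: Dict[str, float]) -> Dict[str, float]:
    """Express each peak-region's TIC as a percentage of the chromatogram's total TIC.

    tic_by_region maps a (hypothetical) peak-region identifier to its TIC
    in a single chromatogram.
    """
    total = sum(tic_by_region.values())
    if total <= 0:
        raise ValueError("chromatogram has no positive TIC in its peak-regions")
    return {region: 100.0 * tic / total for region, tic in tic_by_region.items()}


# Usage: one chromatogram with three peak-regions.
print(percent_response({"r1": 300.0, "r2": 100.0, "r3": 600.0}))
```

Since every chromatogram's percent-responses sum to 100, differences in overall sample loading or detector response largely cancel, which is what makes the values comparable across chromatograms.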
The pruning process on the HAI uses the following three general filters:
• Detection Filter: SNR is used to filter out compound features from peak-regions that contain only background signals.
• Cross-Sample Significance Check: For multi-sample classes, variance-based statistics (i.e., FDR or F value) are used to select compound features; low variances indicate commonality and high variances indicate uniqueness. For single-sample classes, each compound feature's relative measures are normalised by their maximum value across samples, and the samples whose normalised values exceed a threshold are selected. If all samples are selected, the compound feature may be common; if only one sample is selected, it may be unique.
• Cross-Sample Identification Check: The spectra of the same compound feature are compared across samples by match scores and base peaks. A compound unique to one sample should have low match scores when compared with the other samples, whereas a compound common to all samples should have high match scores across all samples.
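The three filters can be sketched as small Python functions. This is an illustrative approximation, not the GC Image implementation: the SNR cut-off (10) and selection threshold (0.5) are hypothetical defaults, the F value is the standard one-way ANOVA statistic (FDR is not shown), and a plain cosine similarity stands in for the unspecified spectral match score, with spectra assumed to be already binned to a shared m/z grid.

```python
import math
from typing import Dict, List, Sequence, Tuple


def passes_detection(snr: float, min_snr: float = 10.0) -> bool:
    """Detection filter: drop peak-regions whose SNR suggests background only."""
    return snr >= min_snr  # min_snr is a hypothetical default


def f_value(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F for one feature's percent-responses, grouped by class."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        # No within-class spread: infinite F if classes differ, otherwise no signal.
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def single_sample_call(values: Dict[str, float],
                       threshold: float = 0.5) -> Tuple[str, List[str]]:
    """Normalise by the maximum across samples and select samples above threshold."""
    peak = max(values.values())
    if peak <= 0:
        return "undetected", []
    selected = [s for s, v in values.items() if v / peak >= threshold]
    if len(selected) == len(values):
        return "common", selected
    if len(selected) == 1:
        return "unique", selected
    return "neither", selected


def cosine_match(a: Dict[float, float], b: Dict[float, float]) -> float:
    """Cosine similarity of two spectra (m/z -> intensity) on a shared m/z grid."""
    dot = sum(v * b.get(mz, 0.0) for mz, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def base_peak(spectrum: Dict[float, float]) -> float:
    """m/z of the most intense ion in the spectrum."""
    return max(spectrum, key=spectrum.get)
```

For example, `single_sample_call({"A": 100.0, "B": 5.0, "C": 2.0})` selects only sample A and labels the feature "unique", which would then be confirmed by low `cosine_match` scores between A's spectrum and those of B and C.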
The pruning result is visualised with a colour-coded bubble plot. Each bubble represents a common or unique compound feature, and its colour indicates the class for which the compound is detected. The bubble size can be set to indicate significance, for example, SNR for single-sample classes or F value for multi-sample classes. All bubbles are positioned by their retention times. The resulting bubble plot shows not only metric values but also which features are likely to be effective for distinguishing samples, as demonstrated in the following results.
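The mapping from pruned features to bubbles can be sketched as plain data, independent of any plotting library. The record layout, the class-to-colour table, and the choice to make bubble area proportional to the metric (scaled so the largest bubble has area `max_area`) are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Bubble:
    rt1: float    # first-dimension retention time (x position)
    rt2: float    # second-dimension retention time (y position)
    colour: str   # colour of the class in which the feature was detected
    size: float   # marker area, proportional to the chosen significance metric


def make_bubbles(features: List[dict], class_colours: Dict[str, str],
                 max_area: float = 400.0) -> List[Bubble]:
    """Build bubbles from hypothetical feature records with keys rt1, rt2, cls, metric.

    metric is whichever significance measure is chosen (e.g. SNR or F value).
    """
    if not features:
        return []
    top = max(f["metric"] for f in features)
    if top <= 0:
        raise ValueError("significance metric must be positive for at least one feature")
    return [Bubble(f["rt1"], f["rt2"], class_colours[f["cls"]],
                   max_area * f["metric"] / top)
            for f in features]
```

If these records are drawn with matplotlib, `pyplot.scatter` takes marker areas through its `s` parameter, so the `size` field can be passed there directly.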
Experiments
Two example analyses are presented here to demonstrate the effectiveness of the new workflow. The data were processed and visualised using a developmental release of GC Image GCxGC-HRMS Edition Software (Version 2.7, GC Image, Lincoln, NE, USA).
Multi-Sample Class Example: Rice Blast Fungus
The first example analysed data from 4 types of rice blast fungus (Magnaporthe oryzae) including the wild-type (wt) Guy11 strain and mutant strains resulting from