Discovery of single nucleotide polymorphisms (SNPs) requires analysis of redundant sequences such as those available in large public databases. The ability to detect SNPs, especially those of low frequency, depends on the depth and scale of the discovery effort. Large numbers of SNPs have been identified by mining large-scale expressed sequence tag (EST) surveys and whole-genome sequencing projects. These surveys, however, are subject to ascertainment bias and to the errors inherent in large-scale single-pass sequencing. For example, the number of steps involved in constructing and sequencing cDNA libraries makes ESTs highly error-prone, which increases the frequency of invalid SNPs obtained from these surveys. Sequences of mitochondrial DNA (mtDNA) genes are often incorporated into cDNA libraries as an artifact of library construction, and are typically either subtracted from the libraries or considered superfluous when evaluating the information content of EST datasets. Such mtDNA sequences nevertheless provide a unique resource for the analysis of SNP parameters in EST projects. This study uses sequences from four turkey muscle cDNA libraries to demonstrate how mtDNA sequences gleaned from collections of ESTs can be used to estimate SNP parameters and thus help predict the validity of SNPs.
- Meleagris gallopavo