Coverage maps of SNP chips and their coverage variation across the genome

Genome-wide association (GWA) studies rely on commercial SNP genotyping panels, for which a common evaluation criterion has been the global coverage of the genome. However, the level of variation in coverage is also important for evaluation of SNP chips. Here, we provide a detailed coverage map for currently available SNP chips.

Coverage map

This coverage map (Supplementary Figure 1 of our paper, gzipped version, 6.05MB) contains a detailed, high-resolution graph of the local coverage rate of six commercial SNP chips: Affymetrix SNP Array 5.0 (in black) and 6.0 (blue), and Illumina HumanHap300 (red), HumanHap550 (green), HumanHap650Y (cyan), and Human1M (purple). The red bars at the top and bottom indicate the transcribed regions of known protein-coding genes (based on the knownGene table obtained from the UCSC human genome release hg17).

The variation of coverage for the five SNP chips (global coverage shown as dotted lines):

Raw data for local coverage: CEU, CHB, JPT, YRI. The explanations of the columns are here.

Coverage of known genes

The variation of coverage for known genes with ≥5 HapMap common SNPs in between the transcriptional start and end positions:

Supplementary Table 1 of our paper for gene coverage: CEU, CHB, JPT, YRI. The explanations of the columns are here.

Methodology

For each region, we use the formula of Barrett and Cardon (Nat Genet 2006;38: 659-662) to estimate coverage rate: [L / (R – T) × (G – T) + T] / G, where
  • R: The number of common SNPs in the HapMap
  • T: The number of common SNPs on the SNP chip
  • L: The number of common SNPs that are not on the SNP chip but are tagged at r² ≥ 0.8 by at least one SNP on the chip within 250 kb
  • G: The total number of common SNPs in the region, including those that have already been discovered and those that have yet to be discovered. For a 1 Mb region, the average number of common SNPs is estimated to be about 2,631, based on the estimated numbers of common SNPs (7.5×10^6) [Barrett and Cardon 2006] and euchromatic base pairs (2.85×10^9) in the human genome.
  • Note: We recognize that different estimates of G may lead to different values of the local coverage rate. However, the above formula can be rewritten as L / (R – T) + [1 – L / (R – T)] × T / G. This form shows that the value of G has little effect on the final estimate as long as T / G, the fraction of common SNPs included in the SNP panel, is small, which is true for the five SNP chips we evaluated.
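The formula and the note on G can be checked numerically. The following is a minimal Python sketch, not the code used for the paper: the function names and the region counts (L = 600, R = 1000, T = 100) are hypothetical values chosen only for illustration. It evaluates both forms of the formula and shows how little the estimate moves when G changes.

```python
def coverage(L, R, T, G):
    """Coverage rate as [L / (R - T) * (G - T) + T] / G."""
    if R <= T:
        raise ValueError("R must exceed T so that R - T is positive")
    tagged_fraction = L / (R - T)  # share of off-chip HapMap SNPs that are tagged
    return (tagged_fraction * (G - T) + T) / G


def coverage_rewritten(L, R, T, G):
    """Equivalent form: L / (R - T) + [1 - L / (R - T)] * T / G."""
    tagged_fraction = L / (R - T)
    return tagged_fraction + (1 - tagged_fraction) * T / G


# Average number of common SNPs per 1 Mb, from the figures quoted above.
COMMON_SNPS = 7.5e6
EUCHROMATIC_BP = 2.85e9
g_per_mb = COMMON_SNPS / EUCHROMATIC_BP * 1e6  # roughly 2,631

# Hypothetical counts for one 1 Mb region (illustration only).
L_tagged, R_hapmap, T_chip = 600, 1000, 100

for G in (2000, g_per_mb, 4000):
    a = coverage(L_tagged, R_hapmap, T_chip, G)
    b = coverage_rewritten(L_tagged, R_hapmap, T_chip, G)
    print(f"G = {G:8.1f}  coverage = {a:.4f}  rewritten form = {b:.4f}")
```

With these made-up counts, doubling G from 2000 to 4000 changes the estimate by less than 0.01, because T / G stays small in both cases.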

SNP Array 6.0 and Human1M: About 10% of the SNPs on each of these chips are not in the HapMap. According to Affymetrix, the SNP Array 6.0 has 934,968 SNPs, of which 99,854 (10.7%) are not in the HapMap; these include 72,379 common SNPs for CEU, 76,016 for CHB, 70,356 for JPT, and 83,412 for YRI. According to Illumina, the Human1M has 1,072,820 SNPs, of which 125,688 (11.7%) are not in the HapMap; these include 70,995 common SNPs for CEU, 67,453 for CHB/JPT, and 77,729 for YRI. As a result, the genomic coverage of these two chips may be underestimated if only HapMap SNPs are considered in the coverage calculation.
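The percentages quoted above follow directly from the SNP counts. A short Python check, using only the numbers given in this paragraph (the dictionary layout and variable names are illustrative):

```python
# (total SNPs on chip, SNPs not in HapMap), as quoted from the vendors above.
chips = {
    "SNP Array 6.0": (934_968, 99_854),
    "Human1M": (1_072_820, 125_688),
}

# Percentage of each chip's SNPs that are absent from HapMap.
pct_not_in_hapmap = {
    name: 100 * not_in_hapmap / total
    for name, (total, not_in_hapmap) in chips.items()
}

for name, pct in pct_not_in_hapmap.items():
    print(f"{name}: {pct:.1f}% of SNPs not in HapMap")
```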

To address this problem, we calculated an alternative coverage estimate as follows, using the SNP Array 6.0 as an example. Suppose there is an “updated HapMap dataset” that consists of the current HapMap SNPs plus the SNPs on the SNP Array 6.0. From this “updated data” we can obtain the number of common SNPs, denoted R1, and the number of common SNPs on the chip, denoted T1; for example, for CEU, R1 = R + 72,379 and T1 = T + 72,379. Because no LD information is available between the “new” SNPs and the other HapMap SNPs, we do not know how many HapMap SNPs are tagged by these “new” SNPs, so L1 cannot be estimated directly. If we assume, however, that the number of tagged common SNPs not on the chip increases in proportion to the number of common SNPs on the chip, that is, T1 / T = L1 / L, then L1 can be estimated as (T1 / T) × L. Based on the “updated HapMap data”, the genomic coverage of the SNP Array 6.0 can then be calculated as [L1 / (R1 – T1) × (G – T1) + T1] / G.
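The adjustment described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the function name, the argument `extra_common` (the number of common chip SNPs that are not in HapMap), and the example counts are all hypothetical.

```python
def adjusted_coverage(L, R, T, G, extra_common):
    """Coverage under the "updated HapMap" assumption.

    extra_common: number of common SNPs on the chip that are not in HapMap.
    Assumes L1 / L = T1 / T, i.e. tagged off-chip SNPs scale with on-chip SNPs.
    """
    R1 = R + extra_common      # common SNPs in the updated dataset
    T1 = T + extra_common      # common SNPs on the chip in the updated dataset
    L1 = (T1 / T) * L          # proportional estimate of tagged off-chip SNPs
    # Note that R1 - T1 equals R - T, so only L1, T1 and G - T1 change.
    return (L1 / (R1 - T1) * (G - T1) + T1) / G


# Hypothetical counts for illustration only; they are not values from the paper.
unadjusted = adjusted_coverage(L=600, R=1000, T=100, G=4000, extra_common=0)
adjusted = adjusted_coverage(L=600, R=1000, T=100, G=4000, extra_common=10)
print(f"unadjusted: {unadjusted:.4f}, adjusted: {adjusted:.4f}")
```

With `extra_common=0` the function reduces to the original formula. Because the proportional assumption scales L1 without bound, L1 / (R1 – T1) can in principle exceed 1 when many non-HapMap SNPs are added; the sketch does not cap it, since the text above does not describe a cap.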

Associated paper

Li M, Li C, Guan W (2007) Evaluation of coverage variation of SNP chips for genome-wide association studies. European Journal of Human Genetics (in press)

Other papers of relevance

Please contact Chun Li (chun.li@vanderbilt.edu) if you have any questions.

Topic revision: r23 - 21 Dec 2007, ChunLi
 
