(18 Jan 2008, ChunLi)
---+++ Coverage maps of SNP chips and their coverage variation across the genome

Genome-wide association (GWA) studies rely on commercial SNP genotyping panels, for which a common evaluation criterion has been the global coverage of the genome. However, the variation in coverage across the genome is also important when evaluating SNP chips. Here, we provide a detailed coverage map for currently available SNP chips.

---++++ Coverage map

This [[%ATTACHURL%/CoverageMap.pdf.gz][coverage map]] (*Supplementary Figure 1* of our paper; gzipped, 6.05 MB) is a detailed, high-resolution graph of the local coverage rate of six commercial SNP chips: [[http://www.affymetrix.com][Affymetrix]] SNP Array 5.0 (black) and 6.0 (blue), and [[http://www.illumina.com][Illumina]] <nop>HumanHap300 (red), <nop>HumanHap550 (green), <nop>HumanHap650Y (cyan), and Human1M (purple). The red bars at the top and bottom indicate the transcribed regions of known protein-coding genes (based on the [[http://hgdownload.cse.ucsc.edu/goldenPath/hg17/database/knownGene.txt.gz][knownGene table]] obtained from the UCSC human genome [[http://hgdownload.cse.ucsc.edu/goldenPath/hg17/database/][release hg17]]).

Variation in local coverage for the SNP chips (global coverage shown as dotted lines):

[[%ATTACHURL%/coveragevariation.jpg][<img src="%ATTACHURLPATH%/coveragevariation.jpg" width="600" />]]

*Raw data* for local coverage: [[%ATTACHURL%/CEUcoveragemap.txt.gz][CEU]], [[%ATTACHURL%/CHBcoveragemap.txt.gz][CHB]], [[%ATTACHURL%/JPTcoveragemap.txt.gz][JPT]], [[%ATTACHURL%/YRIcoveragemap.txt.gz][YRI]]. The columns are described [[%ATTACHURL%/coveragemap_keys.txt][here]].
---++++ Coverage of known genes

Variation in coverage for known genes with ≥5 <nop>HapMap common SNPs between the transcription start and end positions:

[[%ATTACHURL%/genecoverage.jpg][<img src="%ATTACHURLPATH%/genecoverage.jpg" width="600" />]]

*Supplementary Table 1* of our paper gives gene coverage, with each gene region defined from the transcription start point to the transcription end point (inclusive): [[%ATTACHURL%/CEUgenecoverage.txt.gz][CEU]], [[%ATTACHURL%/CHBgenecoverage.txt.gz][CHB]], [[%ATTACHURL%/JPTgenecoverage.txt.gz][JPT]], [[%ATTACHURL%/YRIgenecoverage.txt.gz][YRI]]. The columns are described [[%ATTACHURL%/genecoverage_keys.txt][here]].

Gene coverage when 5 kb is added to both ends of each gene: [[%ATTACHURL%/CEUgenecoverage.pm5k.txt.gz][CEU.pm5k]], [[%ATTACHURL%/CHBgenecoverage.pm5k.txt.gz][CHB.pm5k]], [[%ATTACHURL%/JPTgenecoverage.pm5k.txt.gz][JPT.pm5k]], [[%ATTACHURL%/YRIgenecoverage.pm5k.txt.gz][YRI.pm5k]].

Gene coverage when 10 kb is added to both ends of each gene: [[%ATTACHURL%/CEUgenecoverage.pm10k.txt.gz][CEU.pm10k]], [[%ATTACHURL%/CHBgenecoverage.pm10k.txt.gz][CHB.pm10k]], [[%ATTACHURL%/JPTgenecoverage.pm10k.txt.gz][JPT.pm10k]], [[%ATTACHURL%/YRIgenecoverage.pm10k.txt.gz][YRI.pm10k]].

*Note:* For short genes, the results may change dramatically across these three definitions of gene regions.

---++++ Methodology

For each region, we use the formula of Barrett and Cardon (Nat Genet 2006;38:659-662) to estimate the coverage rate:

[L / (R – T) × (G – T) + T] / G, where

   * *R*: the number of common SNPs in <nop>HapMap
   * *T*: the number of common SNPs on the SNP chip
   * *L*: the number of common SNPs that are not on the SNP chip but are tagged at r<sup>2</sup> ≥ 0.8 by at least one SNP on the chip within 250 kb
   * *G*: the total number of common SNPs in the region, including those already discovered and those yet to be discovered.
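The coverage formula can be sketched in Python as follows. This is a minimal, illustrative sketch, not the authors' own code: the function names (`expected_common_snps`, `coverage`, `coverage_rewritten`, `alternative_coverage`) and the example counts are hypothetical. G is derived from the genome-wide figures quoted on this page (7.5×10<sup>6</sup> common SNPs over 2.85×10<sup>9</sup> euchromatic base pairs). `alternative_coverage` applies the same formula to the "updated <nop>HapMap" counts used for the SNP Array 6.0 and Human1M, where R<sub>1</sub> = R + extra, T<sub>1</sub> = T + extra and L<sub>1</sub> = (T<sub>1</sub> / T) × L. The page applies this genome-wide, for example with extra = 72,379 common CEU SNPs for the SNP Array 6.0.

```python
# Minimal sketch of the Barrett-Cardon coverage estimate.
# Function names and example counts are illustrative assumptions,
# not taken from the authors' own code.

# Genome-wide figures quoted on this page (Barrett and Cardon 2006):
COMMON_SNPS_GENOME = 7.5e6   # estimated number of common SNPs
EUCHROMATIC_BP = 2.85e9      # estimated euchromatic base pairs


def expected_common_snps(region_bp: float) -> float:
    """Estimate G for a region from the genome-wide common-SNP density."""
    return COMMON_SNPS_GENOME / EUCHROMATIC_BP * region_bp


def coverage(R: float, T: float, L: float, G: float) -> float:
    """Coverage rate [L / (R - T) * (G - T) + T] / G.

    R: common HapMap SNPs, T: common SNPs on the chip,
    L: common SNPs not on the chip but tagged at r^2 >= 0.8,
    G: all common SNPs in the region (discovered or not).
    """
    if R <= T:
        raise ValueError("R must exceed T so that L / (R - T) is defined")
    tagged_fraction = L / (R - T)
    return (tagged_fraction * (G - T) + T) / G


def coverage_rewritten(R: float, T: float, L: float, G: float) -> float:
    """Algebraically equal form: L/(R-T) + [1 - L/(R-T)] * T/G."""
    tagged_fraction = L / (R - T)
    return tagged_fraction + (1 - tagged_fraction) * T / G


def alternative_coverage(R: float, T: float, L: float, G: float,
                         extra: float) -> float:
    """Coverage after adding `extra` non-HapMap common SNPs from the chip.

    R1 = R + extra and T1 = T + extra; L1 = (T1 / T) * L under the
    proportionality assumption T1 / T = L1 / L.
    """
    if T <= 0:
        raise ValueError("T must be positive for the ratio T1 / T")
    R1, T1 = R + extra, T + extra
    L1 = (T1 / T) * L
    return coverage(R1, T1, L1, G)


if __name__ == "__main__":
    G = expected_common_snps(1_000_000)   # about 2,631 for a 1 Mb region
    # Made-up counts for one hypothetical 1 Mb region:
    print(f"G per Mb: {G:.1f}")
    print(f"coverage: {coverage(1000, 150, 600, G):.3f}")
    print(f"with 50 extra chip SNPs: "
          f"{alternative_coverage(1000, 150, 600, G, 50):.3f}")
```

Because the same extra SNPs are added to both R and T, R<sub>1</sub> – T<sub>1</sub> equals R – T, so the alternative estimate increases coverage only through the larger T<sub>1</sub> and the scaled L<sub>1</sub>.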
For a 1 Mb region, the average number of common SNPs is estimated to be about 2,631, based on the estimated numbers of common SNPs (7.5×10<sup>6</sup>) [Barrett and Cardon 2006] and euchromatic base pairs (2.85×10<sup>9</sup>) in the human genome.

*Note:* We recognize that different estimates of G may lead to different values of the local coverage rate. However, the above formula can be rewritten as L / (R – T) + [1 – L / (R – T)] × T / G, which shows that the value of G has little effect on the final estimate as long as the fraction of common SNPs included in the SNP panel, T / G, is small, which is true for the SNP chips we evaluated.

*SNP Array 6.0 and Human1M*: About 10% of the SNPs on each of these chips are not in <nop>HapMap. According to Affymetrix, the SNP Array 6.0 has 934,968 SNPs, of which 99,854 (10.7%) are not in <nop>HapMap; these include 72,379 common SNPs for CEU, 76,016 for CHB, 70,356 for JPT, and 83,412 for YRI. According to Illumina, the Human1M has 1,072,820 SNPs, of which 125,688 (11.7%) are not in <nop>HapMap; these include 70,995 common SNPs for CEU, 67,453 for CHB/JPT, and 77,729 for YRI. As a result, the genomic coverage of these two chips may be underestimated if only <nop>HapMap SNPs are considered in the coverage calculation.

To address this problem, we calculated an alternative coverage estimate as follows, using the SNP Array 6.0 as an example. Suppose there is an “updated <nop>HapMap dataset” that consists of the current <nop>HapMap SNPs plus the SNPs on the SNP Array 6.0. From these “updated data”, we could estimate the number of common SNPs, denoted R<sub>1</sub>, and the number of common SNPs on the chip, denoted T<sub>1</sub>; for example, for Caucasians (CEU), R<sub>1</sub> = R + 72,379 and T<sub>1</sub> = T + 72,379. However, because LD information between the “new” SNPs and the other <nop>HapMap SNPs is lacking, we do not know how many <nop>HapMap SNPs are tagged by the “new” SNPs; therefore, L<sub>1</sub> cannot be estimated directly.
However, if we assume that the number of tagged common SNPs that are not on the chip increases proportionally with the number of common SNPs on the chip, that is, T<sub>1</sub> / T = L<sub>1</sub> / L, then L<sub>1</sub> can be estimated as (T<sub>1</sub> / T) × L. Therefore, based on the “updated <nop>HapMap data”, we could calculate the genomic coverage of the SNP Array 6.0 as [L<sub>1</sub> / (R<sub>1</sub> – T<sub>1</sub>) × (G – T<sub>1</sub>) + T<sub>1</sub>] / G.

---++++ Associated paper

Li M, Li C, Guan W (2008) Evaluation of coverage variation of SNP chips for genome-wide association studies. European Journal of Human Genetics (in press)

---++++ Other papers of relevance

   * [[SNPChipCostEfficiency][Cost efficiency of SNP chips]]: Comparison of cost efficiency of SNP chips
   * [[GWAsimulator][GWAsimulator]]: A rapid whole-genome simulation program
   * [[http://content.karger.com/ProdukteDB/produkte.asp?Aktion=ShowAbstract&ArtikelNr=109730&Ausgabe=234102&ProduktNr=224250][Prioritized Subset Analysis]]

Please contact Chun Li (chun.li@vanderbilt.edu) if you have any questions.

<!--
   * Set ALLOWTOPICCHANGE = ChunLi
   * Set TOPICLAYOUTURL = /twiki/pub/Main/ChunLi/ChunLiLayout.css
   * Set TOPICSTYLEURL = /twiki/pub/Main/ChunLi/ChunLiStyle.css
-->