Department of Biostatistics Seminar/Workshop Series
Selecting SNPs to Correctly Predict Ethnicity
Joshua Sampson, PhD
Post-Doctoral Fellow, Department of Biostatistics
Yale University
Wednesday, April 1, 1:30-2:30pm, MRBIII Conference Room 1220
Intended Audience: Persons interested in applied statistics, statistical theory, epidemiology, health services research, clinical trials methodology, statistical computing, statistical graphics, R users or potential users
Background: An individual's genotype at a group of Single Nucleotide
Polymorphisms (SNPs) can be used to correctly predict that individual's
ethnicity, or ancestry. In medical studies, knowledge of a subject's ethnicity
can eliminate possible confounding, and in forensic applications, such
knowledge can help direct investigations. In these cases, genotyping is often
performed for the explicit purpose of identifying ancestry, and the prediction
rule, which maps genotype to ancestry, must be based on previously collected
information.
Results: There are two goals:
1) Given the Human Genome Diversity Project (HGDP), a database with genotypes for hundreds of individuals from 54 populations, and a specific set of SNPs, select a prediction rule that minimizes the expected error rate.
2) Design a method for selecting a set of N SNPs that minimizes the error rate for the chosen prediction rule.
Both goals have been previously addressed. Here, we offer ways to improve the currently available methods and greatly increase the accuracy of ancestry prediction. As both goals require good estimates of population-specific allele frequencies, we show how to use a known phylogenetic tree to improve these estimates. Furthermore, we introduce a new method for estimating the error rate. We demonstrate the performance of these methods on both simulated data and the HGDP data.
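To make the idea of a prediction rule and its error rate concrete, the following Python sketch may help. It is not the method presented in the talk. It assumes a simple maximum-likelihood rule under Hardy-Weinberg equilibrium with independent SNPs, pseudocount-smoothed allele frequencies, and a leave-one-out error estimate; all function names and the toy genotypes are hypothetical.

```python
import math

def genotype_log_likelihood(genotype, freqs):
    """Log-likelihood of a genotype vector (0, 1, or 2 copies of an allele
    per SNP), assuming Hardy-Weinberg equilibrium and independent SNPs."""
    total = 0.0
    for g, p in zip(genotype, freqs):
        p = min(max(p, 1e-6), 1 - 1e-6)  # guard against log(0)
        if g == 0:
            total += 2 * math.log(1 - p)
        elif g == 1:
            total += math.log(2) + math.log(p) + math.log(1 - p)
        else:
            total += 2 * math.log(p)
    return total

def predict_population(genotype, pop_freqs):
    """Assumed rule: pick the population with the highest likelihood."""
    return max(pop_freqs,
               key=lambda pop: genotype_log_likelihood(genotype, pop_freqs[pop]))

def estimate_freqs(genotypes, pseudocount=0.5):
    """Per-SNP allele frequencies with a pseudocount (an assumed smoother)."""
    n = len(genotypes)
    n_snps = len(genotypes[0])
    return [(sum(g[j] for g in genotypes) + pseudocount) / (2 * n + 2 * pseudocount)
            for j in range(n_snps)]

def loo_error_rate(samples):
    """Leave-one-out error: hold out each individual, re-estimate the
    frequencies of its own population without it, then classify it."""
    errors = 0
    total = 0
    for pop, genos in samples.items():
        for i, held_out in enumerate(genos):
            freqs = {}
            for other, other_genos in samples.items():
                train = other_genos if other != pop else genos[:i] + genos[i + 1:]
                freqs[other] = estimate_freqs(train)
            if predict_population(held_out, freqs) != pop:
                errors += 1
            total += 1
    return errors / total

# Toy data (hypothetical): two populations, two SNPs.
toy_freqs = {"A": [0.9, 0.1], "B": [0.1, 0.9]}
toy_samples = {"A": [[2, 0], [2, 0], [2, 1]],
               "B": [[0, 2], [0, 2], [1, 2]]}
print(predict_population([2, 0], toy_freqs))
print(loo_error_rate(toy_samples))
```

Under these assumptions, SNP selection (goal 2) could be framed as searching for the subset of N SNP columns that gives the lowest value of loo_error_rate, although the abstract does not say how the authors perform that search.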
Authors: Joshua Sampson, Kenneth K. Kidd, Judith R. Kidd, Hongyu Zhao