Measurement error in compositional data and the replicability of microbiome studies
Amy Willis, PhD, Assistant Professor of Biostatistics, School of Public Health, University of Washington
The composition of bacterial taxa in a microbiome is an important parameter to estimate, given the critical role that microbiomes play in human and environmental health. However, high-throughput sequencing distorts the true composition of microbial communities. Sequencing mock communities (artificially constructed microbiomes of known composition) clearly illustrates that observed composition is a biased estimate of true composition, with certain taxa consistently overobserved or underobserved relative to their true relative abundance. We propose a statistical model for microbiome data that reflects this observation, and we illustrate its performance and usage in a variety of settings. We conclude with recommendations for the design and analysis of microbiome studies.
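To make the notion of consistent over- and underobservation concrete, the following is a minimal sketch and not the model proposed in the talk. It assumes, purely for illustration, that each taxon is detected with its own multiplicative efficiency and that observed proportions are renormalized to sum to one. The function name `observe`, the three-taxon mock community, and the efficiency values are all hypothetical.

```python
def observe(true_props, efficiencies):
    """Return observed proportions under an ASSUMED multiplicative bias.

    Each true proportion is scaled by a taxon-specific efficiency, then the
    result is renormalized so the observed proportions sum to one.
    """
    weighted = [p * e for p, e in zip(true_props, efficiencies)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical mock community: three taxa mixed in equal true proportions.
true_props = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical detection efficiencies (illustrative values, not measurements).
efficiencies = [4.0, 1.0, 0.5]

observed = observe(true_props, efficiencies)

for taxon, (p, q) in enumerate(zip(true_props, observed), start=1):
    print(f"taxon {taxon}: true {p:.3f}, observed {q:.3f}")
```

Under these assumed efficiencies, the first taxon is overobserved and the third is underobserved even though all three are equally abundant, and the same distortion would recur in every sample sequenced with the same protocol.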