Methods for Data Audits and Measurement Error

I do a lot of research with observational HIV databases, where we analyze secondary-use data (e.g., electronic health record data). We have audited/validated some of these data and have found high error rates for some key study variables. This line of research uses the information learned from data audits to correct otherwise biased estimates. We have developed new statistical methods that extend the measurement error literature to handle situations where both covariates and outcomes are measured with error and the magnitudes of these errors are correlated. My colleague, Pam Shaw, and I received R01 funding from the NIH (R01AI131771) and a methods grant from PCORI (R-1609-36207) to further develop this research. We have been studying methods to address errors in dependent variables, with a particular emphasis on time-to-event analyses. We have also been studying optimal sampling designs for data validation. We have a great team from Vanderbilt, the University of Pennsylvania, and the University of Auckland participating in this research, and super colleagues in the Tennessee Center for AIDS Research, CCASAnet, and the IeDEA network with whom we have been collaborating and applying these methods. We have a team webpage (linked to Pam's webpage) and a GitHub webpage where some of our code can be found. Here are a few of our papers:

Shepherd BE, Yu C (2011). Accounting for data errors discovered from an audit in multiple linear regression. Biometrics 67: 1083-1091. code; paper; supplementary material. The published version can be found at Biometrics.

Shepherd BE, Shaw PA, Dodd LE (2012). Using audit information to adjust parameter estimates for data errors in clinical trials. Clinical Trials 9: 721-729. paper, Supplementary Material, code. The published version can be found at Clinical Trials.

Shepherd BE, Shaw PA. Errors in multiple variables in HIV cohort and electronic health record data: statistical challenges and opportunities. Statistical Communications in Infectious Diseases 2020; 12: 20190015. paper

Giganti MJ, Shaw PA, Chen G, Bebawy SS, Turner MM, Sterling TR, Shepherd BE. Accounting for dependent errors in predictors and time-to-event outcomes using electronic health record, validation samples, and multiple imputation. Annals of Applied Statistics 2020; 14: 1045-1061. paper; code

Giganti MJ, Shepherd BE. Multiple imputation variance estimation in studies with missing or misclassified inclusion criteria. American Journal of Epidemiology 2020; 189: 1628-1632. paper; code in web material

Shaw PA, He J, Shepherd BE. Regression calibration to correct correlated errors in outcome and exposure. Statistics in Medicine 2021; 40: 271-286. paper

Tao R, Lotspeich SC, Amorim G, Shaw PA, Shepherd BE. Efficient semiparametric inference for two-phase studies with outcome and covariate measurement errors. Statistics in Medicine 2021; 40: 725-738. paper; code

Oh EJ, Shepherd BE, Lumley T, Shaw PA. Raking and regression calibration: methods to address bias from correlated covariate and time-to-event error. Statistics in Medicine 2021; 40: 631-649. paper; code

Han K, Lumley T, Shepherd BE, Shaw PA. Two-phase analysis and study design for survival models with error-prone exposures. Statistical Methods in Medical Research 2021; 30: 857-874. paper; code

Oh EJ, Shepherd BE, Lumley T, Shaw PA. Improved generalized raking estimators to address dependent covariate and failure-time outcome error. Biometrical Journal 2021; 63: 1006-1027. paper

Amorim G, Tao R, Lotspeich S, Shaw PA, Lumley T, Shepherd BE. Two-phase sampling designs for data validation in settings with covariate measurement error and continuous outcome. Journal of the Royal Statistical Society, Series A 2021; 184: 1368-1389. paper; code

Lotspeich S, Shepherd BE, Amorim G, Shaw PA, Tao R. Efficient odds ratio estimation under two-phase sampling using error-prone data from a multi-national HIV research cohort. Biometrics 2022; 78: 1674-1685. paper; code

Lotspeich S, Amorim G, Shaw PA, Tao R, Shepherd BE. Optimal multi-wave validation of secondary use data with outcome and exposure misclassification. Canadian Journal of Statistics (in press)

Shepherd BE, Han K, Chen T, Bian A, Pugh SK, Duda SN, Lumley T, Heerman WJ, Shaw PA. Multi-wave validation sampling for error-prone electronic health records. Biometrics (in press). paper; code

Topic revision: r4 - 27 Jun 2023, BryanShepherd
 
