Although electronic health records were originally intended to support clinical care, financial billing, and insurance claims, these databases are now used for clinical investigations aimed at preventing disease, improving patient care, and informing policymaking. However, both responses and predictors of interest can be captured with error, and their errors can be correlated. Odds ratios and their standard errors estimated via logistic regression using error-prone data will be biased. A cost-effective alternative to a complete data audit is a two-phase design. During Phase I, error-prone variables are observed for all subjects, and this information is then used to select a Phase II validation subsample. Previous approaches to outcome misclassification using two-phase design data are limited to error-prone categorical predictors and make distributional assumptions about the errors. We propose a semiparametric approach to two-phase designs with a misclassified binary outcome and categorical or continuous error-prone predictors, allowing for dependent errors and arbitrary second-phase selection. The proposed method is robust because it yields consistent estimates without making assumptions about the predictors' error mechanisms. An EM algorithm is devised to maximize the likelihood function. The resulting estimators possess desired statistical properties. Performance is compared to existing approaches through extensive simulation studies and illustrated in an observational HIV study.

Sarah C. Lotspeich, Bryan E. Shepherd, Gustavo G. C. Amorim, Pamela A. Shaw, and Ran Tao
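The bias claim can be illustrated with a small worked calculation. The sketch below is not the proposed method: it assumes the simplest possible setting, a single error-free binary predictor and an outcome that is misclassified with the same sensitivity and specificity in both groups, which is far more restrictive than the dependent errors considered here. All numeric values and function names are hypothetical and chosen only for illustration.

```python
def odds_ratio(p_exposed, p_unexposed):
    """Odds ratio comparing two outcome probabilities."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


def recorded_prob(p_true, sensitivity, specificity):
    """Probability that the outcome is *recorded* as present.

    True cases are recorded with probability `sensitivity`;
    true non-cases are falsely recorded with probability 1 - specificity.
    """
    return sensitivity * p_true + (1 - specificity) * (1 - p_true)


# Hypothetical true outcome probabilities by predictor level.
p_true_exposed, p_true_unexposed = 0.40, 0.20

# Hypothetical error rates of the recorded (Phase I) outcome.
sens, spec = 0.85, 0.95

true_or = odds_ratio(p_true_exposed, p_true_unexposed)
naive_or = odds_ratio(
    recorded_prob(p_true_exposed, sens, spec),
    recorded_prob(p_true_unexposed, sens, spec),
)

print(f"odds ratio from true outcomes:     {true_or:.3f}")
print(f"odds ratio from recorded outcomes: {naive_or:.3f}")
```

With a single binary predictor, the logistic regression odds ratio equals the odds ratio of the 2x2 table, so no model fitting is needed. Under these assumptions the odds ratio computed from the recorded outcomes is pulled toward 1 relative to the one computed from the true outcomes; with correlated errors in both outcome and predictors, the direction of the bias is not guaranteed.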