This web page hosts information and software packages for statistical methods for data validation and audits. Our basic idea is to use information learned from an audit/validation subsample and apply it to the larger dataset, most of which will not be validated. These methods are particularly useful when one has access to large amounts of data of uncertain quality (e.g., electronic medical records) but cannot afford to validate every record.
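As a rough illustration of this idea, the Python sketch below uses a simple regression-calibration approach on simulated data. It is not the estimator from the papers listed on this page, and everything in it is an assumption made for illustration: the function names (`ols`, `calibrated_fit`, `simulate`), the single error-prone covariate, the additive normal error model, and the simple random audit sample. The relationship between recorded and validated values is learned in the audited records and then applied to all records before fitting the outcome model.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def calibrated_fit(y, x_recorded, audit_ids, x_validated):
    """Regression calibration with an audit subsample (illustrative only).

    x_validated[k] is the validated value for record audit_ids[k].
    """
    # Step 1: in the audited records, model the validated value
    # as a linear function of the recorded (possibly erroneous) value.
    a0, a1 = ols([x_recorded[i] for i in audit_ids], x_validated)
    # Step 2: predict the validated value for every record, audited or not.
    x_pred = [a0 + a1 * xr for xr in x_recorded]
    # Step 3: fit the outcome model using the predicted values.
    return ols(x_pred, y)


def simulate(n=20000, n_audit=2000, seed=1):
    """Simulate y = 1 + 2*x + noise, with x recorded with additive error."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0, 1) for _ in range(n)]
    x_recorded = [x + rng.gauss(0, 1) for x in x_true]  # assumed error model
    y = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in x_true]
    audit_ids = rng.sample(range(n), n_audit)  # simple random audit sample
    x_validated = [x_true[i] for i in audit_ids]
    naive = ols(x_recorded, y)  # ignores the data errors
    adjusted = calibrated_fit(y, x_recorded, audit_ids, x_validated)
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = simulate()
    print("naive slope:   ", round(naive[1], 3))
    print("adjusted slope:", round(adjusted[1], 3))
```

Under these simulated conditions the naive slope is attenuated toward zero, while the adjusted slope should land much closer to the true value of 2. Real audit data may have errors that are not additive or not independent of the outcome, in which case this simple correction would not be appropriate.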
Our initial work developed methods for multiple linear regression. See the papers and R code below. We have submitted a proposal to the NIH to develop additional methods and tools. If this proposal is funded, we will add further papers, packages, and web applications to this site.
Papers/Code:
Shepherd BE, Yu C (2011). Accounting for data errors discovered from an audit in multiple linear regression. Biometrics 67: 1083-1091.
code; paper; supplementary material. The published version can be found at Biometrics.
Shepherd BE, Shaw PA, Dodd LE (2012). Using audit information to adjust parameter estimates for data errors in clinical trials. Clinical Trials 9: 721-729.