In telephone surveys, random digit dialing (RDD) is the gold standard for producing representative samples of the underlying population. RDD can be very costly for a rare population, but much less costly listed or web-based samples are not representative when used alone and can introduce bias. In practice, a probability sample can be supplemented with a non-representative convenience sample (such as inbound web hits or a listed sample) to obtain a reasonable number of completed surveys while containing costs. In the present study, which measures the prevalence of child care and the quality of preschooling in California, we had to survey households with a 3- or 4-year-old child. Because of the low prevalence of this group and the lack of a natural sampling frame, we could afford an RDD sample of only 1000 households. We therefore added a listed (convenience) sample of another 1000 households at a relatively small additional cost. These phone records were purchased from a marketing firm that collects information on households.
The question is then how to combine these two samples to produce meaningful summaries of the data. There are two ways to combine the estimates derived from the RDD and the listed sample. The traditional approach treats the listed sample as a subset or stratum within the larger population and combines the two estimates using a stratified estimator. In this case, the listed sample is typically down-weighted heavily, so it contributes little to the combined estimate and incurs a large design effect that leads to large standard errors. We apply an alternative method developed by Elliott and Haviland (2007) to produce biased but minimum mean squared error (MSE) prevalence estimates from two independent samples, one of which produces unbiased estimates from a complete frame (RDD) and one of which produces biased estimates from an incomplete frame (inbound web hits or a listed sample). The bias in the convenience sample is estimated relative to the RDD estimate (the gold standard), and the web or listed sample contributes differentially to different parameter estimates as a function of its estimated bias. We discuss the extent to which this new approach improves the MSE of estimates relative to RDD alone and to the traditional stratified estimate.
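The idea of trading bias against variance can be illustrated with a generic MSE-minimizing composite of an unbiased and a biased estimate. The sketch below is not taken from Elliott and Haviland (2007); it assumes the two estimates are independent, plugs in the naive bias estimate (listed minus RDD), and uses hypothetical function and argument names.

```python
def composite_estimate(p_rdd, var_rdd, p_list, var_list):
    """Combine an unbiased RDD estimate with a possibly biased listed-sample
    estimate by choosing the weight that minimizes the plug-in MSE.

    Assumptions (illustrative only): the samples are independent, and the
    bias of the listed sample is estimated naively as p_list - p_rdd.
    """
    bias_hat = p_list - p_rdd              # estimated bias relative to RDD
    mse_list = var_list + bias_hat ** 2    # plug-in MSE of the listed estimate
    total = var_rdd + mse_list
    if total == 0:
        # Degenerate case: both estimates are exact and identical.
        return p_rdd, 1.0, 0.0
    w_rdd = mse_list / total               # weight given to the RDD estimate
    estimate = w_rdd * p_rdd + (1.0 - w_rdd) * p_list
    mse = var_rdd * mse_list / total       # plug-in MSE of the composite
    return estimate, w_rdd, mse
```

Under these assumptions the listed sample receives substantial weight only when its estimated bias is small relative to the RDD variance, and the plug-in MSE of the composite never exceeds the variance of the RDD estimate alone.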
We also explore the quality of the listed sample by measuring the degree of agreement between the characteristics recorded on the list and the self-reports obtained from the same households in the completed surveys. We provide some guidance on using list samples and on their potential for Type 1 and Type 2 errors. Finally, we discuss the cost-effectiveness of the combined sample relative to the RDD sample alone.
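As a rough illustration of such an agreement check, the sketch below cross-tabulates a list flag against the survey self-report for the same households. It assumes, for illustration only, that a Type 1 error means the list flags an eligible child that the household does not report and a Type 2 error means the list misses a child that the household does report; the function and field names are hypothetical.

```python
def list_agreement(listed, reported):
    """Compare list-based flags with self-reports for the same households.

    listed, reported: equal-length sequences of booleans, one pair per
    household (True = household has a 3- or 4-year-old).
    Returns overall agreement and the two error rates, or None where a
    rate is undefined because its denominator is zero.
    """
    pairs = list(zip(listed, reported))
    tp = sum(1 for l, r in pairs if l and r)          # list yes, report yes
    fp = sum(1 for l, r in pairs if l and not r)      # list yes, report no
    fn = sum(1 for l, r in pairs if not l and r)      # list no, report yes
    tn = sum(1 for l, r in pairs if not l and not r)  # list no, report no
    n = len(pairs)
    return {
        "agreement": (tp + tn) / n if n else None,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else None,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else None,
    }
```

The false negative rate can be computed only when households not flagged by the list are also surveyed, for example through the RDD sample.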