Tentative Schedule
Date and Time: 03/13/08 (Thursday) 4:15 - 6:45
Location: 8th Floor, 2525 West End Ave. Nashville.
NOTE: In actuality, there is no 26th Street. From West End Ave...
  • Turn onto Natchez Trace (At the McDonalds)
  • On the left you'll see a Marriott Hotel. Turn into that parking lot.
  • The parking GARAGE you want is straight ahead between the 2525 West End Ave building and the Marriott
  • Drive to the 3rd floor and find the crosswalk to 2525 West End Ave
  • Go straight through the doors and take the elevators to the 8th floor. You'll see us.

  • 4:15 - 5:15 Snacks, soft drinks, and socializing in the Centennial Park Lounge
  • 5:15 - 6:00 Updates, discussions, and planning
  • 6:00 - 6:45 Bonnie Dastidar-Ghosh PhD, RAND Corporation

Combining probability and convenience samples for minimum-MSE estimation in a rare population: An application to families with young children

Bonnie Dastidar-Ghosh PhD
RAND Corporation

In the world of telephone surveys, the gold standard for producing representative samples of the underlying population is the random digit dial (RDD) methodology. RDD can be very costly for a rare population, but much less costly listed or web-based samples are not representative when used by themselves and can induce bias. In practice, we might supplement an optimal probability sample with a non-representative convenience sample (such as inbound web hits or a listed sample) to get a reasonable number of surveys while containing costs. In the present study, which measures the prevalence of child care and the quality of pre-schooling in California, we had to survey households with a 3- or 4-year-old child. Due to the low prevalence of this group and the lack of a natural sampling frame, we could only afford an RDD sample of 1000 households. Therefore, we added a listed (convenience) sample of another 1000 households for a relatively small additional cost. The phone records for the listed sample were purchased from a marketing firm that collects information on households.

The question now is how to combine these two samples to produce meaningful summaries of the data. There are two ways to combine the estimates derived from the RDD and the listed sample. The traditional approach treats the listed sample as a subset or stratum within the larger population, and combines the two estimates using a stratified estimator. In this case, the listed sample is typically down-weighted heavily, so it contributes little to the combined estimate and incurs a large design effect that leads to large standard errors. We apply an alternative method developed by Elliott and Haviland (2007) to produce biased but minimum mean squared error (MSE) prevalence estimates from two independent samples, one of which produces unbiased estimates from a complete frame (RDD) and one of which produces biased estimates from an incomplete frame (inbound web hits or a listed sample). The bias in the convenience sample is estimated relative to the RDD estimate (the gold standard), and the web or listed sample contributes differentially to different parameter estimates as a function of its estimated bias. We discuss the extent to which this new approach improves the MSE of estimates relative to RDD alone and to the traditional stratified estimate.
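
The idea of a minimum-MSE combination can be illustrated with a short Python sketch. This is not the estimator from Elliott and Haviland (2007) as published; it is a generic textbook-style linear combination under stated assumptions: the two estimates are independent, the RDD estimate is unbiased, and the bias of the convenience estimate is approximated by the difference between the two point estimates. The function names and the numbers in the usage note are hypothetical.

```python
def min_mse_weight(var_prob, var_conv, bias_conv):
    """Weight on the probability (RDD) estimate that minimizes the MSE of
    w * est_prob + (1 - w) * est_conv, assuming independent estimates and
    an unbiased probability estimate."""
    mse_conv = var_conv + bias_conv ** 2  # MSE of the convenience estimate
    return mse_conv / (var_prob + mse_conv)


def combine(est_prob, var_prob, est_conv, var_conv):
    """Combine an unbiased probability-sample estimate with a possibly
    biased convenience-sample estimate.

    Assumption: the bias is estimated as est_conv - est_prob, i.e. the
    probability sample is treated as the gold standard.
    Returns (combined estimate, weight on est_prob, plug-in MSE).
    """
    bias_hat = est_conv - est_prob
    w = min_mse_weight(var_prob, var_conv, bias_hat)
    combined = w * est_prob + (1 - w) * est_conv
    # Plug-in MSE of the combined estimate at the chosen weight
    mse = w ** 2 * var_prob + (1 - w) ** 2 * (var_conv + bias_hat ** 2)
    return combined, w, mse
```

With hypothetical inputs such as `combine(0.30, 0.0004, 0.50, 0.0001)`, the large estimated bias pushes nearly all the weight onto the RDD estimate, whereas two estimates that agree share the weight according to their variances. Because the bias estimate is itself noisy, the plug-in MSE is only an approximation.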

Further, we explore the quality of these listed samples by measuring the degree of agreement between the listed sample characteristics and the actual self-reports for the same households from the completed surveys. We provide some guidance on using listed samples and discuss the potential for Type 1 and Type 2 errors. We also discuss the cost-effectiveness of the combined sample relative to the RDD sample.
Topic revision: r4 - 26 Apr 2013, JohnBock
