Outcome Dependent Sampling for Longitudinal Data Analysis: Publications and Software

This page provides references to papers and links to software associated with our research on two-phase, outcome dependent, and outcome related sampling designs. Most, but not all, of this work has involved longitudinal data. All research below was at least partially supported by our grant, Outcome Dependent Sampling for Longitudinal Data: Design and Analysis (R01-HL094786), funded by the NHLBI.

People involved: Chiara Di Gravio, MS; Shawn Garbett, MS; Sebastien Haneuse, PhD; Patrick Heagerty, PhD; Jake Maronge, PhD; Lee McDaniel, PhD; Nate Mercaldo, PhD; Peter Mueller, PhD; Paul Rathouz, PhD; Claudia Rivera-Rodriguez, PhD; Sarah Sauer, PhD; Jonathan Schildcrout, PhD; Bryan Shepherd, PhD; Ran Tao, PhD; Leila Zelnick, PhD

Outcome Dependent Sampling for Longitudinal Binary Data

  1. Tao R, Mercaldo ND, Haneuse S, Maronge JM, Rathouz PJ, Heagerty PJ, Schildcrout JS. Two-wave two-phase outcome-dependent sampling designs, with applications to longitudinal binary data. Stat Med. 2021 Apr 15;40(8):1863-1876. doi: 10.1002/sim.8876. Epub 2021 Jan 13. PMID: 33442883; PMCID: PMC8110123. Here

  2. Schildcrout JS, Schisterman EF, Mercaldo ND, Rathouz PJ, Heagerty PJ. Extending the Case-Control Design to Longitudinal Data: Stratified Sampling Based on Repeated Binary Outcomes. Epidemiology. 2018 Jan;29(1):67-75. doi: 10.1097/EDE.0000000000000764. PubMed PMID: 29068838; PubMed Central PMCID: PMC5718932. Here

    ODSCode.R, ODSCodeFunctions.R, FullCohort.RData: An example R script that conducts ODS analyses using a simulated dataset (FullCohort.RData). The code in ODSCode.R calls ODSCodeFunctions.R and loads FullCohort.RData to conduct the analyses described in the manuscript.

  3. Schildcrout JS, Heagerty PJ. Outcome-dependent sampling from existing cohorts with longitudinal binary response data: study planning and analysis. Biometrics. 2011 Dec;67(4):1583-93. doi: 10.1111/j.1541-0420.2011.01582.x. Epub 2011 Apr 2. PubMed PMID: 21457191; PubMed Central PMCID: PMC3134621. Here

  4. Schildcrout JS, Heagerty PJ. On outcome-dependent sampling designs for longitudinal binary response data with time-varying covariates. Biostatistics. 2008 Oct;9(4):735-49. doi: 10.1093/biostatistics/kxn006. Epub 2008 Mar 27. PubMed PMID: 18372397; PubMed Central PMCID: PMC2733177. Here

Outcome Related Sampling for Longitudinal Binary Data

  1. Schildcrout JS, Schisterman EF, Aldrich MC, Rathouz PJ. Outcome-related, Auxiliary Variable Sampling Designs for Longitudinal Binary Data. Epidemiology. 2018 Jan;29(1):58-66. doi: 10.1097/EDE.0000000000000765. PubMed PMID: 29068841; PubMed Central PMCID: PMC5718926. Here

  2. Schildcrout JS, Mumford SL, Chen Z, Heagerty PJ, Rathouz PJ. Outcome-dependent sampling for longitudinal binary response data based on a time-varying auxiliary variable. Stat Med. 2012 Sep 28;31(22):2441-56. doi: 10.1002/sim.4359. Epub 2011 Nov 16. PubMed PMID: 22086716; PubMed Central PMCID: PMC3432177. Here

  3. Schildcrout JS, Rathouz PJ. Longitudinal studies of binary response data following case-control and stratified case-control sampling: design and analysis. Biometrics. 2010 Jun;66(2):365-73. doi: 10.1111/j.1541-0420.2009.01306.x. Epub 2009 Aug 10. PubMed PMID: 19673861; PubMed Central PMCID: PMC3051172. Here

Outcome Dependent Sampling for Longitudinal Quantitative Data

  1. Di Gravio C, Tao R, Schildcrout JS. Design and analysis of two-phase studies with multivariate longitudinal data. Biometrics. 2022 Jan 11. doi: 10.1111/biom.13616. Epub ahead of print. PMID: 35014029. Here

  2. Schildcrout JS, Haneuse S, Tao R, Zelnick LR, Schisterman EF, Garbett SP, Mercaldo ND, Rathouz PJ, Heagerty PJ. Two-Phase, Generalized Case-Control Designs for the Study of Quantitative Longitudinal Outcomes. Am J Epidemiol. 2020 Feb 28;189(2):81-90. doi: 10.1093/aje/kwz127. PMID: 31165875; PMCID: PMC7298772. Here

  3. Zelnick LR, Schildcrout JS, Heagerty PJ. Likelihood-based analysis of outcome-dependent sampling designs with longitudinal data. Stat Med. 2018 Jun 15;37(13):2120-2133. doi: 10.1002/sim.7633. Epub 2018 Mar 15. PubMed PMID: 29542170. Here

  4. Schildcrout JS, Rathouz PJ, Zelnick LR, Garbett SP, Heagerty PJ. Biased sampling designs to improve research efficiency: factors influencing pulmonary function over time in children with asthma. Ann Appl Stat. 2015 Jun;9(2):731-753. PubMed PMID: 26322147; PubMed Central PMCID: PMC4551501. Here

  5. Schildcrout JS, Garbett SP, Heagerty PJ. Outcome vector dependent sampling with longitudinal continuous response data: stratified sampling based on summary statistics. Biometrics. 2013 Jun;69(2):405-16. doi: 10.1111/biom.12013. Epub 2013 Feb 14. PubMed PMID: 23409789; PubMed Central PMCID: PMC3880022. Here

Outcome Dependent Sampling for Cluster Correlated Data

  1. Sauer S, Hedt-Gauthier B, Haneuse S. Optimal allocation in stratified cluster-based outcome-dependent sampling designs. Stat Med. 2021 Aug 15;40(18):4090-4107. doi: 10.1002/sim.9016. Epub 2021 Jun 2. PMID: 34076912. Here

  2. Sauer S, Hedt-Gauthier B, Rivera-Rodriguez C, Haneuse S. Small-sample inference for cluster-based outcome-dependent sampling schemes in resource-limited settings: Investigating low birthweight in Rwanda. Biometrics. 2021 Jan 14. doi: 10.1111/biom.13423. Epub ahead of print. PMID: 33444459; PMCID: PMC8277876. Here

  3. Rivera-Rodriguez C, Spiegelman D, Haneuse S. On the analysis of two-phase designs in cluster-correlated data settings. Stat Med. 2019 Oct 15;38(23):4611-4624. doi: 10.1002/sim.8321. Epub 2019 Jul 29. PMID: 31359448; PMCID: PMC6736737. Here

  4. Rivera-Rodriguez C, Haneuse S, Wang M, Spiegelman D. Augmented pseudo-likelihood estimation for two-phase studies. Stat Methods Med Res. 2020 Feb;29(2):344-358. doi: 10.1177/0962280219833415. Epub 2019 Mar 5. PMID: 30834815; PMCID: PMC7659466. Here

  5. Haneuse S, Rivera-Rodriguez C. On the Analysis of Case-Control Studies in Cluster-correlated Data Settings. Epidemiology. 2018 Jan;29(1):50-57. doi: 10.1097/EDE.0000000000000763. PubMed PMID: 29068840; PubMed Central PMCID: PMC5718962. Here

Two-phase Designs for Validation Studies with Measurement Error

  1. Tao R, Lotspeich SC, Amorim G, Shaw PA, Shepherd BE. Efficient semiparametric inference for two-phase studies with outcome and covariate measurement errors. Stat Med. 2021 Feb 10;40(3):725-738. doi: 10.1002/sim.8799. Epub 2020 Nov 3. PMID: 33145800; PMCID: PMC8214478. Here

  2. Lotspeich SC, Shepherd BE, Amorim GGC, Shaw PA, Tao R. Efficient odds ratio estimation under two-phase sampling using error-prone data from a multi-national HIV research cohort. Biometrics. 2021 Jul 2. doi: 10.1111/biom.13512. Epub ahead of print. PMID: 34213008. Here

Applications of Efficient Study Design / Survey Sampling Methods

  1. Mitani AA, Mercaldo ND, Haneuse S, Schildcrout JS. Survey design and analysis considerations when utilizing misclassified sampling strata. BMC Med Res Methodol. 2021 Jul 11;21(1):145. doi: 10.1186/s12874-021-01332-8. PMID: 34247586; PMCID: PMC8273975.

  2. Mercaldo ND, Brothers KB, Carrell DS, Clayton EW, Connolly JJ, Holm IA, Horowitz CR, Jarvik GP, Kitchner TE, Li R, McCarty CA, McCormick JB, McManus VD, Myers MF, Pankratz JJ, Shrubsole MJ, Smith ME, Stallings SC, Williams JL, Schildcrout JS. Enrichment sampling for a multi-site patient survey using electronic health records and census data. J Am Med Inform Assoc. 2018 Dec 27. doi: 10.1093/jamia/ocy164. [Epub ahead of print] PubMed PMID: 30590688. Here

  3. Shi Y, Graves JA, Garbett SP, Zhou Z, Marathi R, Wang X, Harrell FE, Lasko TA, Denny JC, Roden DM, Peterson JF, Schildcrout JS. A Decision-Theoretic Approach to Panel-Based, Preemptive Genotyping. MDM Policy Pract. 2019 Aug 17;4(2):2381468319864337. doi: 10.1177/2381468319864337. PMID: 31453360; PMCID: PMC6699004. Here

  4. Schildcrout JS, Shi Y, Danciu I, Bowton E, Field JR, Pulley JM, Basford MA, Gregg W, Cowan JD, Harrell FE Jr, Roden DM, Peterson JF, Denny JC. A prognostic model based on readily available clinical data enriched a pre-emptive pharmacogenetic testing program. J Clin Epidemiol. 2016 Apr;72:107-15. doi: 10.1016/j.jclinepi.2015.08.028. Epub 2015 Nov 25. PubMed PMID: 26628336; PubMed Central PMCID: PMC4779720. Here

ODS for Generalized Linear Models

  1. Tao R, Zeng D, Lin DY. Optimal Designs of Two-Phase Studies. J Am Stat Assoc. 2020;115(532):1946-1959. doi: 10.1080/01621459.2019.1671200. Epub 2019 Oct 29. PMID: 33716361; PMCID: PMC7954143.

ODS for Semiparametric Generalized Linear Models

  1. Maronge JM, Tao R, Schildcrout JS, Rathouz PJ. Generalized case-control sampling under generalized linear models. Biometrics. 2021 Sep 29. doi: 10.1111/biom.13571. Epub ahead of print. PMID: 34586638.

Semiparametric Generalized Linear Models

  1. Wurm MJ, Rathouz PJ. Semiparametric Generalized Linear Models with the gldrm Package. R J. 2018 Jul;10(1):288-307. PMID: 30873295; PMCID: PMC6414059.

  2. Huang A, Rathouz PJ. Orthogonality of the Mean and Error Distribution in Generalized Linear Models. Commun Stat Theory Methods. 2017;46(7):3290-3296. doi: 10.1080/03610926.2013.851241. Epub 2016 Nov 17. PMID: 28435181; PMCID: PMC5396964.

  3. Huang A, Rathouz PJ. Proportional likelihood ratio models for mean regression. Biometrika. 2012 Mar;99(1):223-229. doi: 10.1093/biomet/asr075. PMID: 24421412; PMCID: PMC3888642.

Marginalized Models for Binary and Ordinal Response Data

  1. Schildcrout JS, Harrell FE Jr, Heagerty PJ, Haneuse S, Di Gravio C, Garbett SP, Rathouz PJ, Shepherd BE. Model-assisted analyses of longitudinal, ordinal outcomes with absorbing states. Stat Med. 2022 Jun 30;41(14):2497-2512. doi: 10.1002/sim.9366. Epub 2022 Mar 7. PMID: 35253265; PMCID: PMC9232888.

Related Work

  1. McDaniel, Lee S., Nicholas C. Henderson, and Paul J. Rathouz. "Fast Pure R Implementation of GEE: Application of the Matrix Package." R Journal 5.1 (2013): 181-187. Here


  2. Lee S. McDaniel, Jonathan S. Schildcrout, and Paul J. Rathouz. Generalized linear models under biased sampling designs: A sequential offsetted regression approach.

Links to Software

Topic attachments
| Attachment | Size | Date | Who | Comment |
| FullCohort.RData | 87.4 K | 16 Aug 2017 - 17:30 | JonathanSchildcrout | Full cohort data used by ODSCode.R to run analyses |
| ODSCode.R | 3.7 K | 16 Aug 2017 - 17:24 | JonathanSchildcrout | Example of an ODS analysis using a simulated dataset. This code runs the full cohort model with maximum likelihood and the ODS sample using ACML, WL, and MI |
| ODSCodeFunctions.R | 15.7 K | 16 Aug 2017 - 17:28 | JonathanSchildcrout | Functions called by ODSCode.R to run analyses |
Topic revision: r37 - 19 Oct 2022, JonathanSchildcrout
