Bayesian Modeling of Latent Heterogeneity in Complex Survey Data and Electronic Health Records
Rebecca Anthopolos, PhD, Columbia University
In longitudinal data analysis, latent heterogeneity can manifest as unobserved groups of individuals with distinctive health trajectories. For such data, growth mixture models (GMMs) make it possible to identify multiple unobserved subpopulations and to estimate group-specific average growth trajectories and individual-specific random effects. Despite the utility of GMMs, their application remains underdeveloped for two types of data often encountered in public health research: complex survey data and electronic health records (EHRs). Complex survey data are generated from a designed study but do not represent a simple random sample. Valid inference from complex survey data requires accounting for features of the complex sample design, including unequal-probability sampling, stratification, and clustering. In contrast, EHRs represent a convenience sample and are well known to contain a high proportion of missing values. Analysis of EHRs therefore requires assumptions about the missing data mechanisms. In a Bayesian framework, we propose methods for applying GMMs to 1) complex survey data, accounting for the complex sample design, and 2) EHRs, accounting for different assumptions about the missing data mechanisms.
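For readers unfamiliar with GMMs, a common linear specification with $K$ latent classes is sketched here. It is an illustrative assumption, not necessarily the exact model developed in this work. In this notation, $c_i$ is the unobserved class of individual $i$, $\pi_k$ are the class proportions, $\beta_{0k}$ and $\beta_{1k}$ are the class-specific average intercept and slope (the group-specific average trajectory), $\mathbf{b}_i$ are the individual-specific random effects, and $t_{ij}$ is the time of the $j$-th of $n_i$ measurements on individual $i$. The residual variance $\sigma_k^2$ is allowed to differ by class here; some specifications share it across classes.

```latex
\begin{aligned}
\Pr(c_i = k) &= \pi_k, \qquad k = 1, \dots, K, \qquad \sum_{k=1}^{K} \pi_k = 1, \\
y_{ij} \mid c_i = k, \mathbf{b}_i &= (\beta_{0k} + b_{0i}) + (\beta_{1k} + b_{1i})\, t_{ij} + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_k^2), \\
\mathbf{b}_i = (b_{0i}, b_{1i})^\top \mid c_i = k &\sim \mathcal{N}_2(\mathbf{0}, \boldsymbol{\Sigma}_k), \\
p(\mathbf{y}_i) &= \sum_{k=1}^{K} \pi_k \int \prod_{j=1}^{n_i}
  \mathcal{N}\!\left(y_{ij};\, \beta_{0k} + b_{0i} + (\beta_{1k} + b_{1i})\, t_{ij},\, \sigma_k^2\right)
  \mathcal{N}_2(\mathbf{b}_i;\, \mathbf{0}, \boldsymbol{\Sigma}_k)\, d\mathbf{b}_i .
\end{aligned}
```

In a Bayesian treatment, priors are typically placed on $(\pi_k, \beta_{0k}, \beta_{1k}, \boldsymbol{\Sigma}_k, \sigma_k^2)$, for example a Dirichlet prior on $(\pi_1, \dots, \pi_K)$. The class indicators $c_i$ and random effects $\mathbf{b}_i$ are then sampled jointly with the other parameters, which yields posterior class-membership probabilities for each individual.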