Weighted pairwise likelihood for multilevel models in multistage samples
Thomas Lumley, PhD, University of Auckland
It is surprisingly challenging to fit mixed models to data from complex samples, because the log likelihood is not simply a sum over observations, and because sampling interacts in confusing ways with the bias-variance tradeoffs of BLUPs. Simply modeling the design variables is possible, but requires that the design variables are available (often not the case in public-use data) and that they are legitimate predictor variables (e.g., not on the causal pathway). The design-based approach currently in use requires that the sampling units in the design are the same as the clusters in the model, which is a strong restriction. I will describe an approach that allows arbitrary models and designs, and an R implementation that currently supports multilevel linear models and multistage samples. The approach is based on composite likelihood, generalizing an idea by J. N. K. Rao and co-workers. Our motivating example, family genetic models in the Hispanic Community Health Survey, is not yet within reach but is in sight.