Novel Nonparametric Random-Forests-based and Semiparametric Models for Between-subject
Attributes: Applications to Survey and Observational Studies and Beyond
Tuo Lin, PhD candidate, Division of Biostatistics and Bioinformatics, University of California, San Diego
In modern statistical analysis, effectively analyzing and interpreting data with outliers is a challenging problem. Classical mean-based methods such as the t-test and generalized linear models (GLMs) generally yield uninterpretable results when applied to such data. Rank-based methods such as the Mann-Whitney-Wilcoxon Rank Sum Test (MWWRST) are effective alternatives. We have been developing a new regression paradigm for modeling “between-subject attributes” that extends the MWWRST to broader applications, such as survey data and causal inference, and to high-dimensional data such as microbiome beta diversity. These newly developed methods have wide applications in survey, observational, and longitudinal studies.
In this talk, I will first introduce the distinction between sampling weights in survey studies and “probability weights” in observational and longitudinal studies with missing data. Next, I will describe semiparametric methods for modeling between-subject attributes in survey studies, along with a study of their efficiency. Finally, as machine learning methods have become unprecedentedly popular in the Big Data era, and motivated by the seminal work of Wager and Athey (2018) on the asymptotic properties of random forests, I will present our latest extension of random forests to between-subject attributes for nonparametric regression analysis, with an illustrative example of causal inference using the MWWRST.
Hybrid: meeting room and Zoom link to follow
20 January 2023, 1:30pm