Targeting underrepresented populations in precision medicine through multi-source data integration
Tian Gu, PhD, Postdoctoral Fellow, Department of Biostatistics, Harvard T.H. Chan School of Public Health
The growing number of large-scale biobanks and institutional data networks has created unique opportunities to link patients’ genomic, electronic health record, and survey data for studying complex human diseases. These linked resources are especially valuable for addressing the diminished model performance in minority and disadvantaged groups that results from their low representation in biomedical research. In this talk, I will introduce two statistical learning methods that target underrepresented populations by integrating data from multiple biobanks, different ancestries, and related health outcomes. These methods protect data privacy by learning from models pre-trained on external data sources, without sharing patient-level data, and they account for potential data heterogeneity. We provide theoretical guarantees for model performance and insights into when an external model can help the target model. We demonstrate that our methods outperform benchmark methods, with examples using data from the UK Biobank and the electronic Medical Records and Genomics (eMERGE) Network.
Virtual: Zoom link to follow. 11 January 2023, 1:30pm