Biostatistics Weekly Seminar

Targeting underrepresented populations in precision medicine through multi-source data integration

Tian Gu, PhD
Postdoctoral fellow
Department of Biostatistics
Harvard T.H. Chan School of Public Health

The increasing numbers of large-scale biobanks and institutional data networks have brought unique opportunities to link patients' genomics, electronic health records, and survey data for studying complex human diseases, especially to address the diminished model performance in minority and disadvantaged groups due to their low representation in biomedical research. In this talk, I will introduce two statistical learning methods targeting underrepresented populations by integrating data from multiple biobanks, different ancestries, and related health outcomes. These methods protect data privacy by learning from pre-trained models in external data sources without sharing patient-level data, and they account for potential data heterogeneity. We provide theoretical guarantees for the model performance and insights regarding when the external model can be helpful to the target model. We demonstrate the superiority of our methods compared to benchmark methods, with examples using data from the UK Biobank and the electronic Medical Records and Genomics (eMERGE) Network.

Virtual: Zoom Link to Follow
11 January 2023

Topic revision: r1 - 09 Jan 2023, CierraStreeter
