Biostatistics Weekly Seminar

Bridging the gap between noisy healthcare data and knowledge: causality and portability

Xu Shi, PhD
Harvard School of Medicine

Routinely collected healthcare data present numerous opportunities for biomedical research but also come with unique challenges. For example, critical issues such as data quality, unmeasured and mismeasured confounding, high-dimensional covariates, and patient privacy concerns naturally arise. In this talk, I present tailored causal inference methods and an automated data quality control pipeline that aim to overcome these challenges and make the transition from data to knowledge. I then detail the challenge of inconsistent “languages” used by different healthcare systems and coding systems. In particular, different healthcare providers may use alternative medical codes to record the same diagnosis or procedure, which limits the transportability of phenotyping algorithms and statistical models across healthcare systems. I formulate medical code translation as a statistical problem: inferring a mapping between two sets of multivariate, unit-length vectors, each learned from one of two healthcare systems. The problem is particularly interesting because the training data are corrupted by a fraction of mismatched response-predictor pairs, whereas classical regression analysis tacitly assumes that each response is correctly linked to its predictor. I propose a novel method for mapping recovery and establish theoretical guarantees for estimation and model selection consistency.

6 February 2019

Topic revision: r2 - 08 Jan 2019, TawannaPeters
