Biostatistics Weekly Seminar

A network clustering approach to diagnosis codes in electronic medical records for unsupervised learning of disease heterogeneity and patient subgroups

Yaomin Xu, PhD
Vanderbilt University Medical Center

Unsupervised clustering of patients using high-dimensional EMR data could improve our understanding of disease heterogeneity and identify new disease subtypes. The International Statistical Classification of Diseases and Related Health Problems (ICD) is the most commonly used categorization of diseases and is routinely recorded in EMRs to classify diagnoses and describe patient visits. We hypothesize that the low-dimensional latent structure of multivariate ICD patterns in patients could provide useful information about patient characteristics and disease heterogeneity. In this talk, I will present a network-based community detection approach for unsupervised learning of the topological structure of patients based on their shared co-occurrence patterns in the ICD codes recorded in a large-scale EMR. We aimed to build a statistically principled approach that is highly robust when applied to real-world data. We pursued this with a two-step strategy: (1) we estimated a consensus graph from an ensemble of stochastic block model estimations based on bipartite patient-ICD relationships; (2) we then constructed a hierarchical topological structure of the consensus graph using top-down recursive partitioning. I will demonstrate a functional interpretation of our approach by applying it to a genetic study of a cancer driver mutation in MPN patients, and illustrate findings that recapitulate existing knowledge as well as findings that are potentially novel. This work was developed in collaboration with several VUMC clinical scientists and geneticists on real data problems. It is still a work in progress, and we eagerly look forward to hearing your feedback.
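For readers unfamiliar with the two-step idea, the following is a minimal toy sketch, not the speaker's implementation. It assumes the ensemble of block assignments is already available (here three hand-written, hypothetical partitions stand in for stochastic block model runs), builds a consensus graph from how often two patients share a block, and substitutes a simple stand-in for the recursive partitioning: connected components at successively stricter edge thresholds. All function names, patient labels, and thresholds are illustrative assumptions.

```python
from itertools import combinations


def consensus_weights(partitions):
    """Fraction of ensemble runs in which each pair of patients shares a block."""
    patients = sorted(partitions[0])
    weights = {}
    for a, b in combinations(patients, 2):
        together = sum(1 for p in partitions if p[a] == p[b])
        weights[(a, b)] = together / len(partitions)
    return weights


def components(nodes, weights, threshold):
    """Connected components of the graph that keeps edges with weight >= threshold."""
    nodes = sorted(nodes)
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(nodes, 2):
        if weights[(a, b)] >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


def split_top_down(nodes, weights, thresholds):
    """Recursively split a group, applying a stricter threshold at each level."""
    if not thresholds or len(nodes) == 1:
        return sorted(nodes)
    parts = components(nodes, weights, thresholds[0])
    return [split_top_down(part, weights, thresholds[1:]) for part in parts]


# Hypothetical ensemble: each dict maps a patient to a block label from one run.
partitions = [
    {"p1": 0, "p2": 0, "p3": 1, "p4": 1, "p5": 2},
    {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 1},
    {"p1": 1, "p2": 1, "p3": 0, "p4": 0, "p5": 0},
]

weights = consensus_weights(partitions)
tree = split_top_down(sorted(partitions[0]), weights, [0.6, 0.9])
print(tree)  # [[['p1', 'p2']], [['p3'], ['p4'], ['p5']]]
```

In this toy example, p1 and p2 share a block in every run and so stay together at both thresholds, while p3, p4, and p5 form one group at 0.6 and separate at 0.9. Block labels are compared only within a run, so the arbitrary relabeling between runs does not matter.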

MRBIII, Room 1220
28 August 2019

Speaker Itinerary

Topic revision: r3 - 27 Aug 2019, YaominXu
