Taking a network view of EHR and Biobank data to find explainable multivariate patterns
Nick Strayer, PhD, Vanderbilt University Medical Center
Extracting informative and meaningful results from EHR and biobank data is an important task in genomic data science. Statistically principled methods that can find robust and explainable multivariate patterns in these data have yet to take hold. Existing analysis frameworks struggle both with the high dimensionality of such data and with communicating results in an intuitive, actionable way. In this talk I will go over recent work we have done with PheWAS data that treats the data as a bipartite network of individual-feature relationships rather than as the traditional observation-by-variable table. I will also show new visualization tools we have developed that let collaborators interact with these networks to interrogate genotype-phenotype associations. These methods have already been deployed successfully in real-world collaborations, and simulations of their theoretical properties have shown that they are consistent and powerful at extracting multivariate patterns. By merging network science and statistics, these methods have the potential to produce more efficient and intuitive tools for analyzing complex EHR and biobank data.
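To make the bipartite view concrete, here is a minimal sketch, not the method from the talk. It assumes a toy individual-by-phenotype 0/1 table with made-up patient IDs and phecode-style labels chosen only for illustration. Each nonzero cell becomes an individual-feature edge, and the edges are then projected onto the feature side so that two codes are linked by the number of individuals who share them. The function names `to_bipartite_edges`, `node_degrees`, and `project_features` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Toy, made-up data (hypothetical IDs and labels): rows are individuals,
# columns are phenotype codes; 1 = individual has the code, 0 = does not.
codes = ["250.2", "401.1", "272.1"]
matrix = {
    "patient_a": [1, 1, 0],
    "patient_b": [0, 1, 1],
    "patient_c": [1, 0, 0],
}


def to_bipartite_edges(matrix, codes):
    """Turn an individual-by-feature 0/1 table into a bipartite edge list.

    Each nonzero cell becomes an (individual, feature) edge; zero cells
    produce no edge, so only observed relationships are stored.
    """
    edges = []
    for person, row in matrix.items():
        for code, value in zip(codes, row):
            if value:
                edges.append((person, code))
    return edges


def node_degrees(edges):
    """Count edges per node on each side of the bipartite graph."""
    person_deg = defaultdict(int)
    code_deg = defaultdict(int)
    for person, code in edges:
        person_deg[person] += 1
        code_deg[code] += 1
    return dict(person_deg), dict(code_deg)


def project_features(edges):
    """One-mode projection onto features: weight = number of shared individuals."""
    codes_by_person = defaultdict(set)
    for person, code in edges:
        codes_by_person[person].add(code)
    weights = defaultdict(int)
    for code_set in codes_by_person.values():
        # Every pair of codes held by the same individual gains one unit of weight.
        for a, b in combinations(sorted(code_set), 2):
            weights[(a, b)] += 1
    return dict(weights)


edges = to_bipartite_edges(matrix, codes)
person_deg, code_deg = node_degrees(edges)
feature_links = project_features(edges)
print(edges)
print(person_deg, code_deg)
print(feature_links)
```

Storing only the nonzero cells as edges keeps the representation sparse, which is one reason a network view can suit the very wide tables typical of PheWAS data; the projection step is just one simple way to surface which features co-occur across individuals.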