Department of Biostatistics Seminar/Workshop Series
Augmented Weighted Support Vector Machines Covariates
Thomas G. Stewart, PhD Biostatistics
University of North Carolina, Chapel Hill
Support vector machines (SVMs) are a popular tool for a wide variety of classification tasks. A key feature of SVMs is the flexibility to generate both linear and non-linear decision rules. This feature is particularly helpful when the relationship between the outcome and the predictor variables is complex, as is frequently the case in biomedical studies. A practical challenge for SVMs, as with many other classification methods, is the common real-world problem of missing covariates. Currently, many researchers and users of SVMs rely on complete-case or imputation solutions, which may introduce bias and reduce classification accuracy. Other approaches are limited to specific missing-data scenarios or by computational issues. In this presentation, I discuss an EM-motivated solution to the incomplete covariate problem for SVMs. In this method, the hinge loss for observations with missing covariates is replaced with its quasi-expectation conditional on the observed data and postulated model parameters. Simulations show that the proposed method often yields classification rules with higher accuracy than existing methods. We apply the approach to analyze data from HCV-TARGET, a longitudinal study of Hepatitis C patients.
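To make the idea of replacing the hinge loss with a conditional expectation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the speaker's implementation: it uses a plain Monte Carlo average rather than the quasi-expectation described in the abstract, it assumes a linear decision rule f(x) = w·x + b, and the names `hinge`, `expected_hinge`, and `draw_missing` are hypothetical. The sampler for the missing covariate given the observed one is an assumed normal model chosen only for the example.

```python
import numpy as np

def hinge(y, score):
    """Standard SVM hinge loss max(0, 1 - y * f(x)) for labels y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * score)

def expected_hinge(y, x, w, b, draw_missing, n_draws=2000, seed=0):
    """Monte Carlo average of the hinge loss over draws of the missing entries.

    x marks missing covariates with np.nan. draw_missing(x_obs, rng, n) is a
    hypothetical user-supplied sampler that returns an (n, n_missing) array
    drawn from an ASSUMED model of the missing covariates given the observed
    ones. For a complete observation this reduces to the ordinary hinge loss.
    """
    x = np.asarray(x, dtype=float)
    miss = np.isnan(x)
    if not miss.any():
        return float(hinge(y, x @ w + b))
    rng = np.random.default_rng(seed)
    filled = np.tile(x, (n_draws, 1))          # one row per draw
    filled[:, miss] = draw_missing(x[~miss], rng, n_draws)
    return float(hinge(y, filled @ w + b).mean())

# Assumed toy setting: two covariates, linear rule f(x) = x1 - x2.
w, b = np.array([1.0, -1.0]), 0.0

# Assumed conditional model for the example: x2 | x1 ~ Normal(0.5 * x1, 1).
def draw_missing(x_obs, rng, n):
    return rng.normal(0.5 * x_obs[0], 1.0, size=(n, 1))

x_incomplete = [2.0, np.nan]                   # x2 is missing
loss_expected = expected_hinge(+1, x_incomplete, w, b, draw_missing, n_draws=20000)
loss_plug_in = expected_hinge(+1, [2.0, 1.0], w, b, draw_missing)  # mean-imputed x2
```

In this toy setting, plugging in the conditional mean x2 = 1 gives a hinge loss of exactly 0, while the expected hinge loss equals E[max(0, Z)] for a standard normal Z, that is 1/sqrt(2π), or about 0.40. The gap shows in miniature why single-value imputation can understate the loss contributed by an incomplete observation.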