A unified approach for inference on algorithm-agnostic variable importance
Brian Williamson, PhD, Fred Hutchinson Cancer Research Center
Assessing the relative contribution of subsets of features in predicting the response is often of interest in predictive modeling applications. The variable importance measure used is commonly determined by the prediction technique employed, which creates a tradeoff between flexible prediction and inference: restrictive assumptions are often necessary for valid statistical inference on the true importance. Rather than treating importance as a summary of a specific prediction algorithm, it is useful to treat it as a summary of the true data-generating mechanism. In this talk, I will focus on a notion of variable importance that captures the best-case predictive potential attributable to one variable or a set of variables. I will discuss general conditions under which a simple estimator of this importance is nonparametric efficient and valid statistical inference on the true importance can be obtained, even when flexible machine learning-based techniques are used as part of the estimation strategy. Finally, I will illustrate the use of the proposed framework with data from a study of an antibody against HIV-1 infection.
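
The notion of importance described above can be sketched in symbols. The notation here is an illustrative assumption and does not come from the abstract: $P_0$ denotes the true data-generating distribution, $V(f, P)$ a predictiveness measure of the analyst's choosing (for example $R^2$ or classification accuracy), $\mathcal{F}$ a rich class of prediction functions, and $\mathcal{F}_s \subseteq \mathcal{F}$ the functions that ignore the features indexed by $s$. Under those assumptions, the importance of the feature set $s$ is the loss in best-case predictiveness when those features are excluded, and a simple plug-in estimator replaces each unknown with an estimate.

```latex
% Assumed notation (not from the abstract): V, P_0, \mathcal{F}, \mathcal{F}_s, s.
% Best-case prediction functions with and without the features in s:
f_0 \in \operatorname*{argmax}_{f \in \mathcal{F}} V(f, P_0),
\qquad
f_{0,s} \in \operatorname*{argmax}_{f \in \mathcal{F}_s} V(f, P_0)

% Population importance of the feature set s:
\psi_{0,s} = V(f_0, P_0) - V(f_{0,s}, P_0)

% Plug-in estimator, with \hat{f}_n and \hat{f}_{n,s} fitted by any
% (possibly machine learning-based) technique and P_n the empirical distribution:
\hat{\psi}_{n,s} = V(\hat{f}_n, P_n) - V(\hat{f}_{n,s}, P_n)
```

Because $\mathcal{F}_s \subseteq \mathcal{F}$, the population quantity $\psi_{0,s}$ is nonnegative, and it depends only on $P_0$ and the chosen measure $V$, not on any particular fitting algorithm.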