11.5 Summary
Identifying the subset of features that produces an optimal model is often a goal of the modeling process. This is especially true when the data consists of a vast number of predictors.
A simple approach to screening for predictively important features is to evaluate each feature individually. Statistical summaries such as a t-statistic, an odds ratio, or a correlation coefficient can be computed, depending on the types of the predictor and the outcome. While these values are not directly comparable, each can be converted to a p-value so that predictors of different types can be compared on a common scale. To avoid finding false positive associations, a resampling approach should be used during the search.
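To make this kind of filter concrete, the sketch below scores each predictor of a binary outcome: numeric predictors with a two-sample t-test and binary predictors with Fisher's exact test (an odds-ratio-based test), with both reduced to p-values so that different predictor types can be ranked on one scale. The `filter_p_values` helper and the data layout are hypothetical, and in practice the scoring would be repeated within resamples rather than run once on the full training set.

```python
# A minimal sketch of a univariate filter for a binary outcome.
# The helper and column conventions are illustrative assumptions.
import pandas as pd
from scipy import stats

def filter_p_values(X: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Return one p-value per predictor for a binary outcome y."""
    pvals = {}
    for col in X.columns:
        x = X[col]
        if x.nunique() <= 2:
            # Binary predictor: Fisher's exact test on the 2x2 table,
            # which is based on the odds ratio.
            table = pd.crosstab(x, y).to_numpy()
            _, p = stats.fisher_exact(table)
        else:
            # Numeric predictor: two-sample t-test across the classes.
            groups = [x[y == level] for level in y.unique()]
            _, p = stats.ttest_ind(*groups, equal_var=False)
        pvals[col] = p
    # Smaller p-values indicate stronger individual associations.
    return pd.Series(pvals).sort_values()
```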
Simple filters are well suited to finding individually predictive features. However, this approach does not take into account the effect of multiple features acting together, which can lead to the selection of more features than are necessary for optimal predictive performance. Recursive feature elimination (RFE) is an approach that can be combined with any model to identify a good subset of features with optimal performance for the model of interest. The primary drawback of RFE is that it requires the initial model to be able to fit the entire set of predictors.
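One way to see RFE in action is scikit-learn's `RFE` class, which fits the full model first and then iteratively drops the least-important predictors and refits on the survivors. The random-forest ranker, synthetic dataset, and subset size below are illustrative assumptions, not choices prescribed by the text.

```python
# A short sketch of recursive feature elimination with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=1)

# RFE must first fit the model to all predictors, then repeatedly
# eliminates the lowest-ranked ones until the target size is reached.
selector = RFE(RandomForestClassifier(random_state=1),
               n_features_to_select=5, step=1)
selector.fit(X, y)

print(selector.support_)   # boolean mask of retained predictors
print(selector.ranking_)   # 1 = retained; larger = eliminated earlier
```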
Stepwise selection has long been a popular feature selection technique. However, it has well-known drawbacks when p-values are used as the criterion for adding or removing features from a model; alternative metrics that do not share the problems of p-values should be used instead. Another drawback of the stepwise approach is that it cannot be used with many modern predictive models.
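As a sketch of a stepwise search that avoids p-values, scikit-learn's `SequentialFeatureSelector` adds (or removes) predictors based on a resampled performance metric. The logistic-regression model, the ROC AUC scoring choice, and the subset size here are assumptions for illustration.

```python
# A minimal sketch of forward stepwise selection judged by a
# cross-validated metric rather than p-values.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=15,
                           n_informative=4, random_state=2)

# Each candidate addition is evaluated by resampled ROC AUC, so the
# criterion reflects predictive performance directly.
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=4,
                                direction="forward",
                                scoring="roc_auc", cv=5)
sfs.fit(X, y)

print(sfs.get_support())  # boolean mask of the selected predictors
```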