2 Illustrative Example: Predicting Risk of Ischemic Stroke
As a primer to feature engineering, an abbreviated example is presented with a modeling process similar to the one shown in Figure 1.4. For the purpose of illustration, this example will focus on exploration, analysis fit, and feature engineering, through the lens of a single model (logistic regression).
To illustrate the value of feature engineering for enhancing model performance, consider the application of trying to better predict patient risk for ischemic stroke (Schoepf et al., n.d.). Historically, the degree arterial stenosis (blockage) has been used to identify patients who are at risk for stroke (Lian et al. 2012). To reduce the risk of stroke, patients with sufficient blockage (> 70%) are generally recommended for surgical intervention to remove the blockage (Levinson and Rodriguez 1998). However, historical evidence suggests that the degree of blockage alone is actually a poor predictor of future stroke (Meier et al. 2010). This is likely due to the theory that while blockages may be of the same size, the composition of the plaque blockage is also relevant to the risk of stroke outcome. Plaques that are large, yet stable and unlikely to be disrupted may pose less stroke risk than plaques that are smaller, yet less stable.
To study this hypothesis, a historical set of patients with a range of carotid artery blockages were selected. The data consisted of 126 patients, 44 of which had blockages greater than 70%. All patients had undergone Computed Tomography Angiography (CTA) to generate a detailed three-dimensional visualization and characterization of the blockage. These images were then analyzed by Elucid Bioimaging’s vascuCAP (TM) software which generates anatomic structure estimates such as percent stenosis, arterial wall thickness, and tissue characteristics such as lipid-rich necrotic core and calcification. As an example, consider Figure 2.1 (a) which represents a carotid artery with severe stenosis as represented by the tiny tube-like opening running through the middle of the artery. Using the image, the software can calculate the maximal cross-sectional stenosis by area (MaxStenosisByArea) and by diameter (MaxStenosisByDiameter). In addition, The gray area in this figure represents macromolecules (such as collagen, elastin, glycoproteins, and proteoglycans) that provide structural support to the arterial wall. This structural region can be quantified by its area (MATXArea). Figure 2.1 (b) illustrates an artery with severe stenosis and calcified plaque (green) and lipid-rich necrotic core (yellow). Both plaque and lipid-rich necrotic core are thought to contribute to stroke risk, and these regions can be quantified by their volume (CALCVol and LRNCVol) and maximal cross-sectional area (MaxCALCArea and MaxLRNCArea). The artery presented in 2.1 (c) displays severe stenosis and outward arterial wall growth. The top arrow depicts the cross-section of greatest stenosis (MaxStenosisByDiameter) and the bottom arrow depicts the cross-section of greatest positive wall remodeling (MaxRemodelingRatio). Remodeling ratio is a measure of the arterial wall where ratios less than 1 indicate a wall shrinkage and ratios greater than 1 indicate wall growth. This metric is likely important because coronary arteries with large ratios like the one displayed here have been associated with rupture (Cilla et al. 2013; Abdeldayem et al. 2015). The MaxRemodelingRatio metric captures the region of maximum ratio (between the two white arrows) in the three-dimensional artery image. A number of other imaging predictors are generated based on physiologically meaningful characterizations of the artery.
The group of patients in this study also had follow-up information on whether or not a stroke occurred at a subsequent point in time. The association between blockage categorization and stroke outcome is provided in Table 2.1. For these patients, the association is not statistically significant based on a chi-squared test of association (p = 0.42), indicating that blockage categorization alone is likely not a good predictor of stroke outcome.
Stroke=No | Stroke=Yes | |
---|---|---|
Blockage < 70% | 43 | 39 |
Blockage > 70% | 19 | 25 |
If plaque characteristics are indeed important for assessing stroke risk, then measurements of the plaque characteristics provided by the imaging software could help to improve stroke prediction. By improving the ability to predict stroke, physicians may have more actionable data to make better patient management or clinical intervention decisions. Specifically, patients who have large (blockages > 70%), but stable plaques may not need surgical intervention, saving themselves from an invasive procedure while continuing on medical therapy. Alternatively, patients with smaller, but less stable plaques may indeed need surgical intervention or more aggressive medical therapy to reduce stroke risk.
The data for each patient also included commonly collected clinical characteristics for risk of stroke such as whether or not the patient had atrial fibrillation, coronary artery disease, and a history of smoking. Demographics of gender and age were included as well. These readily available risk factors can be thought of as another potentially useful predictor set that can be evaluated. In fact, this set of predictors should be evaluated first to assess their ability to predict stroke since these predictors are easy to collect, are acquired at patient presentation, and do not require an expensive imaging technique.
To assess each set’s predictive ability, we will train models using the risk predictors, imaging predictors, and combinations of the two. We will also explore other representations of these features to extract beneficial predictive information.