USING MACHINE LEARNING MODELS TO DETERMINE FEATURE IMPORTANCE BY AGE FOR STROKE RISK PREDICTION

Session Type
Scientific Communication
Date
Wed, 01.09.2021
Session Time
17:15 - 18:45
Room
Hall I
Lecture Time
18:08 - 18:16
Presenter
  • Elizabeth A. Hunter (Ireland)

Abstract

Background And Aims

Stroke risk prediction models are typically developed for the whole population. We examine how feature importance differs by age-group.

Methods

We train machine learning models (logistic regression, random forest) on the Framingham Cohort and Framingham Offspring risk factor datasets from the NHLBI for four age-groups (under-50, 50-59, 60-69, over-70) with six features (sex, systolic blood pressure, diabetes, BMI, blood-pressure treatment, smoking). We sample two controls for each stroke case and use 80% of the data for training and 20% for testing.
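The sampling and split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (dictionaries with `age` and `stroke` keys), the function names, the simple unmatched random sampling of controls, the unstratified split, and the placement of age 70 in the oldest group are all assumptions.

```python
import random


def age_group(age):
    """Map an age in years to one of the four age-groups.

    Assumption: the oldest group includes age 70 itself.
    """
    if age < 50:
        return "under-50"
    if age < 60:
        return "50-59"
    if age < 70:
        return "60-69"
    return "over-70"


def sample_controls(records, controls_per_case=2, seed=0):
    """Keep every stroke case and draw a fixed number of controls per case.

    Assumption: controls are drawn at random without matching.
    """
    rng = random.Random(seed)
    cases = [r for r in records if r["stroke"] == 1]
    controls = [r for r in records if r["stroke"] == 0]
    # Never ask for more controls than are available.
    n_controls = min(len(controls), controls_per_case * len(cases))
    return cases + rng.sample(controls, n_controls)


def split_train_test(records, test_fraction=0.2, seed=0):
    """Shuffle the records and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

In this sketch each age-group would be sampled and split separately, for example by first filtering the records with `age_group` and then calling `sample_controls` and `split_train_test` on each subset.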

Results

Results are reported as {AUC, F1, Spiegelhalter's p-value} for the under-50, 50-59, 60-69, and over-70 age-groups respectively. Logistic regression achieved {0.75, 0.67, 0.51}, {0.81, 0.36, 0.78}, {0.73, 0.36, 0.20}, {0.68, 0.22, 0.53}, and random forest {0.69, 0.25, 0.67}, {0.61, 0.46, 0.93}, {0.69, 0.45, 0.05}, {0.57, 0.40, 0.16}. The F1 scores are generally low while the AUC and p-values are generally high, suggesting adequate discrimination and good calibration but a suboptimal classification threshold.
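For readers unfamiliar with the calibration measure reported above, the following is a minimal sketch of Spiegelhalter's z-test, together with a helper that computes F1 at a chosen threshold. It is an illustration of the standard formulas, not the authors' implementation; the function names and the two-sided p-value convention are assumptions. A high p-value means the test finds no evidence of miscalibration, while F1 depends on where the probability cut-off is placed.

```python
import math


def spiegelhalter_z(y_true, y_prob):
    """Return Spiegelhalter's z statistic and a two-sided p-value.

    y_true holds 0/1 outcomes; y_prob holds predicted probabilities.
    """
    numerator = sum((y - p) * (1 - 2 * p) for y, p in zip(y_true, y_prob))
    variance = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in y_prob)
    if variance == 0:
        raise ValueError("statistic undefined: all probabilities are 0, 0.5 or 1")
    z = numerator / math.sqrt(variance)
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def f1_at_threshold(y_true, y_prob, threshold=0.5):
    """Compute F1 after labelling probabilities >= threshold as positive."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    fn = sum(1 for y, p in pairs if p < threshold and y == 1)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

Evaluating `f1_at_threshold` over a range of thresholds on held-out data is one way to check whether a low F1 reflects the cut-off rather than the predicted probabilities themselves.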

In all models, systolic blood pressure is a top feature. The other top features are as follows. Logistic regression: under-50 {smoking, diabetes, sex}, 50-59 {blood-pressure treatment, smoking, diabetes}, 60-69 {sex, smoking, blood-pressure treatment}, over-70 {BMI, diabetes, blood-pressure treatment}. Random forest: under-50 {BMI, smoking, sex}, 50-59 {BMI, blood-pressure treatment, smoking}, 60-69 {BMI, sex, smoking}, over-70 {BMI, diabetes, sex}.

Conclusions

Our results show there are differences in feature importance by age that should be considered in predicting stroke risk.

This work was supported by: PRECISE4Q Predictive Modelling in Stroke project funded from the EU’s Horizon 2020 research and innovation programme grant agreement No. 777107; ADAPT Research Centre, funded under the SFI Research Centres Programme (Grant 13/RC/2106) co-funded under the European Regional Development Funds.
