Pneumonia accounts for many deaths in children aged under 5 years in developing countries. A reliable and generalizable tool to predict mortality and thus assess the severity of pneumonia would aid patient management.
We used a dataset of 11,012 children admitted with clinical pneumonia to develop a model to predict mortality. Using a high-performance computing platform, we generated models for all possible feature combinations, applying support vector machines, neural networks, random forests and logistic regression to two-thirds of the dataset with repeated cross-validation (5 repetitions, 10 folds). We chose the final model based on its performance and, to increase generalizability, on the number and measurement reliability of the included features. In the validation stage, we applied the selected model to the held-out data to test its performance on unseen cases.
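The search procedure described above can be illustrated with a minimal sketch in plain Python. It is not the authors' pipeline: the function names, the random seed, and the simple shuffled (non-stratified) splitting are assumptions made for illustration, and the model-fitting step itself is omitted.

```python
import random
from itertools import combinations


def all_feature_subsets(features):
    """Yield every non-empty combination of the given feature names."""
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            yield subset


def split_train_test(n_samples, train_fraction=2 / 3, seed=0):
    """Shuffle row indices and split them into training and held-out sets."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def repeated_kfold(n_samples, n_folds=10, n_repeats=5, seed=0):
    """Yield (train_idx, val_idx) pairs for repeated k-fold cross-validation.

    Each repetition reshuffles the indices, then every fold serves once
    as the validation set while the remaining folds form the training set.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(n_folds):
            val_idx = indices[fold::n_folds]
            val_set = set(val_idx)
            train_idx = [i for i in indices if i not in val_set]
            yield train_idx, val_idx
```

With d candidate features there are 2^d - 1 non-empty subsets, and each subset is evaluated once per learning algorithm and per fold. This growth is why an exhaustive search of this kind benefits from a high-performance computing platform.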
The selected model had good sensitivity and specificity (both >80%) on the training set and, more importantly, showed promising performance when applied to the test set.
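For reference, sensitivity and specificity can be computed from binary predictions as in the sketch below. The function name and the coding of labels (1 for death, 0 for survival) are assumptions for illustration, not taken from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1/0.

    Sensitivity = TP / (TP + FN): the share of actual positives detected.
    Specificity = TN / (TN + FP): the share of actual negatives correctly ruled out.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Return NaN when a class is absent, rather than dividing by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Because deaths are typically far rarer than survivals in a cohort of this kind, reporting both measures is more informative than overall accuracy, which a model could inflate by predicting survival for every child.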
Our predictive model performed well not only on cross-validated data but also on our test dataset, increasing our confidence in its generalizability.