Simple Beats Sophisticated in Early Warning Tools

October 16, 2024

Simple, freely available clinical scores can predict the deterioration of patients in the hospital better than some sophisticated, AI-based models, say researchers who warn of widely varying accuracy among early warning tools.

Although the non-AI and publicly available National Early Warning Score (NEWS) can be calculated without a computer, it performed significantly better than two more statistically advanced AI-based systems.

“The performance differences in these scores were sizable and could affect patient outcomes and resource allocation,” noted the researchers led by Dana Edelson, MD, from the University of Chicago, in the journal JAMA Network Open.

NEWS predictions were notably better than the most widely available of all AI tools, the Epic Deterioration Index (EDI), which was among the worst performing of all scores.

Only the AI-based eCARTv5 (henceforth referred to as eCART) proved more predictive than NEWS in flagging up deteriorating patients, with fewer false alarms and more time to intervene than other tools.

The study assessed the predictive ability of six early warning scores among adults admitted to medical-surgical wards at seven hospital campuses in the Yale New Haven Health System.

Three statistically advanced scores, consisting of eCART, the Rothman Index (RI), and the EDI, were compared with three simpler, points-based scores: NEWS, NEWS2, and the Modified Early Warning Score (MEWS).

Scores were applied to nearly 363,000 medical-surgical ward encounters to assess their ability to predict the primary outcome of clinical deterioration, defined as death or transfer from the ward to an intensive care unit within 24 hours of the prediction.

eCART, which was the only model based on machine learning, performed best of all the early warning scores at identifying clinical deterioration in the hospital, with an area under the receiver operating characteristic curve (AUROC) of 0.895.

This was followed by NEWS2 at 0.831, NEWS at 0.829, RI at 0.828, EDI at 0.808, and MEWS at 0.757.
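For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen patient who deteriorated received a higher score than a randomly chosen patient who did not. The sketch below illustrates that definition only; the function name and the toy numbers are hypothetical and are not taken from the study.

```python
def auroc(scores, labels):
    """Rank-based AUROC: the share of (positive, negative) pairs in which
    the positive case has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = deteriorated within 24 hours, 0 = did not.
toy_scores = [1, 3, 2, 7, 5, 0, 6, 4]
toy_labels = [0, 1, 0, 1, 0, 0, 1, 0]

print(round(auroc(toy_scores, toy_labels), 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why a gap such as 0.895 versus 0.808 is treated as meaningful.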

Positive predictive values for eCART were more than 60% higher at the moderate- and high-risk thresholds than those for the EDI. For NEWS, these values were at least 20% higher than for the EDI.
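Positive predictive value is the fraction of alerts that turn out to be true: of all the observations scoring at or above a threshold, how many were followed by deterioration. The sketch below shows the calculation under that definition; the function name, threshold, and data are illustrative assumptions and do not reproduce any score's actual cut-offs.

```python
def ppv_at_threshold(scores, labels, threshold):
    """Positive predictive value for alerts fired when score >= threshold.
    Returns None if no observation reaches the threshold."""
    alerted = [y for s, y in zip(scores, labels) if s >= threshold]
    if not alerted:
        return None
    return sum(alerted) / len(alerted)


# Hypothetical toy data: 1 = deteriorated within 24 hours, 0 = did not.
toy_scores = [2, 8, 5, 9, 1, 7, 3, 6]
toy_labels = [0, 1, 0, 1, 0, 0, 0, 1]

print(ppv_at_threshold(toy_scores, toy_labels, threshold=6))
```

A higher PPV at a given threshold means fewer false alarms per true event, which is the practical sense in which one score can outperform another.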

The high-risk threshold for the EDI had a median lead time of one hour compared with 11 hours for eCART and eight hours for NEWS. There was no threshold where the trade-off between alerts and deterioration detection was favorable for the EDI.

The researchers suggest the superior performance of eCART may lie in its reliance on a gradient-boosting machine framework, which handles interactions between variables and missing values.

eCART, with its 97 predictors, also includes dozens of additional inputs that likely decrease the chance of false alarms associated with the chronic but stable physiological abnormalities common in hospitalized patients, such as atrial fibrillation and end-stage kidney disease.

“In fact, two of the most heavily weighted variables in eCART, namely the maximum respiratory rate and minimum systolic blood pressure in the prior 24 hours, are not included in any of the other models,” the authors noted.

In a commentary article accompanying the study, Amol Verma, PhD, from the University of Toronto, pointed out that, despite their widespread use, “the evidence base for early warning scores remains surprisingly thin.”

He added that many scores have serious methodological flaws or have not been externally validated, and that relatively few are shared openly.

“There have been few rigorous evaluations of clinical impact, with only a small number of studies showing improved patient outcomes,” he continued.

“Thus, despite their promise, there is still substantial uncertainty about which early warning scores should be used and how they should be implemented.”
