Two radiologists independently reading mammograms for signs of cancer is good, but two radiologists plus an artificial intelligence (AI) system is even better, even when rolled out nationwide.
An AI system called Vara MG increased breast cancer detection rates (BCDR) while maintaining recall rates in double reading, the standard practice in which two radiologists independently examine the same mammograms for signs of cancer. Vara MG was tested in the largest study of its kind, with over 460,000 women and 119 radiologists across 12 sites over nearly two years, within the German Mammography Screening Programme, a population-based program launched in 2005 that invites women ages 50–69 to get a mammogram every two years.
This study, published in Nature Medicine, highlights the potential of AI to improve breast cancer screening metrics in a nationwide mammography program and to alleviate radiologist shortages, possibly leading to a new standard that does not require double reading by two human radiologists. The findings matter because, while many AI systems have performed well in simulations using historical screening data, few prospective studies have assessed the real-world impact of AI in clinical practice. Such studies are essential to ensure that AI is translated into clinical settings safely and efficiently.
An AI safety net for mammography
The PRAIM study (PRospective multicenter observational study of an integrated artificial intelligence (AI) system with live Monitoring), led by Alexander Katalinic from the University of Lübeck, evaluated how Vara MG supported radiologists through two of the system’s features, called “normal triaging” and “safety net.” In normal triaging, Vara MG selects the subset of examinations that the AI model deems highly unsuspicious and tags them as “normal” in the worklist. For the safety net, Vara MG selects the subset of examinations that the AI model deems highly suspicious. The safety net is activated when a radiologist’s interpretation conflicts with the AI’s assessment, prompting the radiologist to review the decision and either accept or reject the safety net’s suggestion.
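The two features can be pictured as simple decision rules over an AI suspicion score. The sketch below is illustrative only: the score scale, the threshold values, and the function names are assumptions made for this example and are not taken from Vara MG or the study. It also assumes that the "conflict" triggering the safety net is a radiologist reading an examination as unsuspicious while the AI deems it highly suspicious.

```python
from typing import Optional

# Hypothetical thresholds on a 0-1 suspicion score (not from the study).
NORMAL_THRESHOLD = 0.05      # at or below: "highly unsuspicious"
SAFETY_NET_THRESHOLD = 0.95  # at or above: "highly suspicious"


def ai_tag(score: float) -> Optional[str]:
    """Return the AI tag for an examination, or None if the AI stays silent."""
    if score <= NORMAL_THRESHOLD:
        return "normal"        # normal triaging: tagged in the worklist
    if score >= SAFETY_NET_THRESHOLD:
        return "safety_net"    # candidate for a safety-net alert
    return None                # no AI tag; read as usual


def safety_net_alert(score: float, radiologist_suspicious: bool) -> bool:
    """Alert only when the AI is highly suspicious but the radiologist is not."""
    return ai_tag(score) == "safety_net" and not radiologist_suspicious


# The radiologist keeps the final say: an alert only prompts a second look.
print(ai_tag(0.01))                                          # normal
print(safety_net_alert(0.99, radiologist_suspicious=False))  # True
print(safety_net_alert(0.99, radiologist_suspicious=True))   # False
```

In this sketch the AI never overrides a reading. It either labels an examination in the worklist or asks the radiologist to reconsider, which matches the supporting role described above.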
Integration of Vara MG resulted in a 17.6% higher BCDR compared to the control group, with one additional cancer detected per 1,000 women screened. AI-supported screening showed higher BCDRs across different subgroups, including screening rounds, breast densities, and age groups. Vara MG also slightly improved the recall rate, lowering it by 2.5%, though this difference was not statistically significant.
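The relative and absolute figures can be related with a little arithmetic. The sketch below back-calculates the approximate detection rates implied by a 17.6% relative increase and roughly one additional cancer per 1,000 women. Because "one additional cancer" is a rounded figure, the outputs are rough estimates derived for illustration, not rates quoted from the study; the function name is hypothetical.

```python
def implied_rates(relative_increase: float, extra_per_1000: float):
    """Back out control and AI detection rates (per 1,000 women screened).

    If ai = control * (1 + r) and ai - control = d, then control = d / r.
    """
    control = extra_per_1000 / relative_increase
    ai = control + extra_per_1000
    return control, ai


control, ai = implied_rates(relative_increase=0.176, extra_per_1000=1.0)
print(f"control ≈ {control:.1f}, AI ≈ {ai:.1f} per 1,000")
# prints: control ≈ 5.7, AI ≈ 6.7 per 1,000
```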
The PRAIM study also evaluated the performance of Vara MG in a simulated scenario in which the screening examinations triaged as normal by AI were not read by radiologists. Instead, after an AI prediction of “normal,” the examination received the final classification “normal.” Consequently, in this scenario, any signs of breast cancer overlooked by AI could not be picked up by the radiologists and lead to a recall or a cancer diagnosis. In this simulation, Vara MG achieved a 56.7% reduction in reading workload, a 16.7% higher BCDR, and a 15.0% lower recall rate.
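The logic of this simulation can be sketched in a few lines. The example below is a toy illustration with made-up records, not study data: each examination carries a hypothetical `ai_normal` flag and a `recalled` flag for the radiologists' actual decision, and two reads per examination are assumed. It shows both effects described above: reads are saved for AI-normal examinations, and any recall among them is lost in the simulated scenario.

```python
def simulate_autonomous_normal(exams, readers_per_exam=2):
    """Simulate skipping radiologist reads for AI-normal examinations.

    Returns (fractional workload reduction, recalls lost because the
    examination was finalized as normal without a human read).
    """
    baseline_reads = readers_per_exam * len(exams)
    remaining = [e for e in exams if not e["ai_normal"]]
    remaining_reads = readers_per_exam * len(remaining)
    lost_recalls = sum(1 for e in exams if e["ai_normal"] and e["recalled"])
    return 1 - remaining_reads / baseline_reads, lost_recalls


# Toy data (invented for illustration): 10 exams, 6 tagged normal by the AI.
toy_exams = (
    [{"ai_normal": True, "recalled": False}] * 5
    + [{"ai_normal": True, "recalled": True}]       # recall lost in simulation
    + [{"ai_normal": False, "recalled": False}] * 3
    + [{"ai_normal": False, "recalled": True}]
)

reduction, lost = simulate_autonomous_normal(toy_exams)
print(f"workload reduction: {reduction:.0%}, lost recalls: {lost}")
```

Under these assumptions the workload reduction simply equals the share of examinations the AI tags as normal, which is why the choice of the "normal" threshold drives both the time saved and the risk of missed findings.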
While future follow-up studies are required to evaluate the long-term impact of AI on interval cancer rates and stage distribution, the results addressed potential concerns about overdiagnosis, particularly with the increased detection of ductal carcinoma in situ (DCIS). They substantially contribute to the expanding body of evidence indicating that AI-supported mammography screening is feasible and safe and can reduce workload, with significant policy implications. The evidence regarding breast cancer detection, recall rates, positive predictive values of biopsy, and time savings can be used to support the broad adoption of AI in mammography screening programs and to incorporate AI-supported mammography into screening guidelines.